npm - @zigrivers/scaffold - Versions diffs - 3.11.0 → 3.13.0 - Mend

@zigrivers/scaffold 3.11.0 → 3.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109) hide show

package/README.md +6 -5
package/content/knowledge/core/automated-review-tooling.md +137 -140
package/content/knowledge/core/multi-model-research-dispatch.md +27 -16
package/content/knowledge/core/multi-model-review-dispatch.md +47 -6
package/content/skills/multi-model-dispatch/SKILL.md +20 -22
package/content/tools/post-implementation-review.md +71 -26
package/content/tools/review-code.md +37 -11
package/content/tools/review-pr.md +65 -23
package/dist/cli/commands/build.test.js +1 -0
package/dist/cli/commands/build.test.js.map +1 -1
package/dist/cli/commands/init.d.ts.map +1 -1
package/dist/cli/commands/init.js +88 -77
package/dist/cli/commands/init.js.map +1 -1
package/dist/cli/commands/init.test.js +30 -0
package/dist/cli/commands/init.test.js.map +1 -1
package/dist/cli/commands/knowledge.test.js +1 -0
package/dist/cli/commands/knowledge.test.js.map +1 -1
package/dist/cli/commands/run.test.js +1 -0
package/dist/cli/commands/run.test.js.map +1 -1
package/dist/cli/commands/skip.test.js +1 -0
package/dist/cli/commands/skip.test.js.map +1 -1
package/dist/cli/output/auto.d.ts +19 -6
package/dist/cli/output/auto.d.ts.map +1 -1
package/dist/cli/output/auto.js +10 -6
package/dist/cli/output/auto.js.map +1 -1
package/dist/cli/output/context.d.ts +23 -5
package/dist/cli/output/context.d.ts.map +1 -1
package/dist/cli/output/context.js.map +1 -1
package/dist/cli/output/context.test.js +585 -0
package/dist/cli/output/context.test.js.map +1 -1
package/dist/cli/output/error-display.test.js +1 -0
package/dist/cli/output/error-display.test.js.map +1 -1
package/dist/cli/output/interactive.d.ts +19 -6
package/dist/cli/output/interactive.d.ts.map +1 -1
package/dist/cli/output/interactive.js +165 -59
package/dist/cli/output/interactive.js.map +1 -1
package/dist/cli/output/json.d.ts +19 -6
package/dist/cli/output/json.d.ts.map +1 -1
package/dist/cli/output/json.js +10 -6
package/dist/cli/output/json.js.map +1 -1
package/dist/core/assembly/overlay-state-resolver.test.js +1 -0
package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
package/dist/e2e/game-pipeline.test.js +1 -0
package/dist/e2e/game-pipeline.test.js.map +1 -1
package/dist/e2e/init.test.js +1 -0
package/dist/e2e/init.test.js.map +1 -1
package/dist/e2e/project-type-overlays.test.js +1 -0
package/dist/e2e/project-type-overlays.test.js.map +1 -1
package/dist/wizard/copy/backend.d.ts +3 -0
package/dist/wizard/copy/backend.d.ts.map +1 -0
package/dist/wizard/copy/backend.js +49 -0
package/dist/wizard/copy/backend.js.map +1 -0
package/dist/wizard/copy/browser-extension.d.ts +3 -0
package/dist/wizard/copy/browser-extension.d.ts.map +1 -0
package/dist/wizard/copy/browser-extension.js +35 -0
package/dist/wizard/copy/browser-extension.js.map +1 -0
package/dist/wizard/copy/cli.d.ts +3 -0
package/dist/wizard/copy/cli.d.ts.map +1 -0
package/dist/wizard/copy/cli.js +40 -0
package/dist/wizard/copy/cli.js.map +1 -0
package/dist/wizard/copy/core.d.ts +3 -0
package/dist/wizard/copy/core.d.ts.map +1 -0
package/dist/wizard/copy/core.js +52 -0
package/dist/wizard/copy/core.js.map +1 -0
package/dist/wizard/copy/data-pipeline.d.ts +3 -0
package/dist/wizard/copy/data-pipeline.d.ts.map +1 -0
package/dist/wizard/copy/data-pipeline.js +66 -0
package/dist/wizard/copy/data-pipeline.js.map +1 -0
package/dist/wizard/copy/game.d.ts +3 -0
package/dist/wizard/copy/game.d.ts.map +1 -0
package/dist/wizard/copy/game.js +115 -0
package/dist/wizard/copy/game.js.map +1 -0
package/dist/wizard/copy/index.d.ts +8 -0
package/dist/wizard/copy/index.d.ts.map +1 -0
package/dist/wizard/copy/index.js +32 -0
package/dist/wizard/copy/index.js.map +1 -0
package/dist/wizard/copy/library.d.ts +3 -0
package/dist/wizard/copy/library.d.ts.map +1 -0
package/dist/wizard/copy/library.js +44 -0
package/dist/wizard/copy/library.js.map +1 -0
package/dist/wizard/copy/ml.d.ts +3 -0
package/dist/wizard/copy/ml.d.ts.map +1 -0
package/dist/wizard/copy/ml.js +45 -0
package/dist/wizard/copy/ml.js.map +1 -0
package/dist/wizard/copy/mobile-app.d.ts +3 -0
package/dist/wizard/copy/mobile-app.d.ts.map +1 -0
package/dist/wizard/copy/mobile-app.js +45 -0
package/dist/wizard/copy/mobile-app.js.map +1 -0
package/dist/wizard/copy/types.d.ts +60 -0
package/dist/wizard/copy/types.d.ts.map +1 -0
package/dist/wizard/copy/types.js +2 -0
package/dist/wizard/copy/types.js.map +1 -0
package/dist/wizard/copy/types.test-d.d.ts +2 -0
package/dist/wizard/copy/types.test-d.d.ts.map +1 -0
package/dist/wizard/copy/types.test-d.js +36 -0
package/dist/wizard/copy/types.test-d.js.map +1 -0
package/dist/wizard/copy/web-app.d.ts +3 -0
package/dist/wizard/copy/web-app.d.ts.map +1 -0
package/dist/wizard/copy/web-app.js +46 -0
package/dist/wizard/copy/web-app.js.map +1 -0
package/dist/wizard/questions.d.ts.map +1 -1
package/dist/wizard/questions.js +87 -53
package/dist/wizard/questions.js.map +1 -1
package/dist/wizard/questions.test.js +3 -2
package/dist/wizard/questions.test.js.map +1 -1
package/dist/wizard/wizard.test.js +70 -0
package/dist/wizard/wizard.test.js.map +1 -1
package/package.json +1 -1
package/skills/multi-model-dispatch/SKILL.md +20 -22

package/README.md CHANGED Viewed

@@ -10,7 +10,7 @@ Scaffold is a composable meta-prompt pipeline built for [Claude Code](https://do
 Here's how it works:
-1. **Initialize** — run `scaffold init` in your project directory. The init wizard detects whether you're starting fresh (greenfield) or working with an existing codebase (brownfield), and lets you pick a methodology preset (deep, mvp, or custom).
+1. **Initialize** — run `scaffold init` in your project directory. The init wizard detects whether you're starting fresh (greenfield) or working with an existing codebase (brownfield), and lets you pick a methodology preset (deep, mvp, or custom). Every question shows inline descriptions and friendly labels — type `?` at any choice prompt for detailed help.
 2. **Run steps** — each step is a composable meta-prompt (a short intent declaration in `content/pipeline/`) that gets assembled at runtime into a full 7-section prompt. The assembly engine injects relevant knowledge base entries, project context from prior steps, methodology settings, and depth-appropriate instructions.
@@ -38,7 +38,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 **Depth scale** (1-5) — Controls how thorough each step's output is, from "focus on the core deliverable" (1) to "explore all angles, tradeoffs, and edge cases" (5). Depth resolves with 4-level precedence: CLI flag > step override > custom default > preset default.
-**Multi-model validation** — At depth 4-5, all 19 review and validation steps can dispatch independent reviews to Codex and/or Gemini CLIs. Two independent models catch more blind spots than one. When both CLIs are available, findings are reconciled by confidence level (both agree = high confidence, single model P0 = still actionable). Auth is verified before every dispatch (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok"`). See the [Multi-Model Review](#multi-model-review) section.
+**Multi-model validation** — At depth 4-5, all 19 review and validation steps can dispatch independent reviews to Codex and/or Gemini CLIs. Two independent models catch more blind spots than one. When both CLIs are available, findings are reconciled by confidence level (both agree = high confidence, single model P0 = still actionable). When a channel is unavailable, a compensating Claude self-review pass runs in its place (labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`, single-source confidence). CLI commands must always run in the foreground — background execution produces empty output. See the [Multi-Model Review](#multi-model-review) section.
 **State management** — Pipeline progress is tracked in `.scaffold/state.json` with atomic file writes and crash recovery. An advisory lock prevents concurrent runs. Decisions are logged to an append-only `decisions.jsonl`.
@@ -217,7 +217,7 @@ git init
 scaffold init
 ```
-The init wizard detects that this is a brand new project and asks you to pick a methodology. Choose **mvp** if you want to get to working code fast — it runs only 7 critical steps instead of the full 60. You can always switch to `deep` or `custom` later.
+The init wizard detects that this is a brand new project and walks you through setup with friendly labels and inline descriptions for every option. Type `?` at any choice prompt for detailed guidance. Choose **mvp** if you want to get to working code fast — it runs only 7 critical steps instead of the full 60. You can always switch to `deep` or `custom` later.
 **Open Claude Code in your project directory**, then start talking:
@@ -931,12 +931,13 @@ mmr review --pr 47           ──→  Dispatches to all channels in background
                                    Agent continues working
 mmr status mmr-a1b2c3        ──→  Poll progress (which channels done?)
-                                   Exit code: 0=done, 1=running, 2=failed
+                                   Exit code: 0=done, 1=running, 4=failed
 mmr results mmr-a1b2c3       ──→  Reconcile findings across channels
+                                   Run compensating passes for unavailable channels
                                    Apply severity gate
                                    Output unified findings
-                                   Exit code: 0=passed, 1=gate failed
+                                   Exit code: 0=passed, 2=gate failed, 3=degraded
 ```
 **Key features:**

package/content/knowledge/core/automated-review-tooling.md CHANGED Viewed

@@ -10,194 +10,191 @@ Automated PR review leverages AI models to provide consistent, thorough code rev
 ## Summary
-### Architecture: Local CLI Review
+### Review Severity and Reconciliation
-The scaffold approach uses local CLI review rather than GitHub Actions:
-- **No CI secrets required** — models run locally via CLI tools
-- **Dual-model review** — run Codex and Gemini (when available) for independent perspectives
-- **Agent-managed loop** — Claude orchestrates the review-fix cycle locally
+See `review-methodology` for severity definitions (P0-P3). See `multi-model-review-dispatch` for finding reconciliation rules.
-Components:
-- `AGENTS.md` — reviewer instructions with project-specific rules
-- `docs/review-standards.md` — severity definitions (P0-P3) and criteria
-- `scripts/cli-pr-review.sh` — dual-model review script
-- `scripts/await-pr-review.sh` — polling script for external bot mode
+**Action thresholds:** P0/P1/P2 findings must be fixed before proceeding to the next task. P3 findings are recorded but not actioned.
-### Review Severity Levels
+### Degraded-Mode Behavior
-Consistent with the pipeline's review step severity:
-- **P0 (blocking)** — must fix before merge (security, data loss, broken functionality)
-- **P1 (important)** — should fix before merge (bugs, missing tests, performance)
-- **P2 (suggestion)** — consider fixing (style, naming, documentation)
-- **P3 (nit)** — optional (personal preference, minor optimization)
+#### Verdict Definitions
-### Dual-Model Review Pattern
+These are the authoritative verdict definitions. Tool files (`review-code.md`, `review-pr.md`) reference these.
-When both Codex CLI and Gemini CLI are available:
-1. Run both reviewers independently on the PR diff
-2. Collect findings from each
-3. Reconcile: consensus findings get higher confidence
-4. Disagreements are flagged for the implementing agent to resolve
+| Verdict | Condition |
+|---------|-----------|
+| `pass` | All configured channels ran, no unresolved P0/P1/P2 |
+| `degraded-pass` | Channels skipped, compensated, or have non-full coverage (e.g., partial timeout), no unresolved P0/P1/P2 |
+| `blocked` | Unresolved P0/P1/P2 after 3 fix rounds |
+| `needs-user-decision` | Contradictions or unresolvable findings |
-### Integration with PR Workflow
+**Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
-The review step integrates into the standard PR flow:
-1. Agent creates PR
-2. Agent runs `scripts/cli-pr-review.sh` (or review runs automatically)
-3. Review findings are posted as PR comments or written to a local file
-4. Agent addresses P0/P1/P2 findings, pushes fixes
-5. Re-review until no P0/P1/P2 findings remain
-6. PR is ready for merge
+**Both external channels missing:** Maximum achievable verdict is `degraded-pass` — never `pass`. Review summary must note: "All findings are single-model (Claude only). External validation was unavailable."
-## Deep Guidance
+#### Status Model
-### AGENTS.md Structure
+`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `auth_timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `auth_timeout`). The report uses the **coverage label** to show the reader what ran.
-The `AGENTS.md` file provides reviewer instructions:
+#### Compensating Passes
-```markdown
-# Code Review Instructions
+When an external channel (Codex or Gemini) is unavailable, run a compensating Claude self-review pass:
-## Project Context
-[Brief description of what this project does]
+- Same prompt structure as the missing channel, executed as a Claude self-review pass.
+- Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
+- Missing Codex → focus on implementation correctness, security, API contracts.
+- Missing Gemini → focus on architectural patterns, design reasoning, broad context.
+- Missing both → two compensating passes (one per missing channel's strength area).
+- Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
+- Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
-## Review Focus Areas
-- Security: [project-specific security concerns]
-- Performance: [known hot paths or constraints]
-- Testing: [coverage requirements, test patterns]
+**Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
-## Coding Standards Reference
-See docs/coding-standards.md for:
-- Naming conventions
-- Error handling patterns
-- Logging standards
+#### Foreground-Only Execution
-## Known Patterns
-[Project-specific patterns reviewers should enforce]
+Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
-## Out of Scope
-[Things reviewers should NOT flag]
-```
+This constraint is intentionally duplicated from `multi-model-review-dispatch`. Knowledge entries are injected independently by the assembly engine — an agent may receive this entry without `multi-model-review-dispatch`, so both need the constraint.
+## Deep Guidance
+### Finding Reconciliation
+After all channels complete (including compensating passes), reconcile findings using the rules in `multi-model-review-dispatch`. This orchestration entry triggers reconciliation; the dispatch entry defines how to perform it.
-### CLI Review Script Pattern
+Reconciliation normalizes findings from all channels (real and compensating) to a common schema, then matches findings across channels by location and category. The purpose is to detect when multiple independent channels agree on a finding (raising confidence) and to surface contradictions that require human judgment. A finding reported by Codex alone has lower confidence than the same finding reported by both Codex and Gemini.
-The `cli-pr-review.sh` script follows this structure:
+The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
+Findings that appear in all three channels (Codex, Gemini, Superpowers) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
 ```bash
-#!/usr/bin/env bash
-set -euo pipefail
-# 1. Get the PR diff
-diff=$(gh pr diff "$PR_NUMBER")
-# 2. Run Codex review (if available)
-if command -v codex &>/dev/null; then
-  codex_findings=$(echo "$diff" | codex review --context AGENTS.md)
-fi
-# 3. Run Gemini review (if available)
-if command -v gemini &>/dev/null; then
-  gemini_findings=$(echo "$diff" | gemini review --context AGENTS.md)
-fi
-# 4. Reconcile findings
-# - Findings from both models: HIGH confidence
-# - Findings from one model: MEDIUM confidence
-# - Contradictions: flagged for human review
+# Orchestration reconciliation workflow
+# 1. Collect findings from all channels (real + compensating)
+# 2. Normalize to common schema (severity, category, location, description)
+# 3. Match findings across channels by location + category
+# 4. Apply consensus rules from multi-model-review-dispatch
+# 5. Produce reconciled findings list with confidence scores
 ```
-### Review Standards Document
+### Channel Dispatch Pattern and Orchestration
-`docs/review-standards.md` should define:
-- Severity levels with concrete examples per project
-- What constitutes a blocking review (P0/P1/P2 threshold)
-- Auto-approve criteria (when review can be skipped)
-- Review SLA (how long before auto-approve kicks in)
+Each external channel (Codex, Gemini) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass, and continue to the next channel. The Superpowers channel is always available as a Claude subagent and does not require installation or auth checks.
-### Fallback When Models Unavailable
+```bash
+# Channel dispatch pattern
+# For each external channel (Codex, Gemini):
+#   1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
+#   2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
+#   3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
+# For Superpowers: dispatch subagent (always available)
+# After all: run queued compensating passes → reconcile → verdict
+```
+After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
+### Degraded-Mode Worked Example
-If neither Codex nor Gemini CLI is available:
-1. Claude performs an enhanced self-review of the diff
-2. Focus on the AGENTS.md review criteria
-3. Apply the same severity classification
-4. Document that the review was single-model
+When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
-### Updating Review Standards Over Time
+1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
+2. A compensating Codex-equivalent pass is queued: a Claude self-review focused on implementation correctness, security, and API contracts.
+3. Gemini and Superpowers channels run normally.
+4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
+5. Reconciliation merges findings from all three sources (Gemini, Superpowers, compensating-Codex).
+6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
+7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
-As the project evolves:
-- Add new review focus areas when new patterns emerge
-- Remove rules that linters now enforce automatically
-- Update AGENTS.md when architecture changes
-- Track false-positive rates and adjust thresholds
+**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `auth_timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
-### Review Finding Reconciliation
+### Verdict Decision Flow
-When running dual-model review, reconcile findings systematically:
+Apply the following evaluation order to determine the final verdict. The first matching condition wins; all subsequent conditions are skipped.
 ```
-Finding Classification:
-┌─────────────────┬──────────┬──────────┬───────────────────┐
-│                 │ Codex    │ Gemini   │ Action            │
-├─────────────────┼──────────┼──────────┼───────────────────┤
-│ Same issue      │ Found    │ Found    │ HIGH confidence   │
-│ Unique finding  │ Found    │ -        │ MEDIUM confidence │
-│ Unique finding  │ -        │ Found    │ MEDIUM confidence │
-│ Contradiction   │ Fix X    │ Keep X   │ Flag for agent    │
-└─────────────────┴──────────┴──────────┴───────────────────┘
+Verdict evaluation order:
+1. Any contradictions or unresolvable findings? → needs-user-decision
+2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
+3. Any channel not at full coverage? → degraded-pass
+4. All channels completed, no unresolved P0/P1/P2? → pass
 ```
-HIGH confidence findings are always addressed. MEDIUM confidence findings are addressed if P0/P1/P2. Contradictions require the implementing agent to make a judgment call and document the reasoning.
+A "contradiction" exists when two channels report opposite conclusions about the same code location — for example, Codex flags a function as insecure while Gemini explicitly approves it. Contradictions cannot be resolved by the agent alone and must be surfaced to the user.
+A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
+**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. If multiple conditions apply simultaneously (for example, both a contradiction and an unresolved P0 exist), the higher-precedence verdict wins.
+The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings and no contradictions remain, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
 ### Security-Focused Review Checklist
 Every automated review should check:
-- No secrets or credentials in the diff (API keys, passwords, tokens)
-- No `eval()` or equivalent unsafe operations introduced
-- SQL queries use parameterized queries (no string concatenation)
-- User input is validated before use
-- Authentication/authorization checks are present on new endpoints
-- Dependencies added are from trusted sources with known versions
+- No secrets or credentials in the diff (API keys, passwords, tokens, private keys)
+- No `eval()` or equivalent unsafe operations introduced (dynamic code execution, shell injection)
+- SQL queries use parameterized queries — no string concatenation with user input
+- User input is validated and sanitized before use in queries, commands, or output
+- Authentication/authorization checks are present on all new endpoints and operations
+- Dependencies added are from trusted sources with known, pinned versions
+- No new global state or singletons that could cause cross-request data leaks
+- Error messages do not expose internal paths, stack traces, or sensitive system details
+- File system operations use safe path handling (no path traversal vulnerabilities)
+- Cryptographic operations use approved algorithms and key lengths
+When reviewing diffs that touch authentication, authorization, or data handling, elevate any security-related finding by one severity level. A finding that would normally be P2 (recommended) becomes P1 (required) in security-sensitive code paths. This conservative stance reflects the asymmetric cost of security failures versus the cost of over-caution during review.
 ### Performance Review Patterns
-Look for these performance anti-patterns:
-- N+1 queries (loop with individual DB calls)
-- Missing pagination on list endpoints
-- Synchronous operations that should be async
-- Large objects passed by value instead of reference
-- Missing caching for expensive computations
-- Unbounded growth in arrays or maps
+Look for these performance anti-patterns in the diff:
+- N+1 queries (loop containing individual DB calls — use batch queries or eager loading)
+- Missing pagination on list endpoints (unbounded result sets)
+- Synchronous operations that should be async (blocking I/O in hot paths)
+- Large objects passed by value instead of reference (unnecessary deep copies)
+- Missing caching for expensive computations that are called repeatedly
+- Unbounded growth in arrays or maps (no eviction, no size limits)
+- Missing indexes on columns used in WHERE clauses of new queries
+- Eager loading where lazy loading would suffice (over-fetching)
+- Missing connection pooling or connection reuse for external services
-### Integration with CLAUDE.md
+### Common False Positives
-The workflow-audit step should add review commands to CLAUDE.md:
+Track and suppress recurring false positives to reduce noise in future reviews:
+- Test files flagged for "hardcoded values" (test fixtures and expected values are intentional)
+- Migration files flagged for "raw SQL" (migrations must use raw SQL for schema changes)
+- Generated files flagged for style issues (generated code follows its own generator's conventions)
+- Intentional use of `any` types in TypeScript adapter layers or third-party type overrides
+- Deliberate `eslint-disable` comments that are already justified in surrounding context
+- Seed data files flagged for hardcoded credentials (test-only, not production)
-```markdown
-## Code Review
-| Command | Purpose |
-|---------|---------|
-| `scripts/cli-pr-review.sh <PR#>` | Run dual-model review |
-| `scripts/await-pr-review.sh <PR#>` | Poll for external review |
-```
+Add suppressions to AGENTS.md under "Out of Scope" to prevent repeated false findings across review cycles.
-This ensures agents always know how to trigger reviews without consulting separate docs.
+### Review Metrics and Continuous Improvement
-### Common False Positives
+Track these metrics over time to improve review quality and calibrate thresholds:
-Track and suppress recurring false positives:
-- Test files flagged for "hardcoded values" (test fixtures are intentional)
-- Migration files flagged for "raw SQL" (migrations must use raw SQL)
-- Generated files flagged for style issues (generated code has its own conventions)
+| Metric | Definition | Use |
+|--------|------------|-----|
+| False positive rate | Findings dismissed without action / total findings | Calibrate severity thresholds |
+| Escape rate | Bugs reaching production despite review / total bugs | Identify coverage gaps |
+| Time to resolve | Average time between finding logged and fix merged | Identify bottlenecks |
+| Coverage | PRs receiving automated review / total PRs merged | Track adoption |
+| Model agreement rate | Findings agreed by 2+ channels / total findings | Tune reconciliation rules |
+| Compensating-pass rate | Reviews using compensating passes / total reviews | Track environment health |
-Add suppressions to AGENTS.md under "Out of Scope" to prevent repeated false findings.
+Use the false positive rate to determine whether a severity category is over-triggering. Use the escape rate to determine whether the review is missing entire classes of bugs. Use the compensating-pass rate to identify when the review environment needs maintenance (expired auth tokens, broken CLI installs).
-### Review Metrics and Continuous Improvement
+Log metric snapshots in AGENTS.md after each major project milestone. A declining model agreement rate over time suggests either that the review prompts are drifting in quality or that the codebase is accumulating technical debt in areas where models diverge. A rising escape rate despite consistent review coverage is a signal to revisit the severity thresholds or the focus areas in the review prompts.
+### Fallback When Models Unavailable
+When external CLIs are unavailable, the degraded-mode behavior defined in the Summary section applies. To summarize the operational steps:
-Track these metrics over time to improve review quality:
-- **False positive rate** — findings that are dismissed without action
-- **Escape rate** — bugs that reach production despite review
-- **Time to resolve** — average time between finding and fix
-- **Coverage** — percentage of PRs that receive automated review
-- **Model agreement rate** — how often Codex and Gemini agree
+1. For each unavailable external channel, queue a compensating Claude self-review pass focused on that channel's strength area.
+2. Label findings as `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`.
+3. Treat compensating findings as single-source confidence — they do not raise to high confidence even when they agree with another channel.
+4. Maximum verdict is `degraded-pass` when any channel ran as compensating instead of real.
+5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
+6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
-Use these metrics to calibrate severity thresholds and update AGENTS.md focus areas.
+**Superpowers channel exception:** Superpowers is a Claude subagent and requires no external CLI or auth. It is always available as long as the Superpowers plugin is installed in the Claude Code environment. If the plugin is not installed, run available external CLIs and warn the user that review coverage is reduced — but do not run a compensating pass for Superpowers (the compensating-pass mechanism only applies to external CLIs that have an installation/auth gate).

package/content/knowledge/core/multi-model-research-dispatch.md CHANGED Viewed

@@ -18,12 +18,13 @@ At higher methodology depths (4+), idea exploration and adversarial challenge be
 | 5 | Multi-model with reconciliation | Multi-model with reconciliation |
 ### Graceful Fallback Chain
-1. Check if external CLI is available (`which codex`, `which gemini`)
-2. If available, check auth (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
-3. If auth succeeds, dispatch with timeout
-4. If CLI unavailable or auth fails, skip that model — note in Session Metadata
-5. If no external models available, fall back to primary model with distinct framing prompts
-6. Never block the session waiting for unavailable tools
+1. Check if external CLI is available (`command -v codex`, `command -v gemini`)
+2. If not installed, skip that model silently — note in Session Metadata
+3. If installed, check auth (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
+4. If auth fails, surface loudly to the user with `!` recovery command — do NOT silently skip
+5. If auth succeeds, dispatch with timeout
+6. If no external models available, fall back to primary model with distinct framing prompts
+7. Never block the session waiting for unavailable tools
 ### Reconciliation Rules
 - **2+ models agree** on the same finding = **consensus** — high confidence, present as validated
@@ -36,19 +37,29 @@ At higher methodology depths (4+), idea exploration and adversarial challenge be
 Before dispatching, verify CLI tools are installed and authenticated:
-```bash
-# Codex CLI
-which codex >/dev/null 2>&1 && codex login status 2>/dev/null
-# Exit 0 = ready. Non-zero = skip Codex.
+### Foreground-Only Execution
+When an AI agent dispatches research or challenge prompts via a tool runner, always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
-# Gemini CLI
-which gemini >/dev/null 2>&1 && NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
-# Check for "ok" in response. Exit 41 = auth failure.
+```bash
+# Codex CLI — step 1: check installed
+command -v codex >/dev/null 2>&1 || { echo "Codex not installed — skipping"; exit 0; }
+# step 2: check auth
+codex login status 2>/dev/null
+# Exit 0 = ready. Non-zero = auth failure (surface to user).
+# Gemini CLI — step 1: check installed
+command -v gemini >/dev/null 2>&1 || { echo "Gemini not installed — skipping"; exit 0; }
+# step 2: check auth
+NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
+# Check for "ok" in response. Exit 41 = auth failure (surface to user).
 ```
-If auth fails, tell the user which tool failed and how to fix it:
-- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
-- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
+Two distinct failure modes:
+- **Not installed** (`command -v` fails): skip silently, note in Session Metadata
+- **Auth failed** (non-zero after install check): surface loudly — tell the user which tool failed and how to fix it:
+  - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
+  - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
 Auth failures are NOT silent fallbacks — surface them explicitly.

package/content/knowledge/core/multi-model-review-dispatch.md CHANGED Viewed

@@ -44,21 +44,53 @@ The review never blocks on external model availability.
 ## Deep Guidance
+See `review-methodology` for severity definitions (P0-P3). This entry uses those severities but does not define them.
 ### Dispatch Mechanics
+#### Foreground-Only Execution
+When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool, Codex exec, etc.), always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
 #### CLI Availability Check
-Before dispatching, verify the model CLI is installed and authenticated:
+Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
+**Step 1 — Installation check:**
+```bash
+# Codex: not found -> status: "not_installed"
+command -v codex >/dev/null 2>&1
+# Gemini: not found -> status: "not_installed"
+command -v gemini >/dev/null 2>&1
+```
+If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
+**Step 2 — Auth verification (only if installed):**
 ```bash
-# Codex check
-which codex && codex --version 2>/dev/null
+# Codex: fail -> status: "auth_failed"
+codex login status 2>/dev/null
-# Gemini check (via Google Cloud CLI or dedicated tool)
-which gemini 2>/dev/null || (which gcloud && gcloud ai models list 2>/dev/null)
+# Gemini: exit 41 -> status: "auth_failed"
+NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
 ```
-If the CLI is not found, skip dispatch immediately. Do not prompt the user to install it — this is a review enhancement, not a requirement.
+If auth fails, report status `auth_failed` and surface recovery to the user:
+- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
+- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
+If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
+If auth succeeds, report `ready` and proceed to dispatch.
+**Post-dispatch terminal states:**
+- `completed` — channel produced results, use normally
+- `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
+- `failed` — crashed or unparseable output; triggers compensating pass.
+Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
 #### Prompt Formatting
@@ -241,6 +273,15 @@ Minimum standards for a multi-model review to be considered complete:
 If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
+#### Degraded-Mode Gate Adaptation
+When channels are skipped and compensating passes are used:
+- **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
+- **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
+- **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
+- The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
 ### Common Anti-Patterns
 **Blind trust of external findings.** An external model flags an issue and the reviewer includes it without verification. External models hallucinate — they may flag a "missing section" that actually exists, or cite a "contradiction" based on a misread. Fix: every external finding must be verified against the actual artifact before inclusion in the final report.

package/content/skills/multi-model-dispatch/SKILL.md CHANGED Viewed

@@ -64,7 +64,7 @@ fi
 The `!` prefix runs the command in the user's terminal session, allowing interactive auth flows (browser OAuth, Y/n prompts) that can't work in headless mode.
-**If neither CLI is available or authenticated**: Fall back to structured Claude-only self-review. Re-read the artifact with an adversarial lens — actively try to find issues the initial review missed. Document this as "single-model review (no external CLIs available)."
+**If neither CLI is available or authenticated**: Queue a compensating Claude pass focused on the failed channel's strength area. Document this as "single-model review (no external CLIs available)."
 ## Correct Invocation Patterns
@@ -134,6 +134,12 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT_HERE" --output-format json -s --approva
 **Output**: JSON on stdout with `{ response, stats, error }` structure.
+## Foreground-Only Execution
+Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from both CLIs. Multiple foreground calls in a single message are fine — the tool runner supports parallel invocations.
+This means: when dispatching reviews, make each CLI call a separate foreground Bash tool invocation. Do NOT use shell `&` or background subshells.
 ## Context Bundling
 When dispatching a review, bundle all relevant context into the prompt. Each CLI gets the same bundle — do NOT share one model's review with the other.
@@ -208,35 +214,27 @@ You are reviewing a pull request diff. Report P0, P1, and P2 issues.
 | PR diff | Full diff | If >2000 lines, split into file groups |
 | Implementation plan | Task list + representative tasks | Include full task list, detail for flagged tasks |
-## Dual-Model Reconciliation
-When both CLIs produce results, reconcile findings using these rules:
+## Finding Reconciliation
-| Scenario | Confidence | Action |
-|----------|-----------|--------|
-| Both flag same issue | **High** | Fix immediately — two independent models agree |
-| Both approve (no findings) | **High** | Proceed confidently |
-| One flags P0, other approves | **High** | Fix it — P0 is critical enough from a single source |
-| One flags P1, other approves | **Medium** | Review the finding carefully before fixing. If the finding is specific and actionable, fix it. If vague, skip. |
-| Models contradict each other | **Low** | Present both findings to the user for adjudication |
+When multiple models produce findings, reconcile them using the rules defined in `multi-model-review-dispatch`. Key principles:
-**Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
+- **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
+- **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds with unresolved findings, stop and surface the verdict (`blocked` or `needs-user-decision`) to the user. Do NOT auto-merge.
-**Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds, merge with a warning and create a follow-up issue for remaining findings.
+For the full consensus rules, confidence scoring, and disagreement resolution process, see `multi-model-review-dispatch`.
 ## Fallback Behavior
 | Situation | Fallback |
 |-----------|----------|
-| Neither CLI available | Structured Claude-only adversarial self-review |
-| Codex only | Single-model review with Codex |
-| Gemini only | Single-model review with Gemini |
-| **CLI auth expired** | **Surface to user with recovery command — do NOT silently fall back** |
-| One CLI fails mid-review (non-auth) | Continue with the other; note the failure in summary |
-| Both CLIs fail (non-auth) | Fall back to Claude-only self-review; warn user |
-| CLI output not parseable as JSON | Treat as text, extract findings manually |
-**Auth failures are NOT silent fallbacks.** The difference between "CLI not installed" (fall back quietly) and "CLI auth expired" (user action required) is critical. Auth can be fixed in 30 seconds with an interactive command — silently skipping wastes the user's review infrastructure.
+| Neither CLI available | Queue two compensating Claude passes (one per missing channel's strength area). Label findings. Max verdict: `degraded-pass`. |
+| Codex only | Single-model review with Codex + compensating Claude pass for Gemini |
+| Gemini only | Single-model review with Gemini + compensating Claude pass for Codex |
+| **CLI auth expired** | **Surface to user with `!` recovery command — do NOT silently fall back** |
+| One CLI fails mid-review | Use partial results if available, else queue compensating pass. Note failure in summary. |
+| Both CLIs fail | Two compensating passes, max verdict: `degraded-pass`. Warn user. |
+Auth failures are NOT silent fallbacks.
 ## Integration with Review Steps