@zigrivers/scaffold 3.14.0 → 3.16.0

Files changed (122)
  1. package/README.md +50 -21
  2. package/content/knowledge/core/automated-review-tooling.md +21 -26
  3. package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
  4. package/content/knowledge/research/research-architecture.md +385 -0
  5. package/content/knowledge/research/research-conventions.md +248 -0
  6. package/content/knowledge/research/research-dev-environment.md +303 -0
  7. package/content/knowledge/research/research-experiment-loop.md +429 -0
  8. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  9. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  10. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  11. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  12. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  13. package/content/knowledge/research/research-observability.md +395 -0
  14. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  15. package/content/knowledge/research/research-project-structure.md +264 -0
  16. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  17. package/content/knowledge/research/research-quant-market-data.md +366 -0
  18. package/content/knowledge/research/research-quant-metrics.md +335 -0
  19. package/content/knowledge/research/research-quant-requirements.md +223 -0
  20. package/content/knowledge/research/research-quant-risk.md +469 -0
  21. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  22. package/content/knowledge/research/research-requirements.md +201 -0
  23. package/content/knowledge/research/research-security.md +374 -0
  24. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  25. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  26. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  27. package/content/knowledge/research/research-sim-validation.md +456 -0
  28. package/content/knowledge/research/research-testing.md +334 -0
  29. package/content/methodology/research-ml-research.yml +23 -0
  30. package/content/methodology/research-overlay.yml +65 -0
  31. package/content/methodology/research-quant-finance.yml +29 -0
  32. package/content/methodology/research-simulation.yml +23 -0
  33. package/content/tools/post-implementation-review.md +36 -7
  34. package/content/tools/review-code.md +33 -8
  35. package/content/tools/review-pr.md +79 -95
  36. package/dist/cli/commands/adopt.d.ts.map +1 -1
  37. package/dist/cli/commands/adopt.js +22 -1
  38. package/dist/cli/commands/adopt.js.map +1 -1
  39. package/dist/cli/commands/adopt.serialization.test.js +41 -0
  40. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  41. package/dist/cli/commands/init.d.ts +4 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +32 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/init-flag-families.d.ts +6 -1
  46. package/dist/cli/init-flag-families.d.ts.map +1 -1
  47. package/dist/cli/init-flag-families.js +32 -1
  48. package/dist/cli/init-flag-families.js.map +1 -1
  49. package/dist/cli/init-flag-families.test.js +47 -0
  50. package/dist/cli/init-flag-families.test.js.map +1 -1
  51. package/dist/config/schema.d.ts +272 -16
  52. package/dist/config/schema.d.ts.map +1 -1
  53. package/dist/config/schema.js +25 -1
  54. package/dist/config/schema.js.map +1 -1
  55. package/dist/config/schema.test.js +103 -3
  56. package/dist/config/schema.test.js.map +1 -1
  57. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  58. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  59. package/dist/core/assembly/overlay-loader.js +30 -0
  60. package/dist/core/assembly/overlay-loader.js.map +1 -1
  61. package/dist/core/assembly/overlay-loader.test.js +66 -1
  62. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  63. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  64. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  65. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  66. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  67. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  68. package/dist/e2e/project-type-overlays.test.js +119 -0
  69. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  70. package/dist/project/adopt.d.ts.map +1 -1
  71. package/dist/project/adopt.js +3 -1
  72. package/dist/project/adopt.js.map +1 -1
  73. package/dist/project/detectors/disambiguate.js +1 -1
  74. package/dist/project/detectors/disambiguate.js.map +1 -1
  75. package/dist/project/detectors/index.d.ts.map +1 -1
  76. package/dist/project/detectors/index.js +2 -1
  77. package/dist/project/detectors/index.js.map +1 -1
  78. package/dist/project/detectors/ml.d.ts.map +1 -1
  79. package/dist/project/detectors/ml.js +2 -6
  80. package/dist/project/detectors/ml.js.map +1 -1
  81. package/dist/project/detectors/research.d.ts +4 -0
  82. package/dist/project/detectors/research.d.ts.map +1 -0
  83. package/dist/project/detectors/research.js +141 -0
  84. package/dist/project/detectors/research.js.map +1 -0
  85. package/dist/project/detectors/research.test.d.ts +2 -0
  86. package/dist/project/detectors/research.test.d.ts.map +1 -0
  87. package/dist/project/detectors/research.test.js +235 -0
  88. package/dist/project/detectors/research.test.js.map +1 -0
  89. package/dist/project/detectors/shared-signals.d.ts +3 -0
  90. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  91. package/dist/project/detectors/shared-signals.js +9 -0
  92. package/dist/project/detectors/shared-signals.js.map +1 -0
  93. package/dist/project/detectors/types.d.ts +6 -2
  94. package/dist/project/detectors/types.d.ts.map +1 -1
  95. package/dist/project/detectors/types.js.map +1 -1
  96. package/dist/types/config.d.ts +7 -1
  97. package/dist/types/config.d.ts.map +1 -1
  98. package/dist/wizard/copy/core.d.ts.map +1 -1
  99. package/dist/wizard/copy/core.js +4 -0
  100. package/dist/wizard/copy/core.js.map +1 -1
  101. package/dist/wizard/copy/index.d.ts.map +1 -1
  102. package/dist/wizard/copy/index.js +2 -0
  103. package/dist/wizard/copy/index.js.map +1 -1
  104. package/dist/wizard/copy/research.d.ts +3 -0
  105. package/dist/wizard/copy/research.d.ts.map +1 -0
  106. package/dist/wizard/copy/research.js +27 -0
  107. package/dist/wizard/copy/research.js.map +1 -0
  108. package/dist/wizard/copy/types.d.ts +5 -1
  109. package/dist/wizard/copy/types.d.ts.map +1 -1
  110. package/dist/wizard/flags.d.ts +7 -1
  111. package/dist/wizard/flags.d.ts.map +1 -1
  112. package/dist/wizard/questions.d.ts +4 -2
  113. package/dist/wizard/questions.d.ts.map +1 -1
  114. package/dist/wizard/questions.js +27 -1
  115. package/dist/wizard/questions.js.map +1 -1
  116. package/dist/wizard/questions.test.js +51 -0
  117. package/dist/wizard/questions.test.js.map +1 -1
  118. package/dist/wizard/wizard.d.ts +3 -2
  119. package/dist/wizard/wizard.d.ts.map +1 -1
  120. package/dist/wizard/wizard.js +3 -1
  121. package/dist/wizard/wizard.js.map +1 -1
  122. package/package.json +1 -1
package/README.md CHANGED
@@ -29,7 +29,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
29
29
 
30
30
  **Assembly engine** — At execution time, Scaffold builds a 7-section prompt from: system metadata, the meta-prompt, knowledge base entries, project context (artifacts from prior steps), methodology settings, layered instructions, and depth-specific execution guidance.
31
31
 
32
- **Knowledge base** — 194 domain expertise entries in `content/knowledge/` organized in sixteen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
32
+ **Knowledge base** — 222 domain expertise entries in `content/knowledge/` organized in seventeen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
33
33
 
34
34
  **Methodology presets** — Three built-in presets control which steps run and how deep the analysis goes:
35
35
  - **deep** (depth 5) — all steps enabled, exhaustive analysis
@@ -368,7 +368,7 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
368
368
  | `--depth` | 1-5 | Custom methodology depth (requires `--methodology custom`) |
369
369
  | `--adapters` | comma-sep | AI adapters: claude-code, codex, gemini |
370
370
  | `--traits` | comma-sep | Project traits: web, mobile |
371
- | `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension |
371
+ | `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research |
372
372
  | `--auto` | boolean | Non-interactive mode (uses Zod defaults for unset flags) |
373
373
 
374
374
  #### Web-App Config Flags (require `--project-type web-app` or auto-set it)
@@ -445,6 +445,15 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
445
445
  | `--ext-content-script` | boolean | `--ext-content-script` / `--no-ext-content-script` |
446
446
  | `--ext-background-worker` | boolean | `--ext-background-worker` / `--no-ext-background-worker` |
447
447
 
448
+ #### Research Config Flags (require `--project-type research` or auto-set it)
449
+
450
+ | Flag | Type | Values |
451
+ |------|------|--------|
452
+ | `--research-driver` | string | code-driven, config-driven, api-driven, notebook-driven |
453
+ | `--research-interaction` | string | autonomous, checkpoint-gated, human-guided |
454
+ | `--research-domain` | string | none, quant-finance, ml-research, simulation |
455
+ | `--research-tracking` | boolean | `--research-tracking` / `--no-research-tracking` |
456
+
448
457
  #### Game Config Flags (require `--project-type game` or auto-set it)
449
458
 
450
459
  | Flag | Type | Values |
@@ -467,9 +476,9 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
467
476
 
468
477
  - **Flag > auto > interactive**: Flags always take highest precedence. `--auto --engine unreal` uses defaults for everything except engine.
469
478
  - **Partial flags + interactive**: Provide some flags and the wizard asks only the remaining questions. `scaffold init --project-type game --engine unreal` prompts interactively for multiplayer, platforms, etc.
470
- - **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`. Error if conflicting type.
471
- - **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--ext-*`, game) is exclusive.
472
- - **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker).
479
+ - **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type.
480
+ - **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--research-*`, `--ext-*`, game) is exclusive.
481
+ - **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker). Notebook-driven research cannot be fully autonomous.
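
The flag-family rules in the bullets above can be sketched roughly as follows. This is an illustrative Python model, not the package's implementation: the function name `resolve_project_type` and the prefix table are assumptions built from the README text, and the game family (whose flags such as `--engine` share no common prefix) is left out for brevity.

```python
from typing import Dict, Optional

# Prefix-to-type table taken from the README bullets; the structure itself is hypothetical.
FAMILY_TO_TYPE = {
    "--web-": "web-app",
    "--backend-": "backend",
    "--cli-": "cli",
    "--lib-": "library",
    "--mobile-": "mobile-app",
    "--pipeline-": "data-pipeline",
    "--ml-": "ml",
    "--ext-": "browser-extension",
    "--research-": "research",
}


def resolve_project_type(flags: Dict[str, str], explicit_type: Optional[str] = None) -> Optional[str]:
    """Infer the project type from type-specific flags and reject invalid combinations."""
    implied = set()
    for name in flags:
        for prefix, project_type in FAMILY_TO_TYPE.items():
            if name.startswith(prefix):
                implied.add(project_type)

    # "Cannot mix flag families": at most one family may appear.
    if len(implied) > 1:
        raise ValueError(f"cannot mix flag families: {sorted(implied)}")

    inferred = next(iter(implied), None)
    # "Error if conflicting type": an explicit --project-type must agree with the flags.
    if explicit_type and inferred and explicit_type != inferred:
        raise ValueError(f"--project-type {explicit_type} conflicts with {inferred} flags")

    project_type = explicit_type or inferred
    # "Notebook-driven research cannot be fully autonomous."
    if (
        project_type == "research"
        and flags.get("--research-driver") == "notebook-driven"
        and flags.get("--research-interaction") == "autonomous"
    ):
        raise ValueError("notebook-driven research cannot be fully autonomous")
    return project_type
```

Under these assumptions, `resolve_project_type({"--research-driver": "code-driven"})` yields `"research"` without an explicit `--project-type`, mirroring the auto-set behavior the README describes.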
473
482
 
474
483
  #### CI Examples
475
484
 
@@ -550,6 +559,16 @@ scaffold init --auto --methodology mvp --project-type browser-extension \
550
559
  --ext-manifest 3 --ext-ui-surfaces devtools \
551
560
  --no-ext-content-script
552
561
 
562
+ # Autonomous quant-finance research (trading strategy optimization)
563
+ scaffold init --auto --methodology deep --project-type research \
564
+ --research-driver code-driven --research-interaction autonomous \
565
+ --research-domain quant-finance
566
+
567
+ # Checkpoint-gated ML architecture search
568
+ scaffold init --auto --methodology deep --project-type research \
569
+ --research-driver config-driven --research-interaction checkpoint-gated \
570
+ --research-domain ml-research
571
+
553
572
  # Multiplayer mobile game with Unity
554
573
  scaffold init --project-type game --methodology deep --auto \
555
574
  --engine unity --multiplayer online --target-platforms ios,android \
@@ -576,7 +595,7 @@ Scaffold supports **project-type overlays** — domain-specific knowledge and pi
576
595
 
577
596
  - **Injects domain knowledge** into existing pipeline steps (e.g., SSR caching strategies into `tech-stack`, API pagination patterns into `coding-standards`)
578
597
 
579
- The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, and browser-extension overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other.
598
+ The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, and research overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay.
580
599
 
581
600
  Overlays are composable with methodology presets. An MVP web-app gets fewer steps at lower depth; a deep backend project gets exhaustive analysis of every architectural decision.
582
601
 
@@ -590,6 +609,7 @@ Overlays are composable with methodology presets. An MVP web-app gets fewer step
590
609
  | `data-pipeline` | `data-pipeline-overlay.yml` | 12 entries (architecture, batch and streaming patterns, orchestration, schema management, quality, testing, security) | Processing model, orchestration, data quality strategy, schema management, data catalog |
591
610
  | `ml` | `ml-overlay.yml` | 12 entries (architecture, training and serving patterns, experiment tracking, model evaluation, observability, testing, security) | Project phase, model type, serving pattern, experiment tracking |
592
611
  | `browser-extension` | `browser-extension-overlay.yml` | 12 entries (architecture, manifest configuration, service workers, content scripts, cross-browser, store submission, testing, security) | Manifest version, UI surfaces, content script, background worker |
612
+ | `research` | `research-overlay.yml` + domain sub-overlays | 25 entries (experiment loop, tracking, overfitting prevention, backtesting, risk metrics, architecture search, simulation) | Experiment driver, interaction mode, domain, experiment tracking |
593
613
  | `game` | `game-overlay.yml` | 24 entries (engines, networking, audio, VR/AR, economy, save systems, certification) | Engine, multiplayer, platforms, economy, narrative, and 6 more |
594
614
 
595
615
  ### Game Development
@@ -675,7 +695,7 @@ These answers control which conditional steps activate. A single-player puzzle g
675
695
 
676
696
  #### Multi-type Detection
677
697
 
678
- `scaffold adopt` detects 9 project types from manifest files and directory layouts:
698
+ `scaffold adopt` detects 10 project types from manifest files and directory layouts:
679
699
 
680
700
  | Type | Key Signals |
681
701
  |------|-------------|
@@ -688,6 +708,7 @@ These answers control which conditional steps activate. A single-player puzzle g
688
708
  | `data-pipeline` | `dags/` dir, Airflow/Prefect/Dagster deps, Spark configs |
689
709
  | `ml` | `training/`/`models/` dirs, PyTorch/TensorFlow deps, MLflow configs |
690
710
  | `browser-extension` | `manifest.json` with `manifest_version` field |
711
+ | `research` | `program.md` + `results.tsv`, backtest/strategy files with trading deps, optimization deps + experiment dirs, simulation framework deps |
691
712
 
692
713
  Each detector returns a confidence tier (high/medium/low) with evidence trails. Override detection with `--project-type <type>`.
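
As a rough illustration of a detector result carrying a confidence tier and an evidence trail, the sketch below models only the `program.md` + `results.tsv` signal from the table. The `Detection` type, the function name, and especially the mapping from signals to tiers are assumptions made for this example; the diff does not show how the real detector assigns tiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Detection:
    project_type: str
    confidence: str  # one of "high", "medium", "low"
    evidence: List[str] = field(default_factory=list)


def detect_research(present_files: Set[str]) -> Optional[Detection]:
    """Return a research detection based on file names found at the project root."""
    signals = ("program.md", "results.tsv")
    evidence = [name for name in signals if name in present_files]
    if len(evidence) == len(signals):
        # Assumption: both files together count as a high-confidence match.
        return Detection("research", "high", evidence)
    if evidence:
        # Assumption: a single file alone is only weak evidence.
        return Detection("research", "low", evidence)
    return None
```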
693
714
 
@@ -926,23 +947,22 @@ You don't need both — Scaffold works with whichever CLIs are available. Having
926
947
  #### How mmr Works
927
948
 
928
949
  ```
929
- mmr review --pr 47 ──→ Dispatches to all channels in background
930
- Returns job ID immediately
931
- Agent continues working
932
-
933
- mmr status mmr-a1b2c3 ──→ Poll progress (which channels done?)
934
- Exit code: 0=done, 1=running, 4=failed
950
+ # Recommended: single-command pipeline (--sync)
951
+ mmr review --pr 47 --sync ──→ Dispatches to all channels
952
+ Runs compensating passes for unavailable channels
953
+ Parses outputs, reconciles findings
954
+ Applies severity gate, derives verdict
955
+ Exit code: 0=pass, 2=blocked, 3=needs-decision
935
956
 
936
- mmr results mmr-a1b2c3 ──→ Reconcile findings across channels
937
- Run compensating passes for unavailable channels
938
- Apply severity gate
939
- Output unified findings
940
- Exit code: 0=passed, 2=gate failed, 3=degraded
957
+ # Alternative: step-by-step (for async workflows)
958
+ mmr review --pr 47 ──→ Dispatch and await all channels
959
+ mmr results mmr-a1b2c3 ──→ Reconcile findings, output verdict
941
960
  ```
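
A CI wrapper could translate the `--sync` exit codes listed above into verdict names. The sketch below is a hypothetical helper, not part of `mmr`; it uses only the three codes the README documents and treats every other code as unknown.

```python
# Exit codes as documented for `mmr review --sync`; other codes are not covered here.
VERDICT_BY_EXIT_CODE = {0: "pass", 2: "blocked", 3: "needs-decision"}


def verdict_from_exit_code(code: int) -> str:
    """Map a documented exit code to its verdict, flagging anything else as unknown."""
    return VERDICT_BY_EXIT_CODE.get(code, f"unknown (exit code {code})")
```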
942
961
 
943
962
  **Key features:**
944
963
 
945
- - **Async job model** — reviews run in background processes. The agent fires `mmr review` and continues working. No blocking for 4-6 minutes.
964
+ - **--sync mode** — single-command pipeline: dispatch, parse, reconcile, verdict. The recommended entry point for agents and CI.
965
+ - **Compensating passes** — when a channel is unavailable, a Claude-based review focused on that channel's strength area runs automatically.
946
966
  - **Per-channel auth verification** — checks authentication before every dispatch. Auth failures are never silent — `mmr` tells you exactly what expired and the command to fix it.
947
967
  - **Immutable core prompt** — every channel gets the same severity definitions (P0-P3), output format spec (JSON), and review criteria. No prompt drift between channels.
948
968
  - **Automated reconciliation** — when two channels flag the same location, that's consensus (high confidence). When only one channel flags something, it's unique (medium confidence). P0 from any single source is always high confidence.
@@ -1058,6 +1078,14 @@ You can also adjust per-channel timeouts, the default severity threshold, and na
1058
1078
  **After creating a PR:**
1059
1079
 
1060
1080
  ```bash
1081
+ # Recommended: single-command review
1082
+ mmr review --pr 47 --sync --focus "auth flow, session handling"
1083
+ # → Full review output with verdict and findings
1084
+
1085
+ # Or with text output for readability:
1086
+ mmr review --pr 47 --sync --format text
1087
+
1088
+ # Step-by-step (when you want to continue working while review runs):
1061
1089
  mmr review --pr 47 --focus "auth flow, session handling"
1062
1090
  # → Job mmr-a1b2c3 started. 2/2 channels dispatched.
1063
1091
  ```
@@ -1316,7 +1344,7 @@ scaffold dashboard
1316
1344
 
1317
1345
  ## Knowledge System
1318
1346
 
1319
- Scaffold ships with 194 domain expertise entries organized in sixteen categories:
1347
+ Scaffold ships with 222 domain expertise entries organized in sixteen categories:
1320
1348
 
1321
1349
  - **core/** (26 entries) — eval craft, testing strategy, domain modeling, API design, database design, system architecture, ADR craft, security best practices, operations, task decomposition, user stories, UX specification, design system tokens, user story innovation, AI memory management, coding conventions, tech stack selection, project structure patterns, task tracking, CLAUDE.md patterns, multi-model review dispatch, review step template, dev environment, git workflow patterns, automated review tooling, vision craft
1322
1350
  - **product/** (5 entries) — PRD craft, PRD innovation, gap analysis, vision craft, vision innovation
@@ -1334,6 +1362,7 @@ Scaffold ships with 194 domain expertise entries organized in sixteen categories
1334
1362
  - **data-pipeline/** (12 entries) — batch/streaming/hybrid patterns, orchestration (DAG/event-driven/scheduled), data quality, schema management, lineage, pipeline testing
1335
1363
  - **ml/** (12 entries) — training and inference patterns, model types (classical/deep-learning/llm), serving patterns, experiment tracking, model evaluation, MLOps observability
1336
1364
  - **browser-extension/** (12 entries) — Manifest V3, content scripts, service workers, cross-browser compatibility, extension security, store submission
1365
+ - **research/** (25 entries) — experiment loop architecture, parameter optimization, overfitting prevention, experiment tracking, security/sandboxing; domain knowledge for quant-finance (backtesting, risk metrics, market data, strategy patterns), ML-research (architecture search, ablation studies, evaluation), and simulation (engine integration, parameter spaces, compute management)
1337
1366
 
1338
1367
  Each pipeline step declares which knowledge entries it needs in its frontmatter. The assembly engine injects them automatically. Knowledge files with a `## Deep Guidance` section are optimized for the CLI — only the deep guidance content is loaded into the assembled prompt, skipping the summary to avoid redundancy with the prompt text.
1339
1368
 
@@ -1540,7 +1569,7 @@ All build inputs live under `content/`:
1540
1569
  content/
1541
1570
  ├── pipeline/ # 60 meta-prompts organized by 16 phases (phases 0-15, including build)
1542
1571
  ├── tools/ # 10 tool meta-prompts (stateless, category: tool)
1543
- ├── knowledge/ # 194 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
1572
+ ├── knowledge/ # 222 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
1544
1573
  ├── methodology/ # 3 YAML presets (deep, mvp, custom)
1545
1574
  └── skills/ # Skill templates with {{markers}} for multi-platform resolution (includes mmr)
1546
1575
  ```
package/content/knowledge/core/automated-review-tooling.md CHANGED
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: automated-review-tooling
3
- description: Patterns for setting up automated PR code review using AI models (Codex, Gemini) via local CLI, including dual-model review, reconciliation, and CI integration
3
+ description: Patterns for automated PR code review using AI CLI tools (Codex, Gemini, Claude), covering orchestration, reconciliation, compensating passes, and CI integration
4
4
  topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
5
5
  ---
6
6
 
@@ -24,10 +24,10 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
24
24
 
25
25
  | Verdict | Condition |
26
26
  |---------|-----------|
27
- | `pass` | All configured channels ran, no unresolved P0/P1/P2 |
28
- | `degraded-pass` | Channels skipped, compensated, or have non-full coverage (e.g., partial timeout), no unresolved P0/P1/P2 |
29
- | `blocked` | Unresolved P0/P1/P2 after 3 fix rounds |
30
- | `needs-user-decision` | Contradictions or unresolvable findings |
27
+ | `pass` | All channels completed, no unresolved P0/P1/P2 |
28
+ | `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved P0/P1/P2 |
29
+ | `blocked` | Findings at or above fix threshold remain unresolved |
30
+ | `needs-user-decision` | No channels completed; insufficient data for a determination |
31
31
 
32
32
  **Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
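
The precedence rule can be expressed as a small lookup. This is a minimal sketch; `resolve_verdict` is a hypothetical name, and the input is assumed to be the verdicts whose conditions were all found to apply.

```python
# Highest precedence first, as stated in the verdict precedence rule.
PRECEDENCE = ["needs-user-decision", "blocked", "degraded-pass", "pass"]


def resolve_verdict(applicable):
    """Return the highest-precedence verdict among those whose conditions apply."""
    candidates = list(applicable)
    if not candidates:
        raise ValueError("at least one verdict must apply")
    # A lower index in PRECEDENCE means higher precedence.
    return min(candidates, key=PRECEDENCE.index)
```

For instance, when both the `pass` and `blocked` conditions hold, the function returns `blocked`.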
33
33
 
@@ -35,13 +35,13 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
35
35
 
36
36
  #### Status Model
37
37
 
38
- `compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `auth_timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `auth_timeout`). The report uses the **coverage label** to show the reader what ran.
38
+ `compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `timeout`). The report uses the **coverage label** to show the reader what ran.
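
One way to keep the root-cause status and the coverage label separate is to store them as distinct fields. The sketch below is illustrative only: the `ChannelResult` type is hypothetical, and it assumes `failed` is the only retryable status, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses the fix cycle must never retry, per the status model.
NEVER_RETRY = {"not_installed", "auth_failed", "timeout"}


@dataclass
class ChannelResult:
    channel: str                # e.g. "Codex" or "Gemini"
    root_cause: str             # root-cause status, kept even when compensated
    compensated: bool = False   # True when a compensating pass ran

    @property
    def coverage_label(self) -> Optional[str]:
        """Label shown in the report; it never replaces the root-cause status."""
        if self.compensated:
            return f"compensating ({self.channel}-equivalent)"
        return None

    def should_retry(self) -> bool:
        """Retry decisions use the root-cause status only (assumption: only `failed` retries)."""
        return self.root_cause == "failed" and self.root_cause not in NEVER_RETRY
```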
39
39
 
40
40
  #### Compensating Passes
41
41
 
42
- When an external channel (Codex or Gemini) is unavailable, run a compensating Claude self-review pass:
42
+ When a channel (Codex or Gemini) is unavailable, the CLI dispatches a compensating pass via `claude -p`:
43
43
 
44
- - Same prompt structure as the missing channel, executed as a Claude self-review pass.
44
+ - Same prompt structure as the missing channel, executed as a `claude -p` dispatch.
45
45
  - Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
46
46
  - Missing Codex → focus on implementation correctness, security, API contracts.
47
47
  - Missing Gemini → focus on architectural patterns, design reasoning, broad context.
@@ -49,8 +49,6 @@ When an external channel (Codex or Gemini) is unavailable, run a compensating Cl
49
49
  - Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
50
50
  - Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
51
51
 
52
- **Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
53
-
54
52
  #### Foreground-Only Execution
55
53
 
56
54
  Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
@@ -67,7 +65,7 @@ Reconciliation normalizes findings from all channels (real and compensating) to
67
65
 
68
66
  The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
69
67
 
70
- Findings that appear in all three channels (Codex, Gemini, Superpowers) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
68
+ Findings that appear in all three channels (Codex, Gemini, Claude) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
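
The confidence rules described here can be sketched as a grouping step. This is a simplified model under stated assumptions: findings are dicts with hypothetical keys `location`, `severity`, `channel`, and an optional `compensating` flag, and duplicates are matched on location plus severity, which may differ from the real deduplication key.

```python
from collections import defaultdict

CONFIDENCE_ORDER = {"maximum": 0, "high": 1, "low": 2}


def reconcile(findings):
    """Deduplicate findings and attach a confidence level to each group."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[(finding["location"], finding["severity"])].append(finding)

    reconciled = []
    for (location, severity), group in grouped.items():
        # Compensating passes never count toward agreement between channels.
        real = {f["channel"] for f in group if not f.get("compensating")}
        if len(real) >= 3:
            confidence = "maximum"
        elif len(real) >= 2:
            confidence = "high"
        else:
            confidence = "low"
        reconciled.append({
            "location": location,
            "severity": severity,
            "confidence": confidence,
            # Channel names are kept so single-source findings can name their source.
            "channels": sorted({f["channel"] for f in group}),
        })

    # Surface maximum-confidence findings first (stable sort keeps input order within a level).
    reconciled.sort(key=lambda item: CONFIDENCE_ORDER[item["confidence"]])
    return reconciled
```

Note that a finding reported by one real channel and one compensating pass stays at low confidence in this sketch, matching the rule that compensating-pass findings are single-source.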
71
69
 
72
70
  ```bash
73
71
  # Orchestration reconciliation workflow
@@ -80,16 +78,15 @@ Findings that appear in all three channels (Codex, Gemini, Superpowers) are cons
80
78
 
81
79
  ### Channel Dispatch Pattern and Orchestration
82
80
 
83
- Each external channel (Codex, Gemini) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass, and continue to the next channel. The Superpowers channel is always available as a Claude subagent and does not require installation or auth checks.
81
+ Each channel (Codex, Gemini, Claude) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass (for Codex/Gemini), and continue to the next channel.
84
82
 
85
83
  ```bash
86
84
  # Channel dispatch pattern
87
- # For each external channel (Codex, Gemini):
85
+ # For each channel (codex, gemini, claude):
88
86
  # 1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
89
87
  # 2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
90
88
  # 3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
91
- # For Superpowers: dispatch subagent (always available)
92
- # After all: run queued compensating passes → reconcile → verdict
89
+ # After all: run queued compensating passes (via claude -p) → reconcile → verdict
93
90
  ```
94
91
 
95
92
  After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
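The three-step dispatch pattern above could be sketched as a small POSIX shell loop. This is only an illustration under stated assumptions: `auth_check` and `dispatch_foreground` are hypothetical placeholders (not real Codex, Gemini, or Claude CLI commands), and `compensating_queue` is an invented variable name.

```shell
# Placeholders (assumptions): swap in the real auth probe and review command.
auth_check() { return 0; }
dispatch_foreground() { return 0; }

# Walk each channel through install check, auth check, and foreground dispatch.
dispatch_channels() {
  compensating_queue=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      status=not_installed
    elif ! auth_check "$tool"; then
      status=auth_failed
    elif ! dispatch_foreground "$tool"; then
      status=failed
    else
      status=completed
    fi
    echo "$tool: $status"
    # Only Codex and Gemini get a compensating pass queued.
    if [ "$status" != completed ]; then
      case "$tool" in
        codex|gemini) compensating_queue="$compensating_queue $tool" ;;
      esac
    fi
  done
}
```

Calling `dispatch_channels codex gemini claude` would print one status line per channel and leave the channels needing a compensating pass in `compensating_queue`.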
@@ -99,14 +96,14 @@ After all channels and compensating passes complete, run the reconciliation work
99
96
  When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
100
97
 
101
98
  1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
102
- 2. A compensating Codex-equivalent pass is queued: a Claude self-review focused on implementation correctness, security, and API contracts.
103
- 3. Gemini and Superpowers channels run normally.
99
+ 2. A compensating Codex-equivalent pass is queued: a `claude -p` dispatch focused on implementation correctness, security, and API contracts.
100
+ 3. Gemini and Claude channels run normally.
104
101
  4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
105
- 5. Reconciliation merges findings from all three sources (Gemini, Superpowers, compensating-Codex).
102
+ 5. Reconciliation merges findings from all three sources (Gemini, Claude, compensating-Codex).
106
103
  6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
107
104
  7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
108
105
 
109
- **Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `auth_timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
106
+ **Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
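As a rough sketch of the fix-cycle rule, a helper could decide whether a channel is re-run in a fix round from its recorded status. The function name `should_rerun` and the `compensating` status label are assumptions made for illustration; the source only names `not_installed`, `auth_failed`, `timeout`, `failed`, and `completed`.

```shell
# Returns 0 (re-run) only for channels that completed or ran as compensating passes.
should_rerun() {
  case "$1" in
    completed|compensating) return 0 ;;
    *) return 1 ;;  # not_installed, auth_failed, timeout, failed: never retried
  esac
}
```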
110
107
 
111
108
  ### Verdict Decision Flow
112
109
 
@@ -114,19 +111,17 @@ Apply the following evaluation order to determine the final verdict. The first m
114
111
 
115
112
  ```
116
113
  Verdict evaluation order:
117
- 1. Any contradictions or unresolvable findings? → needs-user-decision
114
+ 1. No channels completed? → needs-user-decision
118
115
  2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
119
116
  3. Any channel not at full coverage? → degraded-pass
120
117
  4. All channels completed, no unresolved P0/P1/P2? → pass
121
118
  ```
122
119
 
123
- A "contradiction" exists when two channels report opposite conclusions about the same code location; for example, Codex flags a function as insecure while Gemini explicitly approves it. Contradictions cannot be resolved by the agent alone and must be surfaced to the user.
124
-
125
- A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
120
+ A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, or it timed out.
126
121
 
127
- **Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. If multiple conditions apply simultaneously (for example, both a contradiction and an unresolved P0 exist), the higher-precedence verdict wins.
122
+ **Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply simultaneously, the higher-precedence verdict wins.
128
123
 
129
- The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings and no contradictions remain, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
124
+ The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
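The new evaluation order can be read as a first-match-wins chain. A minimal sketch, assuming three hypothetical counters are available after the final fix round (completed channels, unresolved P0/P1/P2 findings, channels not at full coverage):

```shell
# Args (all hypothetical counters):
#   $1 completed channels
#   $2 unresolved P0/P1/P2 findings after the final fix round
#   $3 channels not at full coverage (compensating pass or timeout)
derive_verdict() {
  completed=$1
  unresolved=$2
  degraded=$3
  if [ "$completed" -eq 0 ]; then
    echo needs-user-decision
  elif [ "$unresolved" -gt 0 ]; then
    echo blocked
  elif [ "$degraded" -gt 0 ]; then
    echo degraded-pass
  else
    echo pass
  fi
}
```

Because the checks run in precedence order, `derive_verdict 0 5 3` yields `needs-user-decision` even though the `blocked` and `degraded-pass` conditions also hold.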
130
125
 
131
126
  ### Security-Focused Review Checklist
132
127
 
@@ -197,4 +192,4 @@ When external CLIs are unavailable, the degraded-mode behavior defined in the Su
197
192
  5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
198
193
  6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
199
194
 
200
- **Superpowers channel exception:** Superpowers is a Claude subagent and requires no external CLI or auth. It is always available as long as the Superpowers plugin is installed in the Claude Code environment. If the plugin is not installed, run available external CLIs and warn the user that review coverage is reduced but do not run a compensating pass for Superpowers (the compensating-pass mechanism only applies to external CLIs that have an installation/auth gate).
195
+ **Claude CLI channel:** Claude CLI handles its own auth and is generally available. The compensating-pass mechanism applies to external CLIs (Codex, Gemini) that have an installation/auth gate. When Codex or Gemini is unavailable, compensating passes are dispatched via `claude -p` with focused prompts targeting the missing channel's strength area.
@@ -1,27 +1,18 @@
1
1
  ---
2
2
  name: multi-model-review-dispatch
3
- description: Patterns for dispatching reviews to external AI models (Codex, Gemini) at depth 4+, including fallback strategies and finding reconciliation
4
- topics: [multi-model, code-review, depth-scaling, codex, gemini, review-synthesis]
3
+ description: Patterns for dispatching reviews to AI CLI tools (Codex, Gemini, Claude), including fallback strategies and finding reconciliation
4
+ topics: [multi-model, code-review, codex, gemini, claude, review-synthesis]
5
5
  ---
6
6
 
7
7
  # Multi-Model Review Dispatch
8
8
 
9
- At higher methodology depths (4+), reviews benefit from independent validation by external AI models. Different models have different blind spots — Codex excels at code-centric analysis while Gemini brings strength in design and architectural reasoning. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers when to dispatch, how to dispatch, how to handle failures, and how to reconcile disagreements.
9
+ Reviews benefit from independent validation by multiple AI models. Different models have different blind spots — Codex excels at code-centric analysis, Gemini brings strength in design and architectural reasoning, and Claude provides plan alignment and code quality assessment. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers how to dispatch, how to handle failures, and how to reconcile disagreements.
10
10
 
11
11
  ## Summary
12
12
 
13
13
  ### When to Dispatch
14
14
 
15
- Multi-model review activates at depth 4+ in the methodology scaling system:
16
-
17
- | Depth | Review Approach |
18
- |-------|----------------|
19
- | 1-2 | Claude-only, reduced pass count |
20
- | 3 | Claude-only, full pass count |
21
- | 4 | Full passes + one external model (if available) |
22
- | 5 | Full passes + multi-model with reconciliation |
23
-
24
- Dispatch is always optional. If no external model CLI is available, the review proceeds as a Claude-only enhanced review with additional self-review passes to partially compensate.
15
+ Multi-model review runs all enabled channels on every review. The MMR CLI (`mmr review --sync`) is the primary entry point and handles dispatch, parsing, reconciliation, and verdict derivation automatically.
25
16
 
26
17
  ### Model Selection
27
18
 
@@ -29,15 +20,16 @@ Dispatch is always optional. If no external model CLI is available, the review p
29
20
  |-------|----------|----------|
30
21
  | **Codex** (OpenAI) | Code analysis, implementation correctness, API contract validation | Code reviews, security reviews, API reviews, database schema reviews |
31
22
  | **Gemini** (Google) | Design reasoning, architectural patterns, broad context understanding | Architecture reviews, PRD reviews, UX reviews, domain model reviews |
23
+ | **Claude** (Anthropic) | Plan alignment, code quality, testing thoroughness | Code reviews, plan verification, test coverage |
32
24
 
33
- When both models are available at depth 5, dispatch to both and reconcile. At depth 4, choose the model best suited to the artifact type.
25
+ All enabled channels run on every review. When a channel is unavailable, a compensating pass is dispatched via `claude -p` focused on the missing channel's strength area.
34
26
 
35
27
  ### Graceful Fallback
36
28
 
37
29
  External models are never required. The fallback chain:
38
30
  1. Attempt dispatch to selected model(s)
39
31
  2. If CLI unavailable → skip that model, note in report
40
- 3. If timeout → use partial results if any, note incompleteness
32
+ 3. If timeout → CLI kills the process; no partial output preserved; compensating pass runs
41
33
  4. If all external models fail → Claude-only enhanced review (additional self-review passes)
42
34
 
43
35
  The review never blocks on external model availability.
@@ -82,15 +74,15 @@ If auth fails, report status `auth_failed` and surface recovery to the user:
82
74
  - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
83
75
  - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
84
76
 
85
- If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
77
+ If auth check times out (~5 seconds), retry once. If still failing, report `timeout`.
86
78
  If auth succeeds, report `ready` and proceed to dispatch.
87
79
 
88
80
  **Post-dispatch terminal states:**
89
81
  - `completed` — channel produced results, use normally
90
- - `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
91
- - `failed` — crashed or unparseable output; triggers compensating pass.
82
+ - `timeout` — channel exceeded time limit; CLI kills the process and marks it as `timeout`; triggers compensating pass
83
+ - `failed` — crashed or unparseable output; triggers compensating pass
92
84
 
93
- Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
85
+ Verdict impact: `timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
94
86
 
95
87
  #### Prompt Formatting
96
88
 
@@ -126,10 +118,12 @@ Respond with a JSON array of findings:
126
118
  "severity": "P0|P1|P2|P3",
127
119
  "category": "coverage|consistency|correctness|completeness",
128
120
  "location": "section or line reference",
129
- "finding": "description of the issue",
121
+ "description": "description of the issue",
130
122
  "suggestion": "recommended fix"
131
123
  }
132
124
  ]
125
+
126
+ Note: `id` and `category` are optional — the CLI auto-generates IDs (F-001, F-002, ...) when omitted.
133
127
  ```
134
128
 
135
129
  #### Output Parsing
@@ -137,15 +131,11 @@ Respond with a JSON array of findings:
137
131
  External model output is parsed as JSON. Handle common parsing issues:
138
132
  - Strip markdown code fences (```json ... ```) if the model wraps output
139
133
  - Handle trailing commas in JSON arrays
140
- - Validate that each finding has the required fields (severity, category, finding)
134
+ - Validate that each finding has the required fields (severity, location, description, suggestion)
141
135
  - Discard malformed entries rather than failing the entire parse
142
136
 
143
- Store raw output for audit:
144
- ```
145
- docs/reviews/{artifact}/codex-review.json — raw Codex findings
146
- docs/reviews/{artifact}/gemini-review.json — raw Gemini findings
147
- docs/reviews/{artifact}/review-summary.md — reconciled synthesis
148
- ```
137
+ The CLI stores raw output at `~/.mmr/jobs/{job-id}/` per channel. Review results
138
+ are available via `mmr results <job-id>`.
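The fence-stripping step from the Output Parsing list might look like the filter below. It is a partial sketch only: it covers fence removal, not trailing-comma repair or field validation, and `strip_fences` is a hypothetical name rather than part of the MMR CLI.

```shell
# Drop any line that starts with three backticks (optionally indented),
# leaving only the JSON payload on stdout.
strip_fences() {
  sed '/^[[:space:]]*`\{3\}/d'
}
```

A channel's raw output would be piped through `strip_fences` before being handed to a JSON parser.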
149
139
 
150
140
  ### Timeout Handling
151
141
 
@@ -158,14 +148,7 @@ External model calls can hang or take unreasonably long. Set reasonable timeouts
158
148
  | Medium artifact review (2000-10000 words) | 120 seconds | Needs more processing time |
159
149
  | Large artifact review (>10000 words) | 180 seconds | Maximum reasonable wait |
160
150
 
161
- #### Partial Result Handling
162
-
163
- If a timeout occurs mid-response:
164
- 1. Check if the partial output contains valid JSON entries
165
- 2. If yes, use the valid entries and note "partial results" in the report
166
- 3. If no, treat as a model failure and fall back
167
-
168
- Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model.
151
+ Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model. When a channel times out, the CLI kills the process — no partial output is preserved. A compensating pass runs in its place.
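One way to enforce these limits in a foreground call is the GNU coreutils `timeout` command, which exits with status 124 when the limit is hit. The wrapper below is a sketch under that assumption (`timeout` is not installed by default on every platform); `run_channel` and its argument order are hypothetical, not the MMR CLI's actual mechanism.

```shell
# Usage: run_channel <seconds> <output-file> <command> [args...]
# Prints the terminal state: completed, timeout, or failed.
run_channel() {
  limit=$1
  out_file=$2
  shift 2
  timeout "$limit" "$@" >"$out_file" 2>&1
  rc=$?
  case "$rc" in
    0) echo completed ;;
    124) echo timeout ;;  # GNU timeout's exit status when the limit is exceeded
    *) echo failed ;;
  esac
}
```

Both `timeout` and `failed` would then queue a compensating pass, matching the terminal states described earlier.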
169
152
 
170
153
  ### Finding Reconciliation
171
154
 
@@ -224,9 +207,9 @@ When synthesizing multi-model findings, classify each finding:
224
207
  # Multi-Model Review Summary: [Artifact Name]
225
208
 
226
209
  ## Models Used
227
- - Claude (primary reviewer)
228
- - Codex (external, depth 4+) — [available/unavailable/timeout]
229
- - Gemini (external, depth 5) — [available/unavailable/timeout]
210
+ - Claude CLI — [available/unavailable/timeout]
211
+ - Codex CLI — [available/unavailable/timeout]
212
+ - Gemini CLI — [available/unavailable/timeout]
230
213
 
231
214
  ## Consensus Findings
232
215
  | # | Severity | Finding | Models | Confidence |
@@ -251,14 +234,10 @@ or areas where external models provided unique value]
251
234
 
252
235
  #### Raw JSON Preservation
253
236
 
254
- Always preserve the raw JSON output from external models, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
237
+ Always preserve the raw JSON output from each channel, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
255
238
 
256
- ```
257
- docs/reviews/{artifact}/
258
- codex-review.json — raw output from Codex
259
- gemini-review.json — raw output from Gemini
260
- review-summary.md — reconciled synthesis
261
- ```
239
+ The CLI stores raw output at `~/.mmr/jobs/{job-id}/` with per-channel result files.
240
+ Results are accessible via `mmr results <job-id>`.
262
241
 
263
242
  ### Quality Gates
264
243
 
@@ -266,20 +245,18 @@ Minimum standards for a multi-model review to be considered complete:
266
245
 
267
246
  | Gate | Threshold | Rationale |
268
247
  |------|-----------|-----------|
269
- | Minimum finding count | At least 3 findings across all models | A review with zero findings likely missed something |
270
- | Coverage threshold | Every review pass has at least one finding or explicit "no issues found" note | Ensures all passes were actually executed |
248
+ | Coverage threshold | Every channel has at least one finding or explicit "no issues found" note | Ensures all channels were actually executed |
271
249
  | Reconciliation completeness | All cross-model disagreements have documented resolutions | No unresolved conflicts |
272
- | Raw output preserved | JSON files exist for all models that were dispatched | Audit trail |
250
+ | Raw output preserved | Per-channel results exist for all dispatched channels | Audit trail |
273
251
 
274
- If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
252
+ Zero findings across all channels is a valid outcome when the diff is clean.
275
253
 
276
254
  #### Degraded-Mode Gate Adaptation
277
255
 
278
256
  When channels are skipped and compensating passes are used:
279
257
 
280
- - **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
281
258
  - **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
282
- - **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
259
+ - **Coverage threshold** gate: compensating passes satisfy the "every channel has at least one finding or explicit no-issues note" requirement.
283
260
  - The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
284
261
 
285
262
  ### Common Anti-Patterns
@@ -288,12 +265,10 @@ When channels are skipped and compensating passes are used:
288
265
 
289
266
  **Ignoring disagreements.** Two models disagree, and the reviewer picks one without analysis. Fix: disagreements are the most valuable signal in multi-model review. They identify areas of genuine ambiguity or complexity. Always investigate and document the resolution.
290
267
 
291
- **Dispatching at low depth.** Running external model reviews at depth 1-2 where the review scope is intentionally minimal. The external model does a full analysis anyway, producing findings that are out of scope. Fix: only dispatch at depth 4+. Lower depths use Claude-only review with reduced pass count.
292
-
293
- **No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The fallback to Claude-only enhanced review must be implemented and tested.
268
+ **No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The CLI automatically dispatches compensating passes via `claude -p` when channels are unavailable.
294
269
 
295
270
  **Over-weighting consensus.** Two models agree on a finding, so it must be correct. But both models may share the same bias (e.g., both flag a pattern as an anti-pattern that is actually appropriate for this project's constraints). Fix: consensus increases confidence but does not guarantee correctness. All findings still require artifact-level verification.
296
271
 
297
272
  **Dispatching the full pipeline context.** Sending the entire project context (all docs, all code) to the external model. This exceeds context limits and dilutes focus. Fix: send only the artifact under review and the minimal upstream context needed for that specific review.
298
273
 
299
- **Ignoring partial results.** A model times out after producing 3 of 5 findings. The reviewer discards all results because the review is "incomplete." Fix: partial results are still valuable. Include them with a note about incompleteness. Three real findings are better than zero.
274
+ **Treating a timeout as a silent skip.** A channel times out and the reviewer proceeds without documenting it. Fix: when a channel times out, record the root-cause status as `timeout`, queue a compensating pass, and include it in the review summary. The CLI kills timed-out processes, so no partial output is available, but the compensating pass ensures coverage.