@harness-engineering/cli 1.7.0 → 1.8.1

Files changed (187)
  1. package/dist/agents/personas/documentation-maintainer.yaml +3 -1
  2. package/dist/agents/personas/performance-guardian.yaml +23 -0
  3. package/dist/agents/skills/claude-code/align-documentation/SKILL.md +13 -0
  4. package/dist/agents/skills/claude-code/cleanup-dead-code/SKILL.md +25 -1
  5. package/dist/agents/skills/claude-code/cleanup-dead-code/skill.yaml +5 -2
  6. package/dist/agents/skills/claude-code/detect-doc-drift/SKILL.md +12 -0
  7. package/dist/agents/skills/claude-code/enforce-architecture/SKILL.md +48 -1
  8. package/dist/agents/skills/claude-code/enforce-architecture/skill.yaml +5 -2
  9. package/dist/agents/skills/claude-code/harness-accessibility/SKILL.md +7 -0
  10. package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md +11 -3
  11. package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md +81 -11
  12. package/dist/agents/skills/claude-code/harness-brainstorming/skill.yaml +2 -0
  13. package/dist/agents/skills/claude-code/harness-code-review/SKILL.md +487 -234
  14. package/dist/agents/skills/claude-code/harness-code-review/skill.yaml +15 -2
  15. package/dist/agents/skills/claude-code/harness-codebase-cleanup/SKILL.md +226 -0
  16. package/dist/agents/skills/claude-code/harness-codebase-cleanup/skill.yaml +64 -0
  17. package/dist/agents/skills/claude-code/harness-dependency-health/SKILL.md +35 -6
  18. package/dist/agents/skills/claude-code/harness-docs-pipeline/SKILL.md +460 -0
  19. package/dist/agents/skills/claude-code/harness-docs-pipeline/skill.yaml +69 -0
  20. package/dist/agents/skills/claude-code/harness-execution/SKILL.md +73 -8
  21. package/dist/agents/skills/claude-code/harness-execution/skill.yaml +1 -0
  22. package/dist/agents/skills/claude-code/harness-hotspot-detector/SKILL.md +32 -6
  23. package/dist/agents/skills/claude-code/harness-i18n/SKILL.md +484 -0
  24. package/dist/agents/skills/claude-code/harness-i18n/skill.yaml +54 -0
  25. package/dist/agents/skills/claude-code/harness-i18n-process/SKILL.md +388 -0
  26. package/dist/agents/skills/claude-code/harness-i18n-process/skill.yaml +43 -0
  27. package/dist/agents/skills/claude-code/harness-i18n-workflow/SKILL.md +512 -0
  28. package/dist/agents/skills/claude-code/harness-i18n-workflow/skill.yaml +53 -0
  29. package/dist/agents/skills/claude-code/harness-impact-analysis/SKILL.md +35 -6
  30. package/dist/agents/skills/claude-code/harness-integrity/SKILL.md +17 -1
  31. package/dist/agents/skills/claude-code/harness-knowledge-mapper/SKILL.md +46 -5
  32. package/dist/agents/skills/claude-code/harness-perf/SKILL.md +37 -8
  33. package/dist/agents/skills/claude-code/harness-perf/skill.yaml +3 -0
  34. package/dist/agents/skills/claude-code/harness-perf-tdd/SKILL.md +17 -4
  35. package/dist/agents/skills/claude-code/harness-planning/SKILL.md +59 -5
  36. package/dist/agents/skills/claude-code/harness-planning/skill.yaml +2 -0
  37. package/dist/agents/skills/claude-code/harness-release-readiness/SKILL.md +16 -0
  38. package/dist/agents/skills/claude-code/harness-roadmap/SKILL.md +561 -0
  39. package/dist/agents/skills/claude-code/harness-roadmap/skill.yaml +43 -0
  40. package/dist/agents/skills/claude-code/harness-security-review/SKILL.md +36 -2
  41. package/dist/agents/skills/claude-code/harness-security-review/skill.yaml +8 -6
  42. package/dist/agents/skills/claude-code/harness-soundness-review/SKILL.md +1267 -0
  43. package/dist/agents/skills/claude-code/harness-soundness-review/skill.yaml +48 -0
  44. package/dist/agents/skills/claude-code/harness-test-advisor/SKILL.md +35 -6
  45. package/dist/agents/skills/claude-code/harness-verification/SKILL.md +66 -0
  46. package/dist/agents/skills/claude-code/harness-verification/skill.yaml +1 -0
  47. package/dist/agents/skills/claude-code/harness-verify/SKILL.md +11 -0
  48. package/dist/agents/skills/claude-code/initialize-harness-project/SKILL.md +15 -1
  49. package/dist/agents/skills/claude-code/validate-context-engineering/SKILL.md +12 -0
  50. package/dist/agents/skills/gemini-cli/add-harness-component/SKILL.md +192 -0
  51. package/dist/agents/skills/gemini-cli/add-harness-component/skill.yaml +32 -0
  52. package/dist/agents/skills/gemini-cli/align-documentation/SKILL.md +213 -0
  53. package/dist/agents/skills/gemini-cli/align-documentation/skill.yaml +31 -0
  54. package/dist/agents/skills/gemini-cli/check-mechanical-constraints/SKILL.md +191 -0
  55. package/dist/agents/skills/gemini-cli/check-mechanical-constraints/skill.yaml +32 -0
  56. package/dist/agents/skills/gemini-cli/cleanup-dead-code/SKILL.md +245 -0
  57. package/dist/agents/skills/gemini-cli/cleanup-dead-code/skill.yaml +33 -0
  58. package/dist/agents/skills/gemini-cli/detect-doc-drift/SKILL.md +179 -0
  59. package/dist/agents/skills/gemini-cli/detect-doc-drift/skill.yaml +30 -0
  60. package/dist/agents/skills/gemini-cli/enforce-architecture/SKILL.md +240 -0
  61. package/dist/agents/skills/gemini-cli/enforce-architecture/skill.yaml +34 -0
  62. package/dist/agents/skills/gemini-cli/harness-accessibility/SKILL.md +7 -0
  63. package/dist/agents/skills/gemini-cli/harness-architecture-advisor/SKILL.md +397 -0
  64. package/dist/agents/skills/gemini-cli/harness-architecture-advisor/skill.yaml +48 -0
  65. package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md +11 -3
  66. package/dist/agents/skills/gemini-cli/harness-brainstorming/SKILL.md +317 -0
  67. package/dist/agents/skills/gemini-cli/harness-brainstorming/skill.yaml +49 -0
  68. package/dist/agents/skills/gemini-cli/harness-code-review/SKILL.md +681 -0
  69. package/dist/agents/skills/gemini-cli/harness-code-review/skill.yaml +45 -0
  70. package/dist/agents/skills/gemini-cli/harness-codebase-cleanup/SKILL.md +226 -0
  71. package/dist/agents/skills/gemini-cli/harness-codebase-cleanup/skill.yaml +64 -0
  72. package/dist/agents/skills/gemini-cli/harness-debugging/SKILL.md +366 -0
  73. package/dist/agents/skills/gemini-cli/harness-debugging/skill.yaml +47 -0
  74. package/dist/agents/skills/gemini-cli/harness-dependency-health/SKILL.md +35 -6
  75. package/dist/agents/skills/gemini-cli/harness-diagnostics/SKILL.md +318 -0
  76. package/dist/agents/skills/gemini-cli/harness-diagnostics/skill.yaml +50 -0
  77. package/dist/agents/skills/gemini-cli/harness-docs-pipeline/SKILL.md +460 -0
  78. package/dist/agents/skills/gemini-cli/harness-docs-pipeline/skill.yaml +69 -0
  79. package/dist/agents/skills/gemini-cli/harness-execution/SKILL.md +382 -0
  80. package/dist/agents/skills/gemini-cli/harness-execution/skill.yaml +51 -0
  81. package/dist/agents/skills/gemini-cli/harness-git-workflow/SKILL.md +268 -0
  82. package/dist/agents/skills/gemini-cli/harness-git-workflow/skill.yaml +31 -0
  83. package/dist/agents/skills/gemini-cli/harness-hotspot-detector/SKILL.md +32 -6
  84. package/dist/agents/skills/gemini-cli/harness-i18n/SKILL.md +484 -0
  85. package/dist/agents/skills/gemini-cli/harness-i18n/skill.yaml +54 -0
  86. package/dist/agents/skills/gemini-cli/harness-i18n-process/SKILL.md +388 -0
  87. package/dist/agents/skills/gemini-cli/harness-i18n-process/skill.yaml +43 -0
  88. package/dist/agents/skills/gemini-cli/harness-i18n-workflow/SKILL.md +512 -0
  89. package/dist/agents/skills/gemini-cli/harness-i18n-workflow/skill.yaml +53 -0
  90. package/dist/agents/skills/gemini-cli/harness-impact-analysis/SKILL.md +35 -6
  91. package/dist/agents/skills/gemini-cli/harness-integrity/SKILL.md +167 -0
  92. package/dist/agents/skills/gemini-cli/harness-integrity/skill.yaml +47 -0
  93. package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/SKILL.md +46 -5
  94. package/dist/agents/skills/gemini-cli/harness-onboarding/SKILL.md +288 -0
  95. package/dist/agents/skills/gemini-cli/harness-onboarding/skill.yaml +30 -0
  96. package/dist/agents/skills/gemini-cli/harness-parallel-agents/SKILL.md +171 -0
  97. package/dist/agents/skills/gemini-cli/harness-parallel-agents/skill.yaml +33 -0
  98. package/dist/agents/skills/gemini-cli/harness-perf/SKILL.md +37 -8
  99. package/dist/agents/skills/gemini-cli/harness-perf/skill.yaml +3 -0
  100. package/dist/agents/skills/gemini-cli/harness-perf-tdd/SKILL.md +17 -4
  101. package/dist/agents/skills/gemini-cli/harness-planning/SKILL.md +389 -0
  102. package/dist/agents/skills/gemini-cli/harness-planning/skill.yaml +49 -0
  103. package/dist/agents/skills/gemini-cli/harness-pre-commit-review/SKILL.md +262 -0
  104. package/dist/agents/skills/gemini-cli/harness-pre-commit-review/skill.yaml +33 -0
  105. package/dist/agents/skills/gemini-cli/harness-refactoring/SKILL.md +169 -0
  106. package/dist/agents/skills/gemini-cli/harness-refactoring/skill.yaml +33 -0
  107. package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md +16 -0
  108. package/dist/agents/skills/gemini-cli/harness-roadmap/SKILL.md +561 -0
  109. package/dist/agents/skills/gemini-cli/harness-roadmap/skill.yaml +43 -0
  110. package/dist/agents/skills/gemini-cli/harness-security-review/skill.yaml +8 -6
  111. package/dist/agents/skills/gemini-cli/harness-skill-authoring/SKILL.md +292 -0
  112. package/dist/agents/skills/gemini-cli/harness-skill-authoring/skill.yaml +32 -0
  113. package/dist/agents/skills/gemini-cli/harness-soundness-review/SKILL.md +1267 -0
  114. package/dist/agents/skills/gemini-cli/harness-soundness-review/skill.yaml +48 -0
  115. package/dist/agents/skills/gemini-cli/harness-state-management/SKILL.md +309 -0
  116. package/dist/agents/skills/gemini-cli/harness-state-management/skill.yaml +32 -0
  117. package/dist/agents/skills/gemini-cli/harness-tdd/SKILL.md +177 -0
  118. package/dist/agents/skills/gemini-cli/harness-tdd/skill.yaml +48 -0
  119. package/dist/agents/skills/gemini-cli/harness-test-advisor/SKILL.md +35 -6
  120. package/dist/agents/skills/gemini-cli/harness-verification/SKILL.md +328 -0
  121. package/dist/agents/skills/gemini-cli/harness-verification/skill.yaml +42 -0
  122. package/dist/agents/skills/gemini-cli/harness-verify/SKILL.md +159 -0
  123. package/dist/agents/skills/gemini-cli/harness-verify/skill.yaml +40 -0
  124. package/dist/agents/skills/gemini-cli/initialize-harness-project/SKILL.md +224 -0
  125. package/dist/agents/skills/gemini-cli/initialize-harness-project/skill.yaml +31 -0
  126. package/dist/agents/skills/gemini-cli/validate-context-engineering/SKILL.md +150 -0
  127. package/dist/agents/skills/gemini-cli/validate-context-engineering/skill.yaml +31 -0
  128. package/dist/agents/skills/shared/i18n-knowledge/accessibility/intersection.yaml +142 -0
  129. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/encoding.yaml +67 -0
  130. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/formatting.yaml +106 -0
  131. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/layout.yaml +80 -0
  132. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/pluralization.yaml +80 -0
  133. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/string-handling.yaml +106 -0
  134. package/dist/agents/skills/shared/i18n-knowledge/frameworks/android-resources.yaml +47 -0
  135. package/dist/agents/skills/shared/i18n-knowledge/frameworks/apple-strings.yaml +47 -0
  136. package/dist/agents/skills/shared/i18n-knowledge/frameworks/backend-patterns.yaml +50 -0
  137. package/dist/agents/skills/shared/i18n-knowledge/frameworks/flutter-intl.yaml +47 -0
  138. package/dist/agents/skills/shared/i18n-knowledge/frameworks/i18next.yaml +47 -0
  139. package/dist/agents/skills/shared/i18n-knowledge/frameworks/react-intl.yaml +47 -0
  140. package/dist/agents/skills/shared/i18n-knowledge/frameworks/vue-i18n.yaml +47 -0
  141. package/dist/agents/skills/shared/i18n-knowledge/industries/ecommerce.yaml +66 -0
  142. package/dist/agents/skills/shared/i18n-knowledge/industries/fintech.yaml +66 -0
  143. package/dist/agents/skills/shared/i18n-knowledge/industries/gaming.yaml +69 -0
  144. package/dist/agents/skills/shared/i18n-knowledge/industries/healthcare.yaml +66 -0
  145. package/dist/agents/skills/shared/i18n-knowledge/industries/legal.yaml +66 -0
  146. package/dist/agents/skills/shared/i18n-knowledge/locales/ar.yaml +41 -0
  147. package/dist/agents/skills/shared/i18n-knowledge/locales/de.yaml +35 -0
  148. package/dist/agents/skills/shared/i18n-knowledge/locales/en.yaml +32 -0
  149. package/dist/agents/skills/shared/i18n-knowledge/locales/es.yaml +35 -0
  150. package/dist/agents/skills/shared/i18n-knowledge/locales/fi.yaml +35 -0
  151. package/dist/agents/skills/shared/i18n-knowledge/locales/fr.yaml +35 -0
  152. package/dist/agents/skills/shared/i18n-knowledge/locales/he.yaml +41 -0
  153. package/dist/agents/skills/shared/i18n-knowledge/locales/hi.yaml +35 -0
  154. package/dist/agents/skills/shared/i18n-knowledge/locales/it.yaml +32 -0
  155. package/dist/agents/skills/shared/i18n-knowledge/locales/ja.yaml +38 -0
  156. package/dist/agents/skills/shared/i18n-knowledge/locales/ko.yaml +38 -0
  157. package/dist/agents/skills/shared/i18n-knowledge/locales/nl.yaml +32 -0
  158. package/dist/agents/skills/shared/i18n-knowledge/locales/pl.yaml +35 -0
  159. package/dist/agents/skills/shared/i18n-knowledge/locales/pt.yaml +32 -0
  160. package/dist/agents/skills/shared/i18n-knowledge/locales/ru.yaml +35 -0
  161. package/dist/agents/skills/shared/i18n-knowledge/locales/sv.yaml +32 -0
  162. package/dist/agents/skills/shared/i18n-knowledge/locales/th.yaml +35 -0
  163. package/dist/agents/skills/shared/i18n-knowledge/locales/tr.yaml +35 -0
  164. package/dist/agents/skills/shared/i18n-knowledge/locales/zh-Hans.yaml +38 -0
  165. package/dist/agents/skills/shared/i18n-knowledge/locales/zh-Hant.yaml +35 -0
  166. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/i18next-mcp.yaml +56 -0
  167. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/lingo-dev.yaml +56 -0
  168. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/lokalise.yaml +60 -0
  169. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/tolgee.yaml +60 -0
  170. package/dist/agents/skills/shared/i18n-knowledge/testing/locale-testing.yaml +107 -0
  171. package/dist/agents/skills/shared/i18n-knowledge/testing/pseudo-localization.yaml +86 -0
  172. package/dist/bin/harness.js +64 -4
  173. package/dist/{chunk-GA6GN5J2.js → chunk-E2RTDBMG.js} +2263 -41
  174. package/dist/{chunk-FFIX3QVG.js → chunk-KJANDVVC.js} +141 -49
  175. package/dist/{chunk-4WUGOJQ7.js → chunk-RT2LYQHF.js} +1 -1
  176. package/dist/{dist-C4J67MPP.js → dist-CCM3L3UE.js} +95 -1
  177. package/dist/{dist-N4D4QWFV.js → dist-K6KTTN3I.js} +4 -4
  178. package/dist/index.d.ts +187 -7
  179. package/dist/index.js +7 -3
  180. package/dist/validate-cross-check-ZGKFQY57.js +7 -0
  181. package/package.json +9 -9
  182. package/dist/agents/skills/node_modules/.bin/glob +0 -17
  183. package/dist/agents/skills/node_modules/.bin/vitest +0 -17
  184. package/dist/agents/skills/node_modules/.bin/yaml +0 -17
  185. package/dist/templates/advanced/docs/specs/.gitkeep +0 -0
  186. package/dist/templates/intermediate/docs/specs/.gitkeep +0 -0
  187. package/dist/validate-cross-check-WGXQ7K62.js +0 -7
@@ -1,123 +1,164 @@
1
1
  # Harness Code Review
2
2
 
3
- > Full code review lifecycle (request, perform, respond) with automated harness checks and technical rigor over social performance.
3
+ > Multi-phase code review pipeline: mechanical checks, graph-scoped context, parallel review agents, cross-agent deduplication, and structured output, with technical rigor over social performance.
4
4
 
5
5
  ## When to Use
6
6
 
7
- - When requesting a review of your completed work (before merge)
8
- - When performing a review of someone else's code (human or agent)
9
- - When responding to review feedback on your own code
10
- - When `on_review` or `on_pr` triggers fire
7
+ - When performing a code review (manual invocation or triggered by `on_pr` / `on_review`)
8
+ - When requesting a review of completed work (see Role A at the end of this document)
9
+ - When responding to review feedback (see Role C at the end of this document)
11
10
  - NOT for in-progress work (complete the feature first)
12
11
  - NOT for rubber-stamping (if you cannot find issues, look harder or state confidence level)
13
- - NOT for style-only feedback (leave that to linters)
12
+ - NOT for style-only feedback (leave that to linters and mechanical checks)
14
13
 
15
- ## Context Assembly
14
+ ## Process
16
15
 
17
- Before beginning any review phase, assemble context proportional to the change size.
16
+ The review runs as a 7-phase pipeline. Each phase has a clear input, output, and exit condition.
18
17
 
19
- ### 1:1 Context Ratio Rule
18
+ ```
19
+ Phase 1: GATE ──→ Phase 2: MECHANICAL ──→ Phase 3: CONTEXT ──→ Phase 4: FAN-OUT
20
+                                                                       │
21
+ Phase 7: OUTPUT ←── Phase 6: DEDUP+MERGE ←── Phase 5: VALIDATE ←──────┘
22
+ ```
20
23
 
21
- For every N lines of diff, gather approximately N lines of surrounding context. This ensures the reviewer understands the ecosystem around the change, not just the change itself.
24
+ | Phase | Tier | Purpose | Exit Condition |
25
+ | -------------- | ----- | -------------------------------------------------- | ----------------------------------------------------- |
26
+ | 1. GATE | fast | Skip ineligible PRs (CI mode only) | PR is eligible, or exit with reason |
27
+ | 2. MECHANICAL | none | Lint, typecheck, test, security scan | All pass → continue; any fail → report and stop |
28
+ | 3. CONTEXT | fast | Scope context per review domain | Context bundles assembled for each subagent |
29
+ | 4. FAN-OUT | mixed | Parallel review subagents | All subagents return findings in ReviewFinding schema |
30
+ | 5. VALIDATE | none | Exclude mechanical duplicates, verify reachability | Unvalidated findings discarded |
31
+ | 6. DEDUP+MERGE | none | Group, merge, assign final severity | Deduplicated finding list with merged evidence |
32
+ | 7. OUTPUT | none | Text output or inline GitHub comments | Review delivered, exit code set |
33
+
34
+ ### Finding Schema
35
+
36
+ Each review agent produces findings in this common format:
37
+
38
+ ```typescript
39
+ interface ReviewFinding {
40
+ id: string; // unique, for dedup
41
+ file: string; // file path
42
+ lineRange: [number, number]; // start, end
43
+ domain: 'compliance' | 'bug' | 'security' | 'architecture';
44
+ severity: 'critical' | 'important' | 'suggestion';
45
+ title: string; // one-line summary
46
+ rationale: string; // why this is an issue
47
+ suggestion?: string; // fix, if available
48
+ evidence: string[]; // supporting context from agent
49
+ validatedBy: 'mechanical' | 'graph' | 'heuristic';
50
+ }
51
+ ```
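Because every review agent has to emit the same shape, a runtime check at the fan-out boundary can be useful. The following is a minimal sketch, not code from the package: `isReviewFinding` is a hypothetical helper, and the rule that line numbers are 1-based with start <= end is an assumption layered on top of the schema.

```typescript
// Schema copied from the Finding Schema section above.
interface ReviewFinding {
  id: string;
  file: string;
  lineRange: [number, number];
  domain: 'compliance' | 'bug' | 'security' | 'architecture';
  severity: 'critical' | 'important' | 'suggestion';
  title: string;
  rationale: string;
  suggestion?: string;
  evidence: string[];
  validatedBy: 'mechanical' | 'graph' | 'heuristic';
}

const DOMAINS: readonly string[] = ['compliance', 'bug', 'security', 'architecture'];
const SEVERITIES: readonly string[] = ['critical', 'important', 'suggestion'];
const VALIDATIONS: readonly string[] = ['mechanical', 'graph', 'heuristic'];

// Hypothetical type guard: returns true only if `value` matches the schema.
function isReviewFinding(value: unknown): value is ReviewFinding {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;

  // Assumption: line numbers are 1-based integers and start <= end.
  const range = v.lineRange;
  const rangeOk =
    Array.isArray(range) &&
    range.length === 2 &&
    range.every((n) => Number.isInteger(n) && n >= 1) &&
    range[0] <= range[1];

  const evidence = v.evidence;
  const evidenceOk = Array.isArray(evidence) && evidence.every((e) => typeof e === 'string');

  return (
    typeof v.id === 'string' &&
    typeof v.file === 'string' &&
    rangeOk &&
    typeof v.domain === 'string' && DOMAINS.includes(v.domain) &&
    typeof v.severity === 'string' && SEVERITIES.includes(v.severity) &&
    typeof v.title === 'string' &&
    typeof v.rationale === 'string' &&
    (v.suggestion === undefined || typeof v.suggestion === 'string') &&
    evidenceOk &&
    typeof v.validatedBy === 'string' && VALIDATIONS.includes(v.validatedBy)
  );
}
```

Findings that fail such a check could be dropped before Phase 5 rather than silently merged.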
22
52
 
23
- - **Small diffs (<20 lines):** Gather proportionally more context — aim for 3:1 context-to-diff. Small changes often have outsized impact and need more surrounding understanding.
24
- - **Medium diffs (20-200 lines):** Target 1:1 ratio. Read the full files containing changes, plus immediate dependencies.
25
- - **Large diffs (>200 lines):** 1:1 ratio is the floor, but prioritize ruthlessly using the priority order below. Flag large diffs as a review concern — they are harder to review correctly.
53
+ ### Flags
26
54
 
27
- ### Context Gathering Priority Order
55
+ | Flag | Effect |
56
+ | ----------------- | ------------------------------------------------------------------------------------------- |
57
+ | `--comment` | Post inline comments to GitHub PR via `gh` CLI or GitHub MCP |
58
+ | `--deep` | Pass `--deep` to `harness-security-review` for threat modeling in the security fan-out slot |
59
+ | `--no-mechanical` | Skip mechanical checks (useful if already run in CI) |
60
+ | `--ci` | Enable eligibility gate, non-interactive output |
28
61
 
29
- Gather context in this order until the ratio is met:
62
+ ### Model Tiers
30
63
 
31
- 1. **Files directly imported/referenced by changed files**: read the modules that the changed code calls or depends on. Without this, you cannot evaluate correctness.
32
- 2. **Corresponding test files** — find tests for the changed code. If tests exist, read them to understand expected behavior. If tests are missing, note this as a finding.
33
- 3. **Spec/design docs mentioning changed components** — search `docs/specs/`, `docs/design-docs/`, and `docs/plans/` for references to the changed files or features. The spec defines "correct."
34
- 4. **Type definitions used by changed code** — read interfaces, types, and schemas that the changed code consumes or produces. Type mismatches are high-severity bugs.
35
- 5. **Recent commits touching the same files** — see Commit History below.
64
+ Tiers are abstract labels resolved at runtime from project config. If no config exists, all phases use the current model (no tiering).
36
65
 
37
- ### Context Assembly Commands
66
+ | Tier | Default | Used By |
67
+ | ---------- | ------------ | ------------------------------------ |
68
+ | `fast` | haiku-class | GATE, CONTEXT |
69
+ | `standard` | sonnet-class | Compliance agent, Architecture agent |
70
+ | `strong` | opus-class | Bug Detection agent, Security agent |
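The tier lookup described above can be sketched as a small pure function. This is illustrative only: the `TierConfig` shape and the `resolveModel` name are hypothetical, and falling back to the current model when a config exists but omits a tier is an assumption (the text only covers the case where no config exists at all).

```typescript
type Tier = 'fast' | 'standard' | 'strong';

// Hypothetical config shape: maps a tier label to a model identifier.
type TierConfig = Partial<Record<Tier, string>>;

// Resolve an abstract tier label to a concrete model name.
function resolveModel(
  tier: Tier,
  config: TierConfig | undefined,
  currentModel: string,
): string {
  // No project config: every phase uses the current model (no tiering).
  if (config === undefined) return currentModel;
  // Assumption: a tier missing from the config also falls back to the current model.
  return config[tier] ?? currentModel;
}
```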
38
71
 
39
- ```bash
40
- # 1. Get the diff and measure its size
41
- git diff --stat HEAD~1 # or the relevant commit range
42
- git diff HEAD~1 -- <file> # per-file diff
72
+ ### Review Learnings Calibration
43
73
 
44
- # 2. Find imports/references in changed files
45
- grep -n "import\|require\|from " <changed-file>
74
+ Before starting the pipeline, check for a project-specific calibration file:
46
75
 
47
- # 3. Find corresponding test files
48
- find . -name "*<module-name>*test*" -o -name "*<module-name>*spec*"
76
+ ```bash
77
+ cat .harness/review-learnings.md 2>/dev/null
78
+ ```
49
79
 
50
- # 4. Search for spec/design references
51
- grep -rl "<component-name>" docs/specs/ docs/design-docs/ docs/plans/
80
+ If `.harness/review-learnings.md` exists:
52
81
 
53
- # 5. Find type definitions
54
- grep -rn "interface\|type\|schema" <changed-file> | head -20
55
- ```
82
+ 1. **Read the Useful Findings section.** Prioritize these categories during review — they have historically caught real issues in this project.
83
+ 2. **Read the Noise / False Positives section.** De-prioritize or skip these categories — flagging them wastes the author's time and erodes trust in the review process.
84
+ 3. **Read the Calibration Notes section.** Apply these project-specific overrides to your review judgment. These represent deliberate team decisions, not oversights.
56
85
 
57
- ### Graph-Enhanced Context (when available)
86
+ If the file does not exist, proceed with default review focus areas. After completing the review, consider suggesting that the team create `.harness/review-learnings.md` if you notice patterns that would benefit from calibration.
58
87
 
59
- When a knowledge graph exists at `.harness/graph/`, use graph queries for faster, more accurate context gathering:
88
+ ## Pipeline Phases
60
89
 
61
- - `query_graph`: traverse dependency chain from changed files to find all imports and transitive dependencies (replaces grep for import tracing)
62
- - `get_impact` — find all affected tests, docs, and downstream code that may break from the change
63
- - `find_context_for` — assemble review context for changed files within token budget, ranked by relevance
90
+ ### Phase 1: GATE
64
91
 
65
- Graph queries replace manual grep/find commands and discover transitive dependencies that file search misses. Fall back to file-based commands if no graph is available.
92
+ **Tier:** fast
93
+ **Mode:** CI only (`--ci` flag). When invoked manually, skip this phase entirely.
66
94
 
67
- ### Commit History Context
95
+ Check whether the PR should be reviewed at all. This prevents wasted compute in CI pipelines.
68
96
 
69
- As part of context assembly (priority item #5), retrieve recent commit history for every affected file:
97
+ **Checks:**
98
+
99
+ 1. **PR state:** Is the PR closed or merged? → Skip with reason "PR is closed."
100
+ 2. **Draft status:** Is the PR marked as draft? → Skip with reason "PR is draft."
101
+ 3. **Trivial change:** Is the diff documentation-only (all changed files are `.md`)? → Skip with reason "Documentation-only change."
102
+ 4. **Already reviewed:** Has this exact commit range been reviewed before (check for prior review comment from this tool)? → Skip with reason "Already reviewed at {sha}."
70
103
 
71
104
  ```bash
72
- # Recent commits touching affected files (5 per file)
73
- git log --oneline -5 -- <affected-file>
105
+ # Check PR state
106
+ gh pr view --json state,isDraft,files
74
107
 
75
- # For all affected files at once
76
- git log --oneline -5 -- <file1> <file2> <file3>
108
+ # Check if documentation-only
109
+ gh pr diff --name-only | grep -v '\.md$' | wc -l # 0 means docs-only
77
110
  ```
78
111
 
79
- Use commit history to answer:
112
+ **Exit:** If any check triggers a skip, output the reason and exit with code 0. Otherwise, continue to Phase 2.
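The four gate checks can be expressed as one decision function over the JSON returned by `gh pr view --json state,isDraft,files`. This is a sketch under assumptions: `gateSkipReason` is a hypothetical name, "this exact commit range" is simplified to the head SHA, and the set of previously reviewed SHAs is assumed to have been collected elsewhere (for example, from prior review comments).

```typescript
// Subset of the fields requested via `gh pr view --json state,isDraft,files`.
interface PrInfo {
  state: 'OPEN' | 'CLOSED' | 'MERGED';
  isDraft: boolean;
  files: { path: string }[];
}

// Returns the skip reason, or null when the PR should proceed to Phase 2.
function gateSkipReason(
  pr: PrInfo,
  headSha: string,
  reviewedShas: ReadonlySet<string>, // assumption: gathered from earlier review comments
): string | null {
  if (pr.state !== 'OPEN') return 'PR is closed.';
  if (pr.isDraft) return 'PR is draft.';
  // Mirrors the shell check: zero non-.md files means docs-only.
  if (pr.files.every((f) => f.path.endsWith('.md'))) return 'Documentation-only change.';
  if (reviewedShas.has(headSha)) return `Already reviewed at ${headSha}.`;
  return null;
}
```

A caller in CI mode would print the returned reason and exit with code 0, or continue when the result is `null`.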
80
113
 
81
- - **Is this a hotspot?** If the file has been changed 3+ times in the last 5 commits, it is volatile. Pay extra attention — frequent changes suggest instability or ongoing refactoring.
82
- - **Was this recently refactored?** If recent commits include "refactor" or "restructure," check whether the current change aligns with or contradicts the refactoring direction.
83
- - **Who has been working here?** If multiple authors touched the file recently, there may be conflicting assumptions. Look for consistency.
84
- - **What was the last change?** The most recent commit gives context on the file's trajectory. A bugfix followed by another change to the same area is a yellow flag.
114
+ ---
85
115
 
86
- ### Review Learnings Calibration
116
+ ### Phase 2: MECHANICAL
87
117
 
88
- Before starting the review, check for a project-specific calibration file:
118
+ **Tier:** none (no LLM)
119
+ **Mode:** Skipped if `--no-mechanical` flag is set.
89
120
 
90
- ```bash
91
- # Check if review learnings file exists
92
- cat .harness/review-learnings.md 2>/dev/null
93
- ```
121
+ Run mechanical checks to establish an exclusion boundary. Any issue caught mechanically is excluded from AI review (Phase 4) to prevent duplicate findings.
94
122
 
95
- If `.harness/review-learnings.md` exists:
123
+ **Checks:**
96
124
 
97
- 1. **Read the Useful Findings section.** Prioritize these categories during review — they have historically caught real issues in this project.
98
- 2. **Read the Noise / False Positives section.** De-prioritize or skip these categories — flagging them wastes the author's time and erodes trust in the review process.
99
- 3. **Read the Calibration Notes section.** Apply these project-specific overrides to your review judgment. These represent deliberate team decisions, not oversights.
125
+ 1. **Harness validation:**
126
+ ```bash
127
+ harness validate
128
+ harness check-deps
129
+ harness check-docs
130
+ ```
131
+ 2. **Security scan:** Run `run_security_scan` MCP tool on changed files. Record findings with rule ID, file, line, and remediation.
132
+ 3. **Type checking:** Run the project's type checker (e.g., `tsc --noEmit`). Record any type errors.
133
+ 4. **Linting:** Run the project's linter (e.g., `eslint`). Record any lint violations.
134
+ 5. **Tests:** Run the project's test suite. Record any failures.
100
135
 
101
- If the file does not exist, proceed with default review focus areas. After completing the review, consider suggesting that the team create `.harness/review-learnings.md` if you notice patterns that would benefit from calibration.
136
+ **Output:** A set of mechanical findings (file, line, tool, message). This set becomes the exclusion list for Phase 5.
137
+
138
+ **Exit:** If any mechanical check fails (harness validate, typecheck, or tests), report the mechanical failures in Strengths/Issues/Assessment format and stop the pipeline. The code has fundamental issues that must be fixed before AI review adds value. Lint warnings and security scan findings do not stop the pipeline — they are recorded for exclusion only.
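The split between blocking and non-blocking checks can be sketched as follows. The names (`MechanicalFinding`, `evaluateMechanical`) and the `file:line` exclusion key are hypothetical; only the blocking rule itself (harness validate, typecheck, and tests stop the pipeline; lint and security findings do not) comes from the text above.

```typescript
type MechanicalTool = 'validate' | 'typecheck' | 'tests' | 'lint' | 'security';

// One mechanical finding: (file, line, tool, message), as listed under Output.
interface MechanicalFinding {
  file: string;
  line: number;
  tool: MechanicalTool;
  message: string;
}

// Failures from these tools stop the pipeline before AI review.
const BLOCKING = new Set<MechanicalTool>(['validate', 'typecheck', 'tests']);

function evaluateMechanical(findings: MechanicalFinding[]): {
  stop: boolean;
  exclusions: Set<string>;
} {
  const stop = findings.some((f) => BLOCKING.has(f.tool));
  // Assumption: the exclusion list is keyed by "file:line".
  const exclusions = new Set(findings.map((f) => `${f.file}:${f.line}`));
  return { stop, exclusions };
}
```

Every finding lands in the exclusion set regardless of whether it blocks, so later phases can discard AI findings that duplicate it.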
139
+
140
+ ---
102
141
 
103
- ## Change-Type Detection
142
+ ### Phase 3: CONTEXT
104
143
 
105
- After assembling context, determine the change type. This shapes which checklist to apply during review.
144
+ **Tier:** fast
145
+ **Purpose:** Assemble scoped context bundles for each review domain. Each subagent in Phase 4 receives only the context relevant to its domain, not the full diff.
106
146
 
107
- ### Detection Method
147
+ #### Change-Type Detection
108
148
 
109
- 1. **Explicit argument:** If the review was invoked with a change type (e.g., `--type feature`), use it.
110
- 2. **Commit message prefix:** Parse the most recent commit message for conventional commit prefixes:
149
+ Before scoping context, determine the change type. This shapes which review focus areas apply.
150
+
151
+ 1. **Commit message prefix:** Parse the most recent commit message for conventional commit prefixes:
111
152
  - `feat:` or `feature:` → **feature**
112
153
  - `fix:` or `bugfix:` → **bugfix**
113
154
  - `refactor:` → **refactor**
114
155
  - `docs:` or `doc:` → **docs**
115
- 3. **Diff pattern heuristic:** If no prefix is found, examine the diff:
156
+ 2. **Diff pattern heuristic:** If no prefix is found, examine the diff:
116
157
  - New files added + tests added → likely **feature**
117
158
  - Small changes to existing files + test added → likely **bugfix**
118
159
  - File renames, moves, or restructuring with no behavior change → likely **refactor**
119
160
  - Only `.md` files or comments changed → likely **docs**
120
- 4. **Default:** If detection is ambiguous, treat as **feature** (the most thorough checklist).
161
+ 3. **Default:** If detection is ambiguous, treat as **feature** (the most thorough review).
121
162
 
122
163
  ```bash
123
164
  # Parse commit message prefix
@@ -127,266 +168,480 @@ git log --oneline -1 | head -1
127
168
  git diff --name-status HEAD~1 | grep "^A"
128
169
 
129
170
  # Check if only docs changed
130
- git diff --name-only HEAD~1 | grep -v "\.md$" | wc -l # 0 means docs-only
171
+ git diff --name-only HEAD~1 | grep -v '\.md$' | wc -l # 0 means docs-only
131
172
  ```
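The detection order above (prefix, then diff heuristic, then default) can be sketched as a pure function over the commit subject and the rows of `git diff --name-status`. Several details are assumptions rather than part of the skill: optional scopes such as `feat(cli):` are accepted, test files are recognized by a `.test.` or `.spec.` infix, and the "small changes" and "comments only" conditions are not modeled because name-status output does not carry them.

```typescript
type ChangeType = 'feature' | 'bugfix' | 'refactor' | 'docs';

// One row of `git diff --name-status`: status is A, M, D, R100, ...
interface FileChange {
  status: string;
  path: string;
}

// Conventional commit prefixes from the list above (optional scope is an assumption).
const PREFIX_RULES: ReadonlyArray<[RegExp, ChangeType]> = [
  [/^(feat|feature)(\([^)]*\))?:/, 'feature'],
  [/^(fix|bugfix)(\([^)]*\))?:/, 'bugfix'],
  [/^refactor(\([^)]*\))?:/, 'refactor'],
  [/^(docs|doc)(\([^)]*\))?:/, 'docs'],
];

// Assumption: tests are named like foo.test.ts or foo.spec.js.
const isTestFile = (path: string): boolean => /\.(test|spec)\.[^./]+$/.test(path);

function detectChangeType(subject: string, changes: FileChange[]): ChangeType {
  // 1. Commit message prefix.
  for (const [pattern, type] of PREFIX_RULES) {
    if (pattern.test(subject)) return type;
  }
  // 2. Diff pattern heuristic.
  if (changes.length > 0 && changes.every((c) => c.path.endsWith('.md'))) return 'docs';
  if (changes.length > 0 && changes.every((c) => c.status.startsWith('R'))) return 'refactor';
  const added = changes.filter((c) => c.status === 'A');
  const addedTests = added.filter((c) => isTestFile(c.path)).length;
  const addedSources = added.length - addedTests;
  if (addedTests > 0) return addedSources > 0 ? 'feature' : 'bugfix';
  // 3. Default: ambiguous changes get the most thorough review.
  return 'feature';
}
```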
132
173
 
133
- ### Security Review (All Change Types)
174
+ #### Context Scoping
134
175
 
135
- Every code review includes a security check, regardless of change type. This runs in addition to the per-type checklist below.
176
+ Scope context per review domain. When a knowledge graph exists at `.harness/graph/`, use graph queries. Otherwise, fall back to file-based heuristics.
136
177
 
137
- 1. **Mechanical scan:** Run `run_security_scan` MCP tool on the changed files. Report any findings with rule ID, file, line, and remediation.
138
- 2. **Semantic security review:** Look for issues the mechanical scanner cannot catch:
139
- - User input flowing through multiple functions to a dangerous sink (SQL, shell, HTML)
140
- - Missing authorization checks on new or modified endpoints
141
- - Sensitive data exposed in logs, error messages, or API responses
142
- - Authentication bypass paths introduced by the change
143
- - Insecure defaults in new configuration options
144
- 3. **Stack-adaptive focus:** Based on the project's tech stack, apply relevant domain knowledge (e.g., prototype pollution for Node.js, XSS for React, race conditions for Go).
178
+ | Domain | With Graph | Without Graph (Fallback) |
179
+ | ----------------- | ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- |
180
+ | **Compliance** | Convention files (`CLAUDE.md`, `AGENTS.md`, `.harness/`) + changed files | Convention files + changed files (same -- no graph needed) |
181
+ | **Bug Detection** | Changed files + direct dependencies via `query_graph` | Changed files + files imported by changed files (`grep import`) |
182
+ | **Security** | Security-relevant paths + data flow traversal via `query_graph` | Changed files + files containing security-sensitive patterns (auth, crypto, SQL, shell) |
183
+ | **Architecture** | Layer boundaries + import graph via `query_graph` + `get_impact` | Changed files + `harness check-deps` output |
184
+
185
+ #### 1:1 Context Ratio Rule
186
+
187
+ For every N lines of diff, gather approximately N lines of surrounding context:
188
+
189
+ - **Small diffs (<20 lines):** Gather proportionally more context — aim for 3:1 context-to-diff.
190
+ - **Medium diffs (20-200 lines):** Target 1:1 ratio. Read full files containing changes, plus immediate dependencies.
191
+ - **Large diffs (>200 lines):** 1:1 ratio is the floor. Prioritize ruthlessly. Flag large diffs as a review concern.
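
The ratio rule above can be read as a small lookup on diff size. The following is a minimal sketch, not part of the skill itself; the function name `context_target` and its return shape are assumptions made for illustration.

```python
def context_target(diff_lines: int) -> tuple[int, bool]:
    """Return (minimum context lines to gather, flag_as_large_diff).

    Hypothetical helper: thresholds mirror the three tiers listed above.
    """
    if diff_lines < 0:
        raise ValueError("diff_lines must be non-negative")
    if diff_lines < 20:
        # Small diff: aim for roughly 3:1 context-to-diff.
        return 3 * diff_lines, False
    if diff_lines <= 200:
        # Medium diff: target 1:1.
        return diff_lines, False
    # Large diff: 1:1 is only the floor, and the size itself is a concern.
    return diff_lines, True
```

The boolean lets a caller record "large diff" as its own review concern while still gathering at least the floor amount of context.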
192
+
193
+ #### Context Gathering Priority Order
194
+
195
+ Gather context in this order until the ratio is met:
196
+
197
+ 1. **Files directly imported/referenced by changed files** — read the modules the changed code calls or depends on.
198
+ 2. **Corresponding test files** — find tests for changed code. If tests are missing, note this as a finding.
199
+ 3. **Spec/design docs mentioning changed components** — search `docs/changes/`, `docs/design-docs/`, `docs/plans/`.
200
+ 4. **Type definitions used by changed code** — read interfaces, types, schemas consumed or produced.
201
+ 5. **Recent commits touching the same files** — see Commit History below.
202
+
203
+ #### Graph-Enhanced Context (when available)
204
+
205
+ When a knowledge graph exists at `.harness/graph/`, use graph queries for faster, more accurate context:
145
206
 
146
- Security findings are always "blocking" if they represent a confirmed vulnerability (not a potential pattern match). Include CWE references where applicable.
207
+ - `query_graph` -- traverse dependency chain from changed files to find all imports and transitive dependencies
208
+ - `get_impact` — find all affected tests, docs, and downstream code
209
+ - `find_context_for` — assemble review context within token budget, ranked by relevance
147
210
 
148
- ### Per-Type Review Checklists
211
+ Graph queries replace manual grep/find commands and discover transitive dependencies that file search misses. Fall back to file-based commands if no graph is available.
212
+
213
+ #### Context Assembly Commands
214
+
215
+ ```bash
216
+ # 1. Get the diff and measure its size
217
+ git diff --stat HEAD~1 # or the relevant commit range
218
+ git diff HEAD~1 -- <file> # per-file diff
219
+
220
+ # 2. Find imports/references in changed files
221
+ grep -n "import\|require\|from " <changed-file>
149
222
 
150
- Apply the checklist matching the detected change type. These replace the generic review — do not apply all checklists to every change.
223
+ # 3. Find corresponding test files
224
+ find . -name "*<module-name>*test*" -o -name "*<module-name>*spec*"
225
+
226
+ # 4. Search for spec/design references
227
+ grep -rl "<component-name>" docs/changes/ docs/design-docs/ docs/plans/
228
+
229
+ # 5. Find type definitions
230
+ grep -rn "interface\|type\|schema" <changed-file> | head -20
231
+ ```
151
232
 
152
- #### Feature Checklist
233
+ #### Commit History Context
234
+
235
+ Retrieve recent commit history for every affected file:
236
+
237
+ ```bash
238
+ # Recent commits touching affected files (5 per file)
239
+ git log --oneline -5 -- <affected-file>
240
+ ```
241
+
242
+ Use commit history to answer:
243
+
244
+ - **Is this a hotspot?** Changed 3+ times in last 5 commits → volatile, pay extra attention.
245
+ - **Was this recently refactored?** Recent "refactor" commits → check alignment with refactoring direction.
246
+ - **Who has been working here?** Multiple authors → look for conflicting assumptions.
247
+ - **What was the last change?** Bugfix followed by change in same area → yellow flag.
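
One way to make the hotspot check mechanical is sketched below. It assumes the repository's last five commits were listed with `git log -5 --name-only --pretty=format:@%h`, so each commit starts with a line beginning with `@` followed by the paths it touched. The function names and the `@` marker are illustrative choices, not something the skill prescribes.

```python
def parse_name_only_log(output: str) -> list[set[str]]:
    """Split `git log --name-only --pretty=format:@%h` output into per-commit file sets."""
    commits: list[set[str]] = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("@"):
            commits.append(set())      # a new commit begins
        elif line and commits:
            commits[-1].add(line)      # a path touched by the current commit
    return commits


def is_hotspot(path: str, commits: list[set[str]], window: int = 5, threshold: int = 3) -> bool:
    """True if `path` changed in at least `threshold` of the most recent `window` commits."""
    return sum(path in files for files in commits[:window]) >= threshold
```

The remaining questions (recent refactors, multiple authors, bugfix followed by a change) depend on commit messages and author names, so they would need a richer log format than this sketch parses.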
248
+
249
+ **Exit:** Context bundles are assembled for each of the four review domains. Continue to Phase 4.
250
+
251
+ ---
252
+
253
+ ### Phase 4: FAN-OUT
254
+
255
+ **Tier:** mixed (see per-agent tiers below)
256
+ **Purpose:** Run four parallel review subagents, each with domain-scoped context from Phase 3. Each agent produces findings in the `ReviewFinding` schema.
257
+
258
+ #### Compliance Agent (standard tier)
259
+
260
+ Reviews adherence to project conventions, standards, and documentation requirements.
261
+
262
+ **Input:** Compliance context bundle (convention files + changed files + change type)
263
+
264
+ **Focus by change type:**
265
+
266
+ _Feature:_
153
267
 
154
268
  - [ ] **Spec alignment:** Does the implementation match the spec/design doc? Are all specified behaviors present?
155
- - [ ] **Edge cases:** Are boundary conditions handled (empty input, max values, null, concurrent access)?
156
- - [ ] **Test coverage:** Are there tests for happy path, error paths, and edge cases? Is coverage meaningful, not just present?
157
269
  - [ ] **API surface:** Are new public interfaces minimal and well-named? Could any new export be kept internal?
158
270
  - [ ] **Backward compatibility:** Does this break existing callers? If so, is the migration path documented?
159
271
 
160
- #### Bugfix Checklist
272
+ _Bugfix:_
161
273
 
162
- - [ ] **Root cause identified:** Does the fix address the root cause, not just the symptom? Is the original issue referenced?
163
- - [ ] **Regression test added:** Is there a test that would have caught this bug before the fix? Does it fail without the fix and pass with it?
164
- - [ ] **No collateral changes:** Does the fix change only what is necessary? Unrelated changes in a bugfix PR are a red flag.
274
+ - [ ] **Root cause identified:** Does the fix address the root cause, not just the symptom?
165
275
  - [ ] **Original issue referenced:** Does the commit or PR reference the bug report or issue number?
276
+ - [ ] **No collateral changes:** Does the fix change only what is necessary?
166
277
 
167
- #### Refactor Checklist
278
+ _Refactor:_
168
279
 
169
- - [ ] **Behavioral equivalence:** Do all existing tests still pass without modification? If tests changed, justify why.
170
- - [ ] **No functionality changes:** Does the refactor introduce any new behavior, even subtly? New behavior belongs in a feature PR.
171
- - [ ] **Performance preserved:** Could the restructuring introduce performance regressions (e.g., extra allocations, changed query patterns)?
172
- - [ ] **Improved clarity:** Is the code demonstrably clearer after the refactor? If not, the refactor may not be justified.
280
+ - [ ] **Behavioral equivalence:** Do all existing tests still pass without modification?
281
+ - [ ] **No functionality changes:** Does the refactor introduce any new behavior?
173
282
 
174
- #### Docs Checklist
283
+ _Docs:_
175
284
 
176
- - [ ] **Accuracy vs. current code:** Do the documented behaviors match what the code actually does? Run the examples if possible.
177
- - [ ] **Completeness:** Are all public interfaces documented? Are there undocumented parameters, return values, or error conditions?
178
- - [ ] **Consistency:** Does the new documentation follow the same style, terminology, and structure as existing docs?
179
- - [ ] **Links valid:** Do all internal links resolve? Are external links still live?
285
+ - [ ] **Accuracy vs. current code:** Do documented behaviors match what the code actually does?
286
+ - [ ] **Completeness:** Are all public interfaces documented?
287
+ - [ ] **Consistency:** Does new documentation follow existing style and terminology?
288
+ - [ ] **Links valid:** Do all internal links resolve?
180
289
 
181
- ## Process
290
+ **Output:** `ReviewFinding[]` with `domain: 'compliance'`
291
+
292
+ ---
182
293
 
183
- This skill covers three distinct roles. Follow the section that matches your current role.
294
+ #### Bug Detection Agent (strong tier)
295
+
296
+ Reviews for logic errors, edge cases, and correctness issues.
297
+
298
+ **Input:** Bug detection context bundle (changed files + dependencies)
299
+
300
+ **Focus areas:**
301
+
302
+ - [ ] **Edge cases:** Boundary conditions (empty input, max values, null, concurrent access)
303
+ - [ ] **Error handling:** Errors handled at appropriate level, helpful messages, no silent swallowing
304
+ - [ ] **Logic errors:** Off-by-one, incorrect boolean logic, missing early returns
305
+ - [ ] **Race conditions:** Concurrent access to shared state, missing locks or atomic operations
306
+ - [ ] **Resource leaks:** Unclosed handles, missing cleanup in error paths
307
+ - [ ] **Type safety:** Type mismatches, unsafe casts, missing null checks
308
+ - [ ] **Test coverage:** Tests for happy path, error paths, and edge cases. Coverage meaningful, not just present.
309
+ - [ ] **Regression tests:** For bugfixes — test that would have caught the bug before the fix
310
+
311
+ **Output:** `ReviewFinding[]` with `domain: 'bug'`
184
312
 
185
313
  ---
186
314
 
187
- ### Role A: Requesting a Review
315
+ #### Security Agent (strong tier) -- via harness-security-review
188
316
 
189
- When you have completed work and need it reviewed.
317
+ Invokes `harness-security-review` in changed-files mode as the security slot in the fan-out.
190
318
 
191
- #### 1. Prepare the Review Context
319
+ **Input:** Security context bundle (security-relevant paths + data flows)
192
320
 
193
- Before requesting review, assemble the following:
321
+ **Invocation:** The pipeline invokes `harness-security-review` with scope `changed-files`. The skill:
194
322
 
195
- - **Commit range:** The exact SHAs or branch diff that constitute the change. Use `git log --oneline base..HEAD` to confirm.
196
- - **Description:** A concise summary of WHAT changed and WHY. Not a commit-by-commit retelling — the reviewer can read the diff. Focus on intent, tradeoffs, and anything non-obvious.
197
- - **Plan reference:** If this work implements a plan or spec, link to it. The reviewer needs to know what "correct" looks like.
198
- - **Test evidence:** Confirm tests pass. Include the test command and output summary. If tests were skipped, explain why.
199
- - **Harness check results:** Run `harness validate` and `harness check-deps` before requesting review. Include results. Fix any failures before requesting.
323
+ - Skips its own Phase 1 (SCAN) -- reads mechanical findings from PipelineContext (Phase 2 already ran `run_security_scan`)
324
+ - Runs Phase 2 (REVIEW) -- OWASP baseline + stack-adaptive on changed files and their direct imports
325
+ - Skips Phase 3 (THREAT-MODEL) unless `--deep` was passed to code review
326
+ - Returns `ReviewFinding[]` with populated security fields (`cweId`, `owaspCategory`, `confidence`, `remediation`, `references`)
200
327
 
201
- #### 2. Dispatch the Review
328
+ If `--deep` flag is set on code review, additionally pass `--deep` to `harness-security-review` for threat modeling.
202
329
 
203
- - **Identify the right reviewer.** For architectural changes, request review from someone who understands the architecture. For domain logic, someone who understands the domain.
204
- - **Provide the context package** (SHAs, description, plan reference, test evidence, harness results). Do not make the reviewer hunt for context.
205
- - **State what kind of feedback you want.** "Full review" vs "architecture only" vs "test coverage check" — be specific.
330
+ **Focus areas:**
206
331
 
207
- #### 3. Wait
332
+ 1. **Semantic security review** (issues mechanical scanners cannot catch):
333
+ - User input flowing through multiple functions to dangerous sinks (SQL, shell, HTML)
334
+ - Missing authorization checks on new or modified endpoints
335
+ - Sensitive data exposed in logs, error messages, or API responses
336
+ - Authentication bypass paths introduced by the change
337
+ - Insecure defaults in new configuration options
208
338
 
209
- Do not continue modifying the code under review. If you find issues while waiting, note them but do not push fixes until the review is complete. Interleaving changes with review creates confusion.
339
+ 2. **Stack-adaptive focus:** Based on the project's tech stack:
340
+ - Node.js: prototype pollution, ReDoS, path traversal
341
+ - React: XSS, dangerouslySetInnerHTML, state injection
342
+ - Go: race conditions, integer overflow, unsafe pointer
343
+ - Python: pickle deserialization, SSTI, command injection
210
344
 
211
- ---
345
+ 3. **CWE/OWASP references:** All security findings include `cweId`, `owaspCategory`, and `remediation` fields.
212
346
 
213
- ### Role B: Performing a Review
347
+ Security findings with confirmed vulnerabilities are always `severity: 'critical'`.
214
348
 
215
- When you are reviewing someone else's code.
349
+ **Dedup with mechanical scan:** The pipeline's Phase 5 (VALIDATE) uses the exclusion set from Phase 2 mechanical findings to discard any security-review finding that overlaps with an already-reported mechanical finding. This prevents duplicate reporting of the same issue.
216
350
 
217
- #### 1. Understand Before Judging
351
+ **Output:** `ReviewFinding[]` with `domain: 'security'`
218
352
 
219
- - **Read the description and plan first.** Understand what the change is trying to accomplish before reading code.
220
- - **Read the full diff.** Do not skim. Read every changed file. If the diff is large (>500 lines), note this as a concern — large diffs are harder to review correctly.
221
- - **Check the commit history.** Are commits atomic and well-described? Or is it one giant squash with "updates"?
353
+ ---
222
354
 
223
- #### 2. Run Automated Checks
355
+ #### Architecture Agent (standard tier)
224
356
 
225
- Run these commands and include results in your review:
357
+ Reviews for architectural violations, dependency direction, and design pattern compliance.
226
358
 
227
- ```bash
228
- harness validate # Full project health check
229
- harness check-deps # Dependency boundary verification
230
- harness check-docs # Documentation drift detection
231
- ```
359
+ **Input:** Architecture context bundle (layer boundaries + import graph)
232
360
 
233
- If any check fails, this is a **Critical** issue. The code cannot merge with failing harness checks.
361
+ **Focus areas:**
234
362
 
235
- #### 3. Evaluate Code Quality
363
+ - [ ] **Layer compliance:** Does the code respect the project's architectural layers? Are imports flowing in the correct direction?
364
+ - [ ] **Dependency direction:** Do modules depend on abstractions, not concretions? (Dependency Inversion)
365
+ - [ ] **Single Responsibility:** Does each module have one reason to change?
366
+ - [ ] **Open/Closed:** Can behavior be extended without modifying existing code?
367
+ - [ ] **Pattern consistency:** Does the code follow established codebase patterns? If introducing a new pattern, is it justified?
368
+ - [ ] **Separation of concerns:** Business logic separated from infrastructure? Each function/module does one thing?
369
+ - [ ] **DRY violations:** Duplicated logic that should be extracted — but NOT intentional duplication of things that will diverge.
370
+ - [ ] **Performance preserved:** Could restructuring introduce regressions (extra allocations, changed query patterns)?
236
371
 
237
- Review each changed file against these criteria:
372
+ **Output:** `ReviewFinding[]` with `domain: 'architecture'`
238
373
 
239
- **Separation of Concerns:**
374
+ **Exit:** All four agents have returned their findings. Continue to Phase 5.
375
+
376
+ ---
240
377
 
241
- - Does each function/module do one thing?
242
- - Are responsibilities clearly divided between files?
243
- - Is business logic separated from infrastructure?
378
+ ### Phase 5: VALIDATE
244
379
 
245
- **Error Handling:**
380
+ **Tier:** none (mechanical)
381
+ **Purpose:** Remove false positives by cross-referencing AI findings against mechanical results and graph reachability.
246
382
 
247
- - Are errors handled at the appropriate level?
248
- - Are error messages helpful for debugging?
249
- - Are edge cases handled (null, empty, boundary values)?
250
- - Do errors propagate correctly (not swallowed silently)?
383
+ **Steps:**
251
384
 
252
- **DRY (Don't Repeat Yourself):**
385
+ 1. **Mechanical exclusion:** For each finding from Phase 4, check if the same file + line range was already flagged by a mechanical check in Phase 2. If so, discard the AI finding — the mechanical check is authoritative and the issue is already reported.
253
386
 
254
- - Is there duplicated logic that should be extracted?
255
- - Are there copy-pasted blocks with minor variations?
256
- - BUT: do not flag intentional duplication (sometimes two similar things should remain separate because they will diverge).
387
+ 2. **Graph reachability validation (if graph available):** For findings that claim an issue affects other parts of the system (e.g., "this change breaks callers"), verify via `query_graph` that the claimed dependency path exists. Discard findings with invalid reachability claims.
257
388
 
258
- **Naming and Clarity:**
389
+ 3. **Import-chain heuristic (fallback, no graph):** Follow imports 2 levels deep from the flagged file. If the finding claims impact on a file not reachable within 2 import hops, downgrade severity to `suggestion` rather than discarding.
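
A rough sketch of steps 1 and 3 (the no-graph path) follows. The `Finding` class is a simplified stand-in for `ReviewFinding`; the `impacted_file` field and the `imports` mapping (file to the files it imports) are assumptions made for the example, not fields confirmed by the schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:
    file: str
    line_range: tuple[int, int]
    severity: str
    impacted_file: Optional[str] = None  # hypothetical: file the finding claims to affect


def ranges_intersect(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def reachable_within(start: str, target: str, imports: dict[str, list[str]], max_hops: int = 2) -> bool:
    """Follow imports outward from `start`, at most `max_hops` levels deep."""
    if target == start:
        return True
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {dep for f in frontier for dep in imports.get(f, ())} - seen
        if target in frontier:
            return True
        seen |= frontier
    return False


def validate(findings: list[Finding], mechanical: list[Finding], imports: dict[str, list[str]]) -> list[Finding]:
    kept = []
    for f in findings:
        # Step 1: the mechanical check is authoritative, so drop overlapping AI findings.
        if any(m.file == f.file and ranges_intersect(m.line_range, f.line_range) for m in mechanical):
            continue
        # Step 3: an unreachable impact claim is downgraded rather than discarded.
        if f.impacted_file and not reachable_within(f.file, f.impacted_file, imports):
            f = replace(f, severity="suggestion")
        kept.append(f)
    return kept
```

When a graph is available, step 2 would replace `reachable_within` with a `query_graph` lookup and discard the finding instead of downgrading it.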
259
390
 
260
- - Do names communicate intent?
261
- - Are abbreviations explained or avoided?
262
- - Can you understand the code without reading the implementation of every called function?
391
+ **Exit:** Validated finding set. Continue to Phase 6.
263
392
 
264
- #### 4. Evaluate Architecture
393
+ ---
265
394
 
266
- **SOLID Principles:**
395
+ ### Phase 6: DEDUP + MERGE
267
396
 
268
- - Single Responsibility: Does each module have one reason to change?
269
- - Open/Closed: Can behavior be extended without modifying existing code?
270
- - Dependency Inversion: Do modules depend on abstractions, not concretions?
397
+ **Tier:** none (mechanical)
398
+ **Purpose:** Eliminate redundant findings across agents and produce the final finding list.
271
399
 
272
- **Layer Compliance:**
400
+ **Steps:**
273
401
 
274
- - Does the code respect the project's architectural layers?
275
- - Are imports flowing in the correct direction?
276
- - Does `harness check-deps` confirm no boundary violations?
402
+ 1. **Group by location:** Group findings by `file` + overlapping `lineRange`. Two findings overlap if their line ranges intersect or are within 3 lines of each other.
277
403
 
278
- **Pattern Consistency:**
404
+ 2. **Merge overlapping findings:** When multiple agents flag the same location:
405
+ - Keep the highest `severity` from any agent
406
+ - Combine `evidence` arrays from all agents
407
+ - Preserve the `rationale` with the strongest justification
408
+ - Merge `domain` tags (a finding can be both `bug` and `security`)
409
+ - Generate a single merged `id`
279
410
 
280
- - Does the code follow established patterns in the codebase?
281
- - If introducing a new pattern, is it justified and documented?
411
+ 3. **Assign final severity:**
412
+ - **Critical** -- Must fix before merge. Bugs, security vulnerabilities, failing harness checks, architectural violations that break boundaries.
413
+ - **Important** — Should fix before merge. Missing error handling, missing tests for critical paths, unclear naming.
414
+ - **Suggestion** — Consider for improvement. Style preferences, minor optimizations, alternative approaches. Does not block merge.
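
The grouping and merge steps could be sketched as follows. This is an illustration under assumptions: `Finding` is a trimmed stand-in for `ReviewFinding`, and selecting the strongest `rationale` and generating a merged `id` are left out because the text does not specify how either is computed.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"suggestion": 0, "important": 1, "critical": 2}


@dataclass
class Finding:
    file: str
    line_range: tuple[int, int]
    severity: str
    domains: set[str]
    evidence: list[str]


def overlaps(a: tuple[int, int], b: tuple[int, int], slack: int = 3) -> bool:
    """Ranges overlap if they intersect or sit within `slack` lines of each other."""
    return a[0] <= b[1] + slack and b[0] <= a[1] + slack


def merge_findings(findings: list[Finding]) -> list[Finding]:
    merged: list[Finding] = []
    # Sorting by file and start line means each finding only needs comparing with the last group.
    for f in sorted(findings, key=lambda f: (f.file, f.line_range)):
        last = merged[-1] if merged else None
        if last and last.file == f.file and overlaps(last.line_range, f.line_range):
            last.line_range = (last.line_range[0], max(last.line_range[1], f.line_range[1]))
            last.severity = max(last.severity, f.severity, key=SEVERITY_RANK.__getitem__)
            last.domains |= f.domains      # a finding can carry several domain tags
            last.evidence += f.evidence    # combine evidence from all agents
        else:
            merged.append(Finding(f.file, f.line_range, f.severity, set(f.domains), list(f.evidence)))
    return merged
```

Copying `domains` and `evidence` when a new group starts keeps the input findings unmodified, which makes the step easy to rerun.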
282
415
 
283
- #### 5. Evaluate Testing
416
+ **Exit:** Deduplicated, severity-assigned finding list. Continue to Phase 7.
284
417
 
285
- **Real Tests:**
418
+ ---
286
419
 
287
- - Do tests exercise real behavior, not mock implementations?
288
- - Do tests make meaningful assertions (not just "does not throw")?
289
- - Are tests deterministic (no flaky timing, network, or randomness)?
420
+ ### Phase 7: OUTPUT
290
421
 
291
- **Edge Cases:**
422
+ **Tier:** none
423
+ **Purpose:** Deliver the review in the requested format.
292
424
 
293
- - Are boundary conditions tested (empty input, max values, null)?
294
- - Are error paths tested (invalid input, network failures, permission errors)?
425
+ #### Text Output (default)
295
426
 
296
- **Coverage:**
427
+ When rendering the review output, use conventional markdown patterns:
297
428
 
298
- - Is every new public function/method tested?
299
- - Are critical paths covered (not just happy paths)?
429
+ For strengths:
300
430
 
301
- #### 6. Write the Review
431
+ ```
432
+ **[STRENGTH]** Clean separation between route handler and service logic
433
+ ```
302
434
 
303
- Structure your review output as follows:
435
+ For issues by severity:
436
+
437
+ ```
438
+ **[CRITICAL]** api/routes/users.ts:12-15 — Direct import from db/queries.ts bypasses service layer
439
+ **[IMPORTANT]** services/user-service.ts:45 — createUser does not handle duplicate email
440
+ **[SUGGESTION]** Consider extracting validation into a shared utility
441
+ ```
442
+
443
+ Structure the review as:
304
444
 
305
445
  **Strengths:** What is done well. Be specific. "Clean separation between X and Y" is useful. "Looks good" is not.
306
446
 
307
- **Issues:** Categorize each issue:
447
+ **Issues:** List each finding from Phase 6, grouped by severity:
308
448
 
309
- - **Critical** Must fix before merge. Bugs, security issues, failing harness checks, broken tests, architectural violations.
310
- - **Important** Should fix before merge. Missing error handling, missing tests for critical paths, unclear naming that will cause confusion.
311
- - **Suggestion** Consider for improvement. Style preferences, minor optimizations, alternative approaches. These do not block merge.
449
+ - **Critical:** [findings with severity 'critical']
450
+ - **Important:** [findings with severity 'important']
451
+ - **Suggestion:** [findings with severity 'suggestion']
312
452
 
313
453
  For each issue, provide:
314
454
 
315
- 1. The specific location (file and line or function name)
316
- 2. What the problem is
317
- 3. Why it matters
318
- 4. A suggested fix (if you have one)
455
+ 1. The specific location (file and line range)
456
+ 2. What the problem is (title)
457
+ 3. Why it matters (rationale)
458
+ 4. A suggested fix (if available)
319
459
 
320
460
  **Assessment:** One of:
321
461
 
322
462
  - **Approve** — No critical or important issues. Ready to merge.
323
- - **Request Changes** — Critical or important issues must be addressed. Re-review needed after fixes.
324
- - **Comment** — Observations only, no blocking issues, but author should consider feedback.
463
+ - **Request Changes** — Critical or important issues must be addressed.
464
+ - **Comment** — Observations only, no blocking issues.
465
+
466
+ **Exit code:** 0 for Approve/Comment, 1 for Request Changes.
467
+
468
+ #### Inline GitHub Comments (`--comment` flag)
469
+
470
+ When `--comment` is set, post findings as inline PR comments via `gh` CLI or GitHub MCP:
471
+
472
+ - **Small fixes** (suggestion is < 10 lines): Post as committable suggestion block using GitHub's suggestion syntax.
473
+ - **Large fixes** (suggestion is >= 10 lines or no concrete suggestion): Post description + rationale as a regular comment.
474
+ - **Summary comment:** Post the Strengths/Issues/Assessment as a top-level PR review comment.
475
+
476
+ ```bash
477
+ # Post a review with inline comments
478
+ gh pr review --approve --body "<summary>" # or --request-changes / --comment
479
+
480
+ # Post inline comment with suggestion
481
+ gh api repos/{owner}/{repo}/pulls/{pr}/comments \
482
+ --field body=$'<rationale>\n```suggestion\n<fix>\n```' \
483
+ --field path="<file>" --field line=<line>
484
+ ```
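
The small-fix versus large-fix rule can be expressed as a body-rendering helper. This is a sketch only: `render_comment` and its parameters are hypothetical names, and the fence is built programmatically so the example can sit inside a Markdown code block.

```python
from typing import Optional

FENCE = "`" * 3  # three backticks, assembled here to avoid nesting fences in this document


def render_comment(rationale: str, suggestion: Optional[str] = None, max_lines: int = 10) -> str:
    """Return the comment body for one finding.

    Fixes shorter than `max_lines` become a committable suggestion block;
    anything longer, or a finding without a concrete fix, is a plain comment.
    """
    if suggestion is not None and len(suggestion.splitlines()) < max_lines:
        return f"{rationale}\n{FENCE}suggestion\n{suggestion}\n{FENCE}"
    return rationale
```

The returned string is what would be passed as the `body` field of the inline comment request.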
485
+
486
+ ### Review Acceptance
487
+
488
+ After delivering the review output, request acceptance:
489
+
490
+ ```js
491
+ emit_interaction({
492
+ path: "<project-root>",
493
+ type: "confirmation",
494
+ confirmation: {
495
+ text: "Review complete: <Assessment>. Accept review?",
496
+ context: "<N critical, N important, N suggestion findings>"
497
+ }
498
+ })
499
+ ```
500
+
501
+ #### Handoff and Transition
502
+
503
+ After delivering the review output, write the handoff and conditionally transition:
504
+
505
+ Write `.harness/handoff.json`:
506
+
507
+ ```json
508
+ {
509
+ "fromSkill": "harness-code-review",
510
+ "phase": "OUTPUT",
511
+ "summary": "<assessment summary>",
512
+ "assessment": "approve | request-changes | comment",
513
+ "findingCount": { "critical": 0, "important": 0, "suggestion": 0 },
514
+ "artifacts": ["<reviewed files>"]
515
+ }
516
+ ```
517
+
518
+ **If assessment is "approve":**
519
+
520
+ Call `emit_interaction`:
521
+
522
+ ```json
523
+ {
524
+ "type": "transition",
525
+ "transition": {
526
+ "completedPhase": "review",
527
+ "suggestedNext": "merge",
528
+ "reason": "Review approved with no blocking issues",
529
+ "artifacts": ["<reviewed files>"],
530
+ "requiresConfirmation": true,
531
+ "summary": "Review approved. <N> suggestions noted. Ready to create PR or merge."
532
+ }
533
+ }
534
+ ```
535
+
536
+ If the user confirms: proceed to create PR or merge.
537
+ If the user declines: stop. The handoff is written for future invocation.
538
+
539
+ **If assessment is "request-changes":**
540
+
541
+ Do NOT emit a transition. Surface the critical and important findings to the user for resolution. After fixes are applied, re-run the review pipeline.
542
+
543
+ **If assessment is "comment":**
544
+
545
+ Do NOT emit a transition. Observations have been delivered. No further action is implied.
325
546
 
326
547
  ---
327
548
 
328
- ### Role C: Responding to Review Feedback
549
+ ## Role A: Requesting a Review
329
550
 
330
- When you receive feedback on your code.
551
+ _This section is not part of the pipeline. It documents the process for requesting a review from others._
331
552
 
332
- #### 1. Read All Feedback First
553
+ When you have completed work and need it reviewed:
333
554
 
334
- Read every comment before responding to any. Understand the full picture. Some comments may contradict each other or be resolved by the same fix.
555
+ 1. **Prepare the review context:**
556
+ - Commit range (exact SHAs or branch diff)
557
+ - Description (WHAT changed and WHY — not a commit-by-commit retelling)
558
+ - Plan reference (link to spec/plan if applicable)
559
+ - Test evidence (`harness validate` and test suite results)
560
+ - Harness check results (`harness validate`, `harness check-deps`)
335
561
 
336
- #### 2. Verify Before Implementing
562
+ 2. **Dispatch the review:** Identify the right reviewer, provide the context package, state what kind of feedback you want.
337
563
 
338
- For each piece of feedback:
564
+ 3. **Wait.** Do not modify code under review. Note issues but do not push fixes until review is complete.
339
565
 
340
- - **Do you understand it?** If not, ask for clarification. Do not guess at what the reviewer means.
341
- - **Is it correct?** Verify the reviewer's claim. Read the code they reference. Run the scenario they describe. Reviewers make mistakes too.
342
- - **Is it actionable?** Vague feedback ("this could be better") requires clarification. Ask for specific suggestions.
566
+ ---
343
567
 
344
- #### 3. Technical Rigor Over Social Performance
568
+ ## Role C: Responding to Review Feedback
345
569
 
346
- - **Do NOT agree with feedback just to be agreeable.** If the feedback is wrong, say so with evidence. "I considered that approach, but it does not work because [specific reason]" is a valid response.
347
- - **Do NOT implement every suggestion.** Apply the YAGNI check to every suggestion: Does this change serve a current, concrete need? If it is speculative ("you might need this later"), push back.
348
- - **Do NOT make changes you do not understand.** If a reviewer suggests a change and you cannot explain why it is better, do not make it. Ask them to explain.
349
- - **DO acknowledge when feedback is correct.** "Good catch, fixing" is appropriate when the reviewer found a real issue.
350
- - **DO push back when feedback contradicts the plan or spec.** The plan was approved. If review feedback wants to change the plan, that is a scope discussion, not a code review issue.
570
+ _This section is not part of the pipeline. It documents the process for responding to review feedback._
351
571
 
352
- #### 4. Implement Fixes
572
+ 1. **Read all feedback first.** Understand the full picture before responding.
353
573
 
354
- For each accepted piece of feedback:
574
+ 2. **Verify before implementing.** For each piece of feedback:
575
+ - Do you understand it? If not, ask for clarification.
576
+ - Is it correct? Verify the claim — reviewers make mistakes too.
577
+ - Is it actionable? Vague feedback requires clarification.
355
578
 
356
- 1. Make the change
357
- 2. Run the full test suite
358
- 3. Run `harness validate` and `harness check-deps`
359
- 4. Commit with a message referencing the review feedback
579
+ 3. **Technical rigor over social performance:**
580
+ - Do NOT agree with feedback just to be agreeable. Push back with evidence if wrong.
581
+ - Do NOT implement every suggestion. Apply YAGNI.
582
+ - Do NOT make changes you do not understand. Ask for explanation.
583
+ - DO acknowledge when feedback is correct.
584
+ - DO push back when feedback contradicts the approved plan/spec.
360
585
 
361
- #### 5. Re-request Review
586
+ 4. **Implement fixes:** For each accepted piece of feedback: make the change, run tests, run `harness validate` and `harness check-deps`, commit with a message referencing the review feedback.
362
587
 
363
- After addressing all feedback, re-request review with:
588
+ 5. **Re-request review** with summary of changes, which feedback was addressed vs. pushed back on, and fresh harness check results.
364
589
 
365
- - Summary of what changed
366
- - Which feedback was addressed and which was pushed back on (with reasons)
367
- - Fresh harness check results
590
+ ---
368
591
 
369
592
  ## Harness Integration
370
593
 
371
- - **`harness validate`** — Run before requesting review and during review performance. Must pass for approval.
372
- - **`harness check-deps`** — Run to verify dependency boundaries. Failures are Critical issues.
373
- - **`harness check-docs`** — Run to detect documentation drift. If code changed but docs did not, flag as Important.
374
- - **`harness cleanup`** — Optional during review to check for entropy accumulation in the changed files.
594
+ - **`harness validate`** — Run in Phase 2 (MECHANICAL). Must pass for the pipeline to continue to AI review.
595
+ - **`harness check-deps`** — Run in Phase 2 (MECHANICAL). Failures are Critical issues that stop the pipeline.
596
+ - **`harness check-docs`** — Run in Phase 2 (MECHANICAL). Documentation drift findings are recorded for the exclusion set.
597
+ - **`harness cleanup`** — Optional check during Phase 2 for entropy accumulation in changed files.
598
+ - **Graph queries** — Used in Phase 3 (CONTEXT) for dependency-scoped context and in Phase 5 (VALIDATE) for reachability verification. Graceful fallback when no graph exists.
599
+ - **`emit_interaction`** -- Call after review approval to suggest transitioning to merge/PR creation. Only emitted on APPROVE assessment. Uses confirmed transition (waits for user approval).
375
600
 
376
601
  ## Success Criteria
377
602
 
378
- - Every review request includes: commit range, description, plan reference, test evidence, harness results
379
- - Every review evaluates: code quality, architecture, testing, harness checks
380
- - Every review uses the Strengths/Issues/Assessment format
381
- - Issues are categorized as Critical/Important/Suggestion
603
+ - The pipeline runs all 7 phases in order when invoked manually (skipping GATE)
604
+ - The pipeline runs all 7 phases including GATE when invoked with `--ci`
605
+ - Mechanical failures in Phase 2 stop the pipeline before AI review (Phase 4)
606
+ - Each Phase 4 subagent receives only its domain-scoped context, not the full diff
607
+ - All findings use the ReviewFinding schema
608
+ - Phase 5 excludes from the Phase 4 output any finding already reported by the Phase 2 mechanical checks
609
+ - Cross-agent duplicate findings are merged in Phase 6
610
+ - Text output uses Strengths/Issues/Assessment format with Critical/Important/Suggestion severity
611
+ - `--comment` posts inline GitHub comments with committable suggestion blocks for small fixes
612
+ - `--deep` adds threat modeling to the Security agent
382
613
  - No code merges with Critical issues unresolved
383
614
  - No code merges with failing harness checks
384
- - Response to feedback is verified before implementation
615
+ - Review feedback (Role C) is verified before it is implemented
385
616
  - Pushback on incorrect feedback is evidence-based
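Two of the criteria above, mechanical exclusion in Phase 5 and duplicate merging in Phase 6, can be illustrated with a short sketch. The `ReviewFinding` fields shown here are assumptions: the text names the schema but only `domain`, `validatedBy`, and the three severities are visible in this section. The overlap rule used for merging (same file, intersecting line ranges) is likewise a guess at one reasonable policy.

```typescript
// Hypothetical ReviewFinding shape; field names beyond domain/validatedBy are assumed.
type Severity = "critical" | "important" | "suggestion";

interface ReviewFinding {
  file: string;
  startLine: number;
  endLine: number;
  severity: Severity;
  domain: "compliance" | "bug" | "security" | "architecture";
  title: string;
  validatedBy: "mechanical" | "graph" | "heuristic"; // only "heuristic" appears in the text
}

interface MechanicalHit {
  file: string;
  line: number;
}

// Phase 5 sketch: drop AI findings that a mechanical check already reported.
function excludeMechanical(findings: ReviewFinding[], hits: MechanicalHit[]): ReviewFinding[] {
  return findings.filter(
    (f) => !hits.some((h) => h.file === f.file && h.line >= f.startLine && h.line <= f.endLine),
  );
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 3, important: 2, suggestion: 1 };

// Phase 6 sketch: merge findings that overlap in the same file, keeping the
// widest line range and the highest severity. The first finding's domain wins.
function mergeDuplicates(findings: ReviewFinding[]): ReviewFinding[] {
  const merged: ReviewFinding[] = [];
  for (const f of findings) {
    const existing = merged.find(
      (m) => m.file === f.file && m.startLine <= f.endLine && f.startLine <= m.endLine,
    );
    if (!existing) {
      merged.push({ ...f });
      continue;
    }
    existing.startLine = Math.min(existing.startLine, f.startLine);
    existing.endLine = Math.max(existing.endLine, f.endLine);
    if (SEVERITY_RANK[f.severity] > SEVERITY_RANK[existing.severity]) {
      existing.severity = f.severity;
    }
  }
  return merged;
}

// Usage: two findings in different files stay distinct after merging.
const sampleFindings: ReviewFinding[] = [
  { file: "api/routes/users.ts", startLine: 12, endLine: 15, severity: "critical",
    domain: "architecture", title: "Direct import from db/queries.ts", validatedBy: "heuristic" },
  { file: "services/user-service.ts", startLine: 45, endLine: 45, severity: "important",
    domain: "bug", title: "createUser does not handle duplicate email", validatedBy: "heuristic" },
];
console.log(mergeDuplicates(excludeMechanical(sampleFindings, [])).length);
```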
386
617
 
387
618
  ## Examples
388
619
 
389
- ### Example: Reviewing a New API Endpoint
620
+ ### Example: Pipeline Review of a New API Endpoint
621
+
622
+ **Phase 1 (GATE):** Skipped — manual invocation.
623
+
624
+ **Phase 2 (MECHANICAL):** `harness validate` passes. `harness check-deps` passes. Security scan finds no issues. `tsc --noEmit` passes. Lint passes.
625
+
626
+ **Phase 3 (CONTEXT):** Change type detected as `feature` (commit prefix `feat:`). Context bundles assembled:
627
+
628
+ - Compliance: `CLAUDE.md` + changed files
629
+ - Bug detection: `api/routes/users.ts`, `services/user-service.ts`, `db/queries.ts`
630
+ - Security: `api/routes/users.ts` (endpoint), `services/user-service.ts` (data flow)
631
+ - Architecture: import graph showing `routes → services → db` layers
632
+
633
+ **Phase 4 (FAN-OUT):** Four agents run in parallel:
634
+
635
+ - Compliance agent: 0 findings (spec alignment confirmed)
636
+ - Bug detection agent: 1 finding (missing duplicate email handling in createUser)
637
+ - Security agent: 0 findings (no vulnerabilities detected)
638
+ - Architecture agent: 1 finding (routes/users.ts imports directly from db/queries.ts)
639
+
640
+ **Phase 5 (VALIDATE):** No mechanical exclusions apply. Architecture finding validated heuristically by confirming the direct `db/queries.ts` import in `api/routes/users.ts`.
641
+
642
+ **Phase 6 (DEDUP+MERGE):** No overlaps — 2 distinct findings in different files.
643
+
644
+ **Phase 7 (OUTPUT):**
390
645
 
391
646
  **Strengths:**
392
647
 
@@ -398,22 +653,19 @@ After addressing all feedback, re-request review with:
398
653
 
399
654
  **Critical:**
400
655
 
401
- - `harness check-deps` fails: `api/routes/users.ts` imports directly from `db/queries.ts`, bypassing the service layer. Must route through `services/user-service.ts`.
656
+ - `api/routes/users.ts:12-15` Direct import from `db/queries.ts` bypasses service layer. Must route through `services/user-service.ts`. (domain: architecture, validatedBy: heuristic)
402
657
 
403
658
  **Important:**
404
659
 
405
- - `services/user-service.ts:45` — `createUser` does not handle duplicate email. The database will throw a constraint violation that surfaces as a 500 error. Should catch and return a 409.
406
- - Missing test for concurrent creation with same email.
407
-
408
- **Suggestion:**
660
+ - `services/user-service.ts:45` — `createUser` does not handle duplicate email. Database will throw constraint violation surfacing as 500. Should catch and return 409. (domain: bug, validatedBy: heuristic)
409
661
 
410
- - Consider extracting the pagination logic in `api/routes/users.ts:30-55` into a shared utility — the same pattern exists in `api/routes/orders.ts`.
662
+ **Suggestion:** (none)
411
663
 
412
664
  **Assessment:** Request Changes — one critical layer violation and one important missing error handler.
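The Issues and Assessment parts of the Phase 7 output above can be approximated by grouping findings by severity. This is a sketch only: `OutputFinding`, `renderIssues`, and `assess` are hypothetical names, and the rule that only Critical findings force "Request Changes" is an assumption (the text states that Critical issues block merging but does not say how an Important-only review is assessed).

```typescript
// Hypothetical rendering of the Issues section in the Strengths/Issues/Assessment format.
type IssueLevel = "Critical" | "Important" | "Suggestion";

interface OutputFinding {
  location: string; // e.g. "api/routes/users.ts:12-15"
  message: string;
  level: IssueLevel;
  domain: string;
  validatedBy: string;
}

function renderIssues(findings: OutputFinding[]): string {
  const order: IssueLevel[] = ["Critical", "Important", "Suggestion"];
  const sections = order.map((level) => {
    const items = findings.filter((f) => f.level === level);
    // Empty groups are shown as "(none)", as in the example output above.
    if (items.length === 0) return `**${level}:** (none)`;
    const bullets = items.map(
      (f) => `- \`${f.location}\` ${f.message} (domain: ${f.domain}, validatedBy: ${f.validatedBy})`,
    );
    return [`**${level}:**`, "", ...bullets].join("\n");
  });
  return sections.join("\n\n");
}

// Assumed rule: any Critical finding yields "Request Changes".
function assess(findings: OutputFinding[]): "Approve" | "Request Changes" {
  return findings.some((f) => f.level === "Critical") ? "Request Changes" : "Approve";
}

const exampleOutput: OutputFinding[] = [
  { location: "api/routes/users.ts:12-15", message: "Direct import from `db/queries.ts` bypasses service layer.",
    level: "Critical", domain: "architecture", validatedBy: "heuristic" },
  { location: "services/user-service.ts:45", message: "`createUser` does not handle duplicate email.",
    level: "Important", domain: "bug", validatedBy: "heuristic" },
];
console.log(renderIssues(exampleOutput));
console.log(`**Assessment:** ${assess(exampleOutput)}`);
```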
413
665
 
414
666
  ## Gates
415
667
 
416
- - **Never skip review.** All code that will be merged must be reviewed. No exceptions for "small changes" or "obvious fixes."
668
+ - **Never skip mechanical checks without `--no-mechanical`.** If mechanical checks have not run (in CI or locally), they must run in Phase 2 before AI review.
417
669
  - **Never merge with failing harness checks.** `harness validate` and `harness check-deps` must pass. This is a Critical issue, always.
418
670
  - **Never implement feedback without verification.** Before changing code based on review feedback, verify the feedback is correct. Run the scenario. Read the code. Do not blindly comply.
419
671
  - **Never agree performatively.** "Sure, I'll change that" without understanding why is forbidden. Every change must be understood.
@@ -421,8 +673,9 @@ After addressing all feedback, re-request review with:
421
673
 
422
674
  ## Escalation
423
675
 
424
- - **When reviewers disagree:** If two reviewers give contradictory feedback, escalate to the human or tech lead. Do not try to satisfy both.
425
- - **When review feedback changes the plan:** If feedback requires changes that alter the approved plan or spec, pause the review. The plan must be updated and re-approved first.
426
- - **When you cannot reproduce a reported issue:** Ask the reviewer for exact reproduction steps. If they cannot provide them, the issue may not be real.
427
- - **When review is taking more than 2 rounds:** If the same code is going through a third round of review, something is fundamentally misaligned. Stop and discuss the approach in a meeting or synchronous conversation.
428
- - **When harness checks fail and you believe the check is wrong:** Do not override or skip the check. File an issue against the harness configuration and work around the limitation until it is resolved.
676
+ - **When reviewers disagree:** If two reviewers give contradictory feedback, escalate to the human or tech lead.
677
+ - **When review feedback changes the plan:** If feedback requires altering the approved plan or spec, pause the review. The plan must be updated first.
678
+ - **When you cannot reproduce a reported issue:** Ask the reviewer for exact reproduction steps.
679
+ - **When review is taking more than 2 rounds:** Something is fundamentally misaligned. Stop and discuss the approach synchronously.
680
+ - **When harness checks fail and you believe the check is wrong:** Do not override or skip. File an issue against the harness configuration.
681
+ - **When the pipeline produces a false positive after validation:** Add the pattern to `.harness/review-learnings.md` in the Noise / False Positives section for future calibration.