refacil-sdd-ai 5.2.2 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/NOTICE.md +46 -0
  2. package/README.md +209 -42
  3. package/agents/auditor.md +46 -0
  4. package/agents/debugger.md +41 -1
  5. package/agents/implementer.md +76 -10
  6. package/agents/investigator.md +36 -0
  7. package/agents/proposer.md +46 -2
  8. package/agents/tester.md +45 -8
  9. package/agents/validator.md +67 -13
  10. package/bin/cli.js +428 -83
  11. package/bin/postinstall.js +20 -0
  12. package/lib/bus/broker.js +121 -3
  13. package/lib/bus/spawn.js +189 -121
  14. package/lib/check-review.js +102 -0
  15. package/lib/codegraph-telemetry.js +135 -0
  16. package/lib/codegraph.js +273 -0
  17. package/lib/commands/autopilot.js +120 -0
  18. package/lib/commands/bus.js +29 -36
  19. package/lib/commands/compact.js +185 -46
  20. package/lib/commands/read-spec.js +352 -0
  21. package/lib/commands/sdd.js +429 -44
  22. package/lib/compact-guidance.js +122 -77
  23. package/lib/config.js +136 -0
  24. package/lib/global-paths.js +56 -20
  25. package/lib/hooks.js +32 -4
  26. package/lib/ide-detection.js +1 -1
  27. package/lib/ignore-files.js +5 -1
  28. package/lib/installer.js +202 -19
  29. package/lib/kapso.js +241 -0
  30. package/lib/methodology-migration-pending.js +13 -0
  31. package/lib/open-browser.js +32 -0
  32. package/lib/opencode-migrate.js +148 -0
  33. package/lib/opencode-plugin/index.js +84 -104
  34. package/lib/opencode-plugin/rules.js +236 -0
  35. package/lib/project-root.js +154 -0
  36. package/lib/repo-ide-sync.js +5 -0
  37. package/lib/spec-reader/lang.js +72 -0
  38. package/lib/spec-reader/md-parser.js +299 -0
  39. package/lib/spec-reader/session.js +139 -0
  40. package/lib/spec-reader/ui/app.js +685 -0
  41. package/lib/spec-reader/ui/index.html +59 -0
  42. package/lib/spec-reader/ui/mixed-lang.js +200 -0
  43. package/lib/spec-reader/ui/model-cache.js +117 -0
  44. package/lib/spec-reader/ui/style.css +294 -0
  45. package/lib/spec-reader/ui/supertonic-helper.js +565 -0
  46. package/lib/spec-sync.js +258 -0
  47. package/lib/test-scope.js +713 -0
  48. package/lib/testing-policy-sync.js +14 -2
  49. package/package.json +6 -3
  50. package/skills/apply/SKILL.md +39 -64
  51. package/skills/archive/SKILL.md +74 -48
  52. package/skills/ask/SKILL.md +43 -8
  53. package/skills/autopilot/SKILL.md +476 -0
  54. package/skills/bug/SKILL.md +52 -53
  55. package/skills/explore/SKILL.md +48 -1
  56. package/skills/guide/SKILL.md +31 -13
  57. package/skills/inbox/SKILL.md +9 -0
  58. package/skills/join/SKILL.md +1 -1
  59. package/skills/prereqs/BUS-CROSS-REPO.md +33 -16
  60. package/skills/prereqs/METHODOLOGY-CONTRACT.md +96 -17
  61. package/skills/prereqs/SKILL.md +1 -1
  62. package/skills/propose/SKILL.md +74 -19
  63. package/skills/read-spec/SKILL.md +76 -0
  64. package/skills/reply/SKILL.md +42 -9
  65. package/skills/review/SKILL.md +63 -25
  66. package/skills/review/checklist.md +2 -2
  67. package/skills/say/SKILL.md +40 -4
  68. package/skills/setup/SKILL.md +59 -5
  69. package/skills/setup/troubleshooting.md +11 -3
  70. package/skills/stats/SKILL.md +157 -0
  71. package/skills/test/SKILL.md +35 -10
  72. package/skills/up-code/SKILL.md +20 -13
  73. package/skills/update/SKILL.md +32 -1
  74. package/skills/verify/SKILL.md +78 -41
  75. package/templates/compact-guidance.md +10 -0
  76. package/templates/methodology-guide.md +5 -0
@@ -73,18 +73,21 @@ Read from the prompt the `BRIEFING:` sections passed by the wrapper:
73
73
  - `scope.doNotTouch` — files out of scope
74
74
  - `tasks` — numbered task list
75
75
  - `testScope` — `scoped` \| `full` (default **`scoped`** if absent — treat missing as scoped)
76
- - `testCommand` — **exact shell command** to execute for verification (narrowed when `scoped`)
76
+ - `testBaselineCommand` — project baseline test command; the implementer derives the smoke dynamically (no precomputed smoke in the briefing)
77
+ - `codegraphAvailable` — `true` \| `false` (passed by the wrapper; controls CodeGraph tool availability)
77
78
  - `verificationWarning` — optional hint from wrapper (often explains fallback-to-baseline)
78
79
  - `architectureContext` — already-extracted architecture context
79
80
  - `specsNote` — if there are specs, where they are and whether there are possible contradictions
80
81
 
81
82
  If the briefing is **not present** (direct invocation without briefing):
82
- 1. Read `refacil-sdd/changes/<changeName>/proposal.md` (objective)
83
- 2. Read `refacil-sdd/changes/<changeName>/design.md` (file scope)
84
- 3. Read `refacil-sdd/changes/<changeName>/tasks.md` (tasks)
83
+ 0. Run `git rev-parse --show-toplevel` → store as `<projectRoot>`. Use this absolute path for all artifact reads below — never relative paths in a monorepo.
84
+ 1. Read `<projectRoot>/refacil-sdd/changes/<changeName>/proposal.md` (objective)
85
+ 2. Read `<projectRoot>/refacil-sdd/changes/<changeName>/design.md` (file scope)
86
+ 3. Read `<projectRoot>/refacil-sdd/changes/<changeName>/tasks.md` (tasks)
85
87
  4. Read `AGENTS.md` (architecture)
86
88
  5. Read the change specs
87
- 6. Read `METHODOLOGY-CONTRACT.md` §3 and §3.1 (narrow **before** invoking the runner unless you explicitly widen)
89
+ 6. Read `METHODOLOGY-CONTRACT.md` §3 and §3.1 (narrow **before** invoking the runner unless you explicitly widen).
90
+ **`testBaselineCommand`** is the project baseline from `METHODOLOGY-CONTRACT.md §3` — use it verbatim; do not pre-narrow it here. When the wrapper supplies the briefing, `testBaselineCommand` is already extracted and passed directly.
88
91
 
89
92
  ### Step 2: Read existing interfaces (scope.modify only)
90
93
 
@@ -103,14 +106,41 @@ With the context loaded, implement each task in order:
103
106
 
104
107
  If a task requires touching a file outside the scope: note it in `issues` as potential scope creep and decide with a conservative criterion.
105
108
 
106
- ### Step 4: Verify
109
+ ### Step 4: Verify (dynamic smoke)
110
+
111
+ This verification is **smoke-only** and does NOT replace `/refacil:test` (canonical suite + coverage + `memory.commandsRun`).
107
112
 
108
113
  Follow **`METHODOLOGY-CONTRACT.md §3.1`**:
109
114
 
110
- 1. Run **exactly** the **`testCommand`** supplied in the briefing.
111
- 2. If **`testCommand` is missing**, resolve baseline from **`METHODOLOGY-CONTRACT.md §3`** and **narrow** it yourself using `scope.create` ∪ `scope.modify` plus the §3.1 **Scoped command patterns**. If narrowing is unsafe, run the baseline **once**, add **`issues`** entry severity **MEDIUM** explaining full-suite fallback, and cite `verificationWarning` pattern if analogous.
112
- 3. **Do not** broaden the briefing’s `testCommand` into a fuller suite when `testScope` is **`scoped`** (or omitted). Repo-wide regression belongs in CI or an explicit **`/refacil:test … full`**.
113
- 4. If `verificationWarning` is present in the briefing, mirror a short note in **`issues`** (severity **LOW**) so the wrapper/user sees CPU/RAM risk was intentional.
115
+ 1. **Determine files this run actually touched** by running:
116
+ ```
117
+ git diff --name-only HEAD
118
+ ```
119
+ If that returns nothing (e.g. working-tree changes only), fall back to:
120
+ ```
121
+ git status --porcelain
122
+ ```
123
+ and extract the filenames from the output.
124
+
125
+ 2. **Derive a minimal scoped smoke command** (stack-agnostic — no hardcoded runners):
126
+ ```
127
+ refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"
128
+ ```
129
+ Use the resulting `testCommand` from the output.
130
+
131
+ 3. **Run the resulting smoke command.**
132
+
133
+ 4. **Fallback rules** — `/refacil:apply` **NEVER runs the full baseline as verification**. The §3.1 "unreliable scope → run baseline once" escape hatch does **NOT** apply here; that rule is for `/refacil:test` only.
134
+ - If `test-scope` returns a scoped command → run it (unchanged).
135
+ - If `test-scope` returns `fallback: true`, or fails, or the git diff/status output was empty (no touched files): identify any touched files that are themselves test files (matching the project test naming: `*.test.js`, `*.spec.js`, `*.test.ts`, `*.spec.ts`, `test_*.py`, `*_test.go`, etc.). Run **only those files** directly.
136
+ - If there are no such self-test files either → **SKIP** verification entirely. Add an **`issues`** entry severity **LOW** with description "no scopeable tests for touched files — verification deferred to /refacil:test" and set Verification to SKIPPED (deferred). Do **NOT** run `testBaselineCommand` in this case.
137
+ - In all fallback cases, add an **`issues`** entry severity **LOW** with `fallbackReason` from `test-scope` (or "empty diff / no touched files").
138
+
139
+ 5. **Note**: the `testBaselineCommand` field in the briefing is the project baseline command resolved at the **affected component root** (language-agnostic, per §3 component principle — the wrapper already resolved it there). The `sdd test-scope` call in step 2 produces a command with the correct `cd <component>` prefix when the component is a subdirectory. The smoke computed here replaces any precomputed `smokeTestCommand` — the briefing must NOT pre-supply a smoke command.
140
+
141
+ 6. If `verificationWarning` is present in the briefing, mirror a short note in **`issues`** (severity **LOW**) so the wrapper/user sees it.
142
+
143
+ 7. **Do not** broaden beyond the smoke into a fuller suite when `testScope` is **`scoped`** (or omitted). Repo-wide regression belongs in CI or an explicit **`/refacil:test … full`**. This verification is **smoke-only** and does NOT replace `/refacil:test` (canonical suite + coverage + `memory.commandsRun`).
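The fallback rules in Step 4 can be sketched as a small decision function. This is a hedged illustration only, not the package's implementation: the function names, the `action` labels, the "test-scope failed" reason string, and the assumption that the `sdd test-scope` output parses to a dict with `testCommand`, `fallback`, and `fallbackReason` keys are all hypothetical. The test-file patterns are the examples listed in the rule itself.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Example test-file patterns quoted in the fallback rule; a real project may use others.
TEST_PATTERNS = ("*.test.js", "*.spec.js", "*.test.ts", "*.spec.ts", "test_*.py", "*_test.go")


def touched_from_porcelain(output):
    """Extract paths from `git status --porcelain` (v1) output; pass it unstripped."""
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]                      # skip the two status letters and the space
        if " -> " in path:                   # renames are printed as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path.strip('"'))        # git quotes paths with unusual characters
    return paths


def is_test_file(path):
    """True when the file name matches one of the example test patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in TEST_PATTERNS)


def choose_smoke(touched, scope_result):
    """Return (action, payload, issues); never falls back to the full baseline.

    scope_result is the parsed test-scope output (hypothetical shape), or None
    if the call failed.
    """
    usable = bool(scope_result) and not scope_result.get("fallback") and scope_result.get("testCommand")
    if touched and usable:
        return "run", scope_result["testCommand"], []

    if not touched:
        reason = "empty diff / no touched files"
    elif scope_result is None:
        reason = "test-scope failed"          # hypothetical wording
    else:
        reason = scope_result.get("fallbackReason", "unspecified")
    issues = [{"severity": "LOW", "description": reason}]

    self_tests = [p for p in touched if is_test_file(p)]
    if self_tests:
        return "run-files", self_tests, issues

    issues.append({
        "severity": "LOW",
        "description": "no scopeable tests for touched files \u2014 verification deferred to /refacil:test",
    })
    return "skip", None, issues
```

Returning the `issues` entries together with the action keeps the "SKIPPED (deferred)" outcome and its LOW-severity note in one place, so a caller cannot skip verification without reporting it.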
114
144
 
115
145
  ### Step 5: Report + JSON block
116
146
 
@@ -150,6 +180,42 @@ Your final response MUST have this structure:
150
180
  - `filesRead` lists the files you read (for cost observability).
151
181
  - `issues` must be an empty array `[]` if there are no problems.
152
182
 
183
+ ## CodeGraph integration (optional)
184
+
185
+ If `codegraphAvailable: true` was passed by the wrapper, CodeGraph MCP tools are available:
186
+ - `codegraph_search <symbol>` — find definitions and usages of a symbol
187
+ - `codegraph_callers <symbol>` — list all callers of a function or method
188
+ - `codegraph_callees <symbol>` — list all functions called by a given function
189
+ - `codegraph_context <file>` — get focused structural context for a task or area
190
+ - `codegraph_impact <symbol>` — estimate the blast radius of a change
191
+ - `codegraph_node <symbol>` — show a symbol's source, signature, or docstring
192
+ - `codegraph_explore <query>` — deep survey of an unfamiliar module or topic (token-heavy; use once per investigation, not repeatedly)
193
+ - `codegraph_files <path>` — list files indexed under a directory path
194
+
195
+ **When to use CodeGraph — scope is unknown (fan-out is high):**
196
+ - "Who calls X?" across a large or unfamiliar codebase
197
+ - Blast radius / impact of changing a symbol
198
+ - Disambiguating a symbol that appears in many files
199
+ - Tracing a cross-module or cross-package flow you don't know yet
200
+
201
+ **When to use Grep/Read directly — scope is already bounded:**
202
+ - You already know the file(s) to look at (≤ 3–4 files)
203
+ - Simple endpoint flow: one controller → one service method (1–2 Greps find everything)
204
+ - Literal text search: log messages, config keys, string constants
205
+ - Logic is inline in a single method — callees won't add information
206
+ - Question asks about file content, not symbol relationships
207
+
208
+ **Decision rule:** ask yourself — "Do I already know where to look?" If yes, start with Grep. If no (unknown codebase, cross-module, many candidates), start with CodeGraph.
209
+
210
+ **Fallback:** if CodeGraph returns empty results for something that should have callers, fall back to Grep. Common reasons:
211
+ - Framework-managed entry points (HTTP routes, queue consumers, scheduled jobs) — called by the runtime, not by code
212
+ - DI / IoC containers: NestJS (`@Injectable`), Spring (`@Autowired`), Angular (`@Component`), Laravel, etc.
213
+ - Dynamic dispatch: interfaces, abstract class overrides, plugin registries
214
+
215
+ When falling back, use Grep with the symbol name and log: `[CodeGraph fallback: <reason>]`.
216
+
217
+ **Do not use CodeGraph** when `codegraphAvailable: false` was passed by the wrapper.
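The decision rule and fallback above can be read as the following sketch. It is illustrative only: `choose_search_tool`, `find_callers`, and the `codegraph_callers` / `grep` callables are hypothetical stand-ins for whatever tools the agent actually has, and the fallback reason text is an example.

```python
def choose_search_tool(codegraph_available, knows_where_to_look):
    """Apply the decision rule: bounded scope or no CodeGraph means Grep."""
    if not codegraph_available or knows_where_to_look:
        return "grep"
    return "codegraph"


def fallback_note(reason):
    """Format the log line requested when falling back to Grep."""
    return f"[CodeGraph fallback: {reason}]"


def find_callers(symbol, *, codegraph_available, knows_where_to_look, codegraph_callers, grep):
    """Look up callers of `symbol`, falling back to Grep on an empty CodeGraph result.

    `codegraph_callers` and `grep` are caller-supplied functions (assumptions
    for this sketch) that take a symbol name and return a list of hits.
    """
    log = []
    if choose_search_tool(codegraph_available, knows_where_to_look) == "codegraph":
        results = codegraph_callers(symbol)
        if results:
            return results, log
        # Empty results are common for DI containers and framework-managed entry points.
        log.append(fallback_note("empty result, possibly DI or framework-managed entry point"))
    return grep(symbol), log
```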
218
+
153
219
  ## Rules
154
220
 
155
221
  - NEVER generate SDD artifacts from this agent.
@@ -71,6 +71,42 @@ At the end of the report, suggest:
71
71
  - If the user might want to make a change: "Run `/refacil:propose <description>` to create a proposal"
72
72
  - If the user might want to investigate further: "Run `/refacil:explore <other question>` to continue exploring"
73
73
 
74
+ ## CodeGraph integration (optional)
75
+
76
+ If `codegraphAvailable: true` was passed by the wrapper, CodeGraph MCP tools are available:
77
+ - `codegraph_search <symbol>` — find definitions and usages of a symbol
78
+ - `codegraph_callers <symbol>` — list all callers of a function or method
79
+ - `codegraph_callees <symbol>` — list all functions called by a given function
80
+ - `codegraph_context <file>` — get focused structural context for a task or area
81
+ - `codegraph_impact <symbol>` — estimate the blast radius of a change
82
+ - `codegraph_node <symbol>` — show a symbol's source, signature, or docstring
83
+ - `codegraph_explore <query>` — deep survey of an unfamiliar module or topic (token-heavy; use once per investigation, not repeatedly)
84
+ - `codegraph_files <path>` — list files indexed under a directory path
85
+
86
+ **When to use CodeGraph — scope is unknown (fan-out is high):**
87
+ - "Who calls X?" across a large or unfamiliar codebase
88
+ - Blast radius / impact of changing a symbol
89
+ - Disambiguating a symbol that appears in many files
90
+ - Tracing a cross-module or cross-package flow you don't know yet
91
+
92
+ **When to use Grep/Read directly — scope is already bounded:**
93
+ - You already know the file(s) to look at (≤ 3–4 files)
94
+ - Simple endpoint flow: one controller → one service method (1–2 Greps find everything)
95
+ - Literal text search: log messages, config keys, string constants
96
+ - Logic is inline in a single method — callees won't add information
97
+ - Question asks about file content, not symbol relationships
98
+
99
+ **Decision rule:** ask yourself — "Do I already know where to look?" If yes, start with Grep. If no (unknown codebase, cross-module, many candidates), start with CodeGraph.
100
+
101
+ **Fallback:** if CodeGraph returns empty results for something that should have callers, fall back to Grep. Common reasons:
102
+ - Framework-managed entry points (HTTP routes, queue consumers, scheduled jobs) — called by the runtime, not by code
103
+ - DI / IoC containers: NestJS (`@Injectable`), Spring (`@Autowired`), Angular (`@Component`), Laravel, etc.
104
+ - Dynamic dispatch: interfaces, abstract class overrides, plugin registries
105
+
106
+ When falling back, use Grep with the symbol name and log: `[CodeGraph fallback: <reason>]`.
107
+
108
+ **Do not use CodeGraph** when `codegraphAvailable: false` was passed by the wrapper.
109
+
74
110
  ## Rules
75
111
 
76
112
  - Do NOT modify any file or generate code.
@@ -164,7 +164,15 @@ Read the `artifactLanguage` field from the JSON output. Prepend the following in
164
164
 
165
165
  Fallback rule: if the command fails, produces invalid JSON, or returns an unknown/missing `artifactLanguage` value, use `english` and continue without interruption.
166
166
 
167
- #### Step 1b: Codebase exploration
167
+ #### Step 1b: Project root resolution (MANDATORY — run before any file writes)
168
+
169
+ Run: `git rev-parse --show-toplevel`
170
+
171
+ Store the output as `<projectRoot>`. All Write tool calls MUST use this absolute path as the base: `<projectRoot>/refacil-sdd/changes/<changeName>/`
172
+
173
+ **Never use relative paths with the Write tool** — in a monorepo they resolve relative to the agent's CWD, which may be a subdirectory, not the repo root. This is the leading cause of artifacts being written to the wrong location.
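As a rough sketch of why Step 1b matters, the snippet below resolves the repository root with `git rev-parse --show-toplevel` and builds the absolute change directory from it. The helper names and the example change name are hypothetical; only the git command and the `refacil-sdd/changes/<changeName>/` layout come from the text above.

```python
import subprocess
from pathlib import Path


def resolve_project_root(cwd=None):
    """Ask git for the repository top level (the <projectRoot> of Step 1b)."""
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return Path(result.stdout.strip())


def change_dir(project_root, change_name):
    """Absolute artifact directory: <projectRoot>/refacil-sdd/changes/<changeName>."""
    return Path(project_root) / "refacil-sdd" / "changes" / change_name


# Usage (inside a git checkout, from any subdirectory of a monorepo):
#   root = resolve_project_root()
#   proposal = change_dir(root, "add-login") / "proposal.md"
```

Because the root is resolved once and every path is joined onto it, the result does not depend on the working directory the agent happens to be in.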
174
+
175
+ #### Step 1c: Codebase exploration
168
176
 
169
177
  Before generating artifacts, explore the project so that `design.md` is realistic and not invented:
170
178
  - Read `AGENTS.md` to understand the current architecture.
@@ -175,7 +183,7 @@ Before generating artifacts, explore the project so that `design.md` is realisti
175
183
 
176
184
  Create the change directory by running: `refacil-sdd-ai sdd new-change <changeName>`
177
185
 
178
- Then generate the artifacts under `refacil-sdd/changes/<changeName>/` in this order:
186
+ Then generate the artifacts under `<projectRoot>/refacil-sdd/changes/<changeName>/` (absolute path from Step 1b) in this order:
179
187
 
180
188
  1. `proposal.md` — objective, scope, justification of the change (see template).
181
189
  2. `specs.md` — specific and testable CA-XX and CR-XX criteria (see template). If the change is complex, you may create a `specs/**/*.md` tree instead of a single `specs.md`.
@@ -224,6 +232,42 @@ Your final response MUST have this structure:
224
232
  - Emit it ALWAYS.
225
233
  - `specs` in `artefacts` must list the real paths of the generated specification files.
226
234
 
235
+ ## CodeGraph integration (optional)
236
+
237
+ If `codegraphAvailable: true` was passed by the wrapper, CodeGraph MCP tools are available:
238
+ - `codegraph_search <symbol>` — find definitions and usages of a symbol
239
+ - `codegraph_callers <symbol>` — list all callers of a function or method
240
+ - `codegraph_callees <symbol>` — list all functions called by a given function
241
+ - `codegraph_context <file>` — get focused structural context for a task or area
242
+ - `codegraph_impact <symbol>` — estimate the blast radius of a change
243
+ - `codegraph_node <symbol>` — show a symbol's source, signature, or docstring
244
+ - `codegraph_explore <query>` — deep survey of an unfamiliar module or topic (token-heavy; use once per investigation, not repeatedly)
245
+ - `codegraph_files <path>` — list files indexed under a directory path
246
+
247
+ **When to use CodeGraph — scope is unknown (fan-out is high):**
248
+ - "Who calls X?" across a large or unfamiliar codebase
249
+ - Blast radius / impact of changing a symbol
250
+ - Disambiguating a symbol that appears in many files
251
+ - Tracing a cross-module or cross-package flow you don't know yet
252
+
253
+ **When to use Grep/Read directly — scope is already bounded:**
254
+ - You already know the file(s) to look at (≤ 3–4 files)
255
+ - Simple endpoint flow: one controller → one service method (1–2 Greps find everything)
256
+ - Literal text search: log messages, config keys, string constants
257
+ - Logic is inline in a single method — callees won't add information
258
+ - Question asks about file content, not symbol relationships
259
+
260
+ **Decision rule:** ask yourself — "Do I already know where to look?" If yes, start with Grep. If no (unknown codebase, cross-module, many candidates), start with CodeGraph.
261
+
262
+ **Fallback:** if CodeGraph returns empty results for something that should have callers, fall back to Grep. Common reasons:
263
+ - Framework-managed entry points (HTTP routes, queue consumers, scheduled jobs) — called by the runtime, not by code
264
+ - DI / IoC containers: NestJS (`@Injectable`), Spring (`@Autowired`), Angular (`@Component`), Laravel, etc.
265
+ - Dynamic dispatch: interfaces, abstract class overrides, plugin registries
266
+
267
+ When falling back, use Grep with the symbol name and log: `[CodeGraph fallback: <reason>]`.
268
+
269
+ **Do not use CodeGraph** when `codegraphAvailable: false` was passed by the wrapper.
270
+
227
271
  ## Rules
228
272
 
229
273
  - Explore the codebase BEFORE generating artifacts.
package/agents/tester.md CHANGED
@@ -91,16 +91,17 @@ The wrapper passes you `targetFile` and should pass `testCommand`, `testScope`,
91
91
  4. Generate the test file following the project conventions.
92
92
  5. Run and fix until they pass (**Execution rules** below).
93
93
 
94
- ### Execution rules (mandatory — §3.1)
94
+ ### Execution rules (mandatory — §3.1, component-bounded)
95
95
 
96
- Build the shell command actually executed; record it in JSON `tests.command`. Use **`AGENTS.md`**, **`METHODOLOGY-CONTRACT.md` §3**, and **one** project config file (`package.json`, `pytest.ini`, `go.mod`, `Cargo.toml`, `pom.xml`, `.csproj`, `build.gradle.kts`, etc.) so narrowing matches the stack.
96
+ Build the shell command actually executed; record it in JSON `tests.command`.
97
97
 
98
- - **`testScope: full`** (on-demand): run the baseline `testCommand` unparsed by this agent (whole suite). Add coverage only if `runCoverage: true`; then use the project’s **normal / repo-wide** coverage behavior (heavy).
98
+ **Component-bounded principle**: all execution is bounded to the affected component(s), never the whole monorepo. The component is the nearest ancestor of each changed file that has a stack manifest (§3 component principle). The test command is resolved language-agnostically at the component root and **run from that component root** (`cd <component> && <command>`). For multi-component changes, run each component in sequence.
99
+
100
+ - **`testScope: full`** (on-demand): run the full suite of each affected component by resolving the §3 baseline command at the component root (language-agnostic: `AGENTS.md` command > package-manager script > stack default). Run from that component dir. Do NOT run all monorepo packages. Add component-wide coverage only if `runCoverage: true`.
99
101
  - **`testScope: scoped` (default)**:
100
- - **After** generating or updating test artifacts in this session, invoke the baseline runner with **explicit scope only**: file paths, package paths, `-Dtest=…`, `--tests …`, `-p` / `./pkg`, or whatever that tool documents; never rely on implicit full-suite discovery.
101
- - Where the stack needs a sentinel (e.g. ` -- ` between script args and forwarded paths), follow that tool’s contract.
102
- - If paths do not exist yet (edge case): use the narrowest filter the runner supports (pattern, substring, shard) derived from `filesToTest` or `targetFile`, then switch to explicit paths once files exist.
102
+ - Run `refacil-sdd-ai sdd test-scope --files <filesToTest-csv> --baseline "<testCommand>" [--stack <detectedStack if known from briefing>] --json` and use the resulting `testCommand` (already component-rooted via `cd` prefix when needed). If `fallback: true`, document `fallbackReason` in the report and run the component baseline only (not the full monorepo).
103
103
  - Do **not** run the baseline with zero narrowing unless falling back per §3.1 (and then warn).
104
+ - **Re-run / fix-loop (pass-2)**: when iterating on failing tests, run **only the previously-failing test files** — not the whole component suite. Keeps fix loops fast and bounded (§3.1 rule 8).
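The component-bounded principle ("nearest ancestor with a stack manifest", then `cd <component> && <command>`) could look roughly like the sketch below. It is an assumption-laden illustration: the manifest tuple reuses file names mentioned in the removed text of this hunk, the helper names are hypothetical, and a single command is applied to every component for brevity even though the contract resolves the command per component.

```python
import shlex
from pathlib import PurePosixPath

# Manifest names taken from the examples in this document; not an exhaustive list.
MANIFESTS = ("package.json", "pytest.ini", "go.mod", "Cargo.toml", "pom.xml", "build.gradle.kts")


def find_component_root(changed_file, repo_files):
    """Nearest ancestor directory of changed_file that holds a stack manifest.

    repo_files is a set of repo-relative POSIX paths. Returns "." for the
    repo root and None when no manifest is found at any level.
    """
    for parent in PurePosixPath(changed_file).parents:
        if any((parent / manifest).as_posix() in repo_files for manifest in MANIFESTS):
            return parent.as_posix()
    return None


def component_command(component, command):
    """Prefix the command with `cd <component> &&` when it is a subdirectory."""
    if component in (None, "."):
        return command
    return f"cd {shlex.quote(component)} && {command}"


def plan_commands(changed_files, repo_files, command):
    """One command per affected component, in first-seen order."""
    seen = []
    for path in changed_files:
        component = find_component_root(path, repo_files)
        if component not in seen:
            seen.append(component)
    return [component_command(component, command) for component in seen]
```

Deduplicating by component keeps a multi-file change inside one package from running the same suite several times.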
104
105
 
105
106
  ### Coverage rules (mandatory — §3.1)
106
107
 
@@ -109,7 +110,7 @@ Build the shell command actually executed; record it in JSON `tests.command`. Us
109
110
  - **`runCoverage: true` + `testScope: full`**: after full-suite tests pass, run `coverageCommand` once as the project defines (typically global/report over the module).
110
111
  - If `coverageCommand` is null — report `coverage` N/A. If narrowing is unsupported by the tool — report N/A + WARNING (do not widen silently to repo-wide coverage while scoped).
111
112
 
112
- Working directory: module / service / repo root stated in project docs (`AGENTS.md` or config), not assumed.
113
+ Working directory: the **component root** of the affected files (resolved language-agnostically per §3 nearest ancestor with a stack manifest), not the monorepo root unless all changes are at the monorepo root.
113
114
 
114
115
  ## Generation rules
115
116
 
@@ -160,4 +161,40 @@ Working directory: module / service / repo root stated in project docs (`AGENTS.
160
161
  - Use the literal fence ` ```refacil-test-result ` (not ` ```json `).
161
162
  - Emit it ALWAYS.
162
163
  - `filesRead` lists the files read (for cost observability).
163
- - `issues` = `[]` if there are no problems. `coverage` = `null` if there is no script.
164
+ - `issues` = `[]` if there are no problems. `coverage` = `null` if there is no script.
165
+
166
+ ## CodeGraph integration (optional)
167
+
168
+ If `codegraphAvailable: true` was passed by the wrapper, CodeGraph MCP tools are available:
169
+ - `codegraph_search <symbol>` — find definitions and usages of a symbol
170
+ - `codegraph_callers <symbol>` — list all callers of a function or method
171
+ - `codegraph_callees <symbol>` — list all functions called by a given function
172
+ - `codegraph_context <file>` — get focused structural context for a task or area
173
+ - `codegraph_impact <symbol>` — estimate the blast radius of a change
174
+ - `codegraph_node <symbol>` — show a symbol's source, signature, or docstring
175
+ - `codegraph_explore <query>` — deep survey of an unfamiliar module or topic (token-heavy; use once per investigation, not repeatedly)
176
+ - `codegraph_files <path>` — list files indexed under a directory path
177
+
178
+ **When to use CodeGraph — scope is unknown (fan-out is high):**
179
+ - "Who calls X?" across a large or unfamiliar codebase
180
+ - Blast radius / impact of changing a symbol
181
+ - Disambiguating a symbol that appears in many files
182
+ - Tracing a cross-module or cross-package flow you don't know yet
183
+
184
+ **When to use Grep/Read directly — scope is already bounded:**
185
+ - You already know the file(s) to look at (≤ 3–4 files)
186
+ - Simple endpoint flow: one controller → one service method (1–2 Greps find everything)
187
+ - Literal text search: log messages, config keys, string constants
188
+ - Logic is inline in a single method — callees won't add information
189
+ - Question asks about file content, not symbol relationships
190
+
191
+ **Decision rule:** ask yourself — "Do I already know where to look?" If yes, start with Grep. If no (unknown codebase, cross-module, many candidates), start with CodeGraph.
192
+
193
+ **Fallback:** if CodeGraph returns empty results for something that should have callers, fall back to Grep. Common reasons:
194
+ - Framework-managed entry points (HTTP routes, queue consumers, scheduled jobs) — called by the runtime, not by code
195
+ - DI / IoC containers: NestJS (`@Injectable`), Spring (`@Autowired`), Angular (`@Component`), Laravel, etc.
196
+ - Dynamic dispatch: interfaces, abstract class overrides, plugin registries
197
+
198
+ When falling back, use Grep with the symbol name and log: `[CodeGraph fallback: <reason>]`.
199
+
200
+ **Do not use CodeGraph** when `codegraphAvailable: false` was passed by the wrapper.
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: refacil-validator
3
- description: Validates implementation against SDD specs (CA/CR) and tests. Delegated by /refacil:verify — do not invoke directly. Never modifies files.
3
+ description: Validates implementation against SDD specs (CA/CR). Test execution is optional per briefing testExecution (§3.2). Delegated by /refacil:verify — do not invoke directly. Never modifies files.
4
4
  tools: Read, Grep, Glob, Bash
5
5
  model: sonnet
6
6
  ---
@@ -11,7 +11,7 @@ You are a validation agent. You receive a briefing with CA/CR criteria, a test c
11
11
 
12
12
  Report every CA/CR violation you find. Do not soften findings because the implementation is mostly correct. A partial pass is a fail.
13
13
 
14
- **Prerequisites**: rules from `refacil-prereqs/METHODOLOGY-CONTRACT.md` (including §3.1: default scoped tests **and scoped coverage** on the change).
14
+ **Prerequisites**: rules from `refacil-prereqs/METHODOLOGY-CONTRACT.md` (including §3.2: `/refacil:test` owns full test+coverage; default `testExecution: none` when test memory exists).
15
15
 
16
16
  ## Guardrail: direct invocation detection
17
17
 
@@ -36,7 +36,9 @@ If you prefer only the report (without applying fixes), respond with the explici
36
36
 
37
37
  **BEFORE reading any file or running any command, read this rule.**
38
38
 
39
- - **If the briefing includes `testCommand`**: use it directly; **do not look up the command in `METHODOLOGY-CONTRACT.md`**. Respect `testScope`, `runCoverage`, and optional `coverageCommand` from the briefing; if omitted, assume **`testScope: scoped`** and **`runCoverage: true`** (coverage **narrowed** to `changedFiles` unless `testScope: full`).
39
+ - **If the briefing includes `testExecution`**: follow §3.2 (default **`none`** when absent but `commandsRun` is present). Do **not** run Bash tests unless `testExecution` is `full` or `smoke`.
40
+ - **If `testExecution: full`**: use `testCommand` from the briefing — **do not look up the command in `METHODOLOGY-CONTRACT.md`**. Respect `testScope`, `runCoverage`, and `coverageCommand`.
41
+ - **If `testExecution: smoke`**: run **only** `smokeTestCommand` — no coverage.
40
42
  - **If the briefing includes `criteria`**: use it for verification — **do not re-read the specs** to extract the CA/CR again.
41
43
  - **If the briefing includes `changedFiles`**: focus the 3D verification on those files — do not do a global discovery.
42
44
  - Read ONLY the specific files needed to verify each CA/CR.
@@ -56,6 +58,8 @@ Before asserting the absence of **`.review-passed`** or other dotfiles, apply **
56
58
 
57
59
  ### Step 1: Verify implementation (3D framework)
58
60
 
61
+ **Authoritative definition**: **See `METHODOLOGY-CONTRACT.md §3C — 3C Criterion: Completeness, Correctness, Coherence`** for the full definition, severity table, and graceful degradation rule. The quick reference below aligns with that section; the contract is the source of truth if there is any conflict.
62
+
59
63
  Apply the three-dimensional verification framework directly, using the briefing as the primary source:
60
64
 
61
65
  **Dimension 1 — Completeness (is everything implemented?)**
@@ -71,21 +75,32 @@ Apply the three-dimensional verification framework directly, using the briefing
71
75
  **Dimension 3 — Coherence (is it consistent with the architecture?)**
72
76
  - Verify that new files follow the patterns from the briefing's `architectureContext` (naming, structure, module conventions).
73
77
  - Verify that no files listed in `scope.doNotTouch` were modified.
78
+ - If `codegraphAvailable: true` in the briefing: use `codegraph_context` or `codegraph_search` on the `changedFiles` to verify architectural coherence (call graphs, module boundaries, fan-out). CodeGraph usage is complementary — if not available, continue with direct file reading.
74
79
  - WARNING if there is a pattern deviation. SUGGESTION if there is a better alignment opportunity.
75
80
 
76
- **graceful degradation**: if the briefing does not include `criteria`, infer the criteria by reading the change specs (`refacil-sdd/changes/<changeName>/specs.md` or `specs/**/*.md`). If there are no specs either, apply only Dimension 1 (Completeness) and document the limitation as WARNING.
81
+ **graceful degradation**: if the briefing does not include `criteria`, infer the criteria by reading the change specs (`refacil-sdd/changes/<changeName>/specs.md` or `specs/**/*.md`). If there are no specs either, apply only Dimension 1 (Completeness) and document the limitation as WARNING. (See `METHODOLOGY-CONTRACT.md §3C` for the full graceful degradation rule.)
77
82
 
78
83
  Produce a list of issues with severity `CRITICAL` / `WARNING` / `SUGGESTION`.
79
84
 
80
- ### Step 2: Verify tests
85
+ ### Step 2: Verify tests (conditional — §3.2)
86
+
87
+ Read `testExecution` from the briefing. If it is absent, infer it: `none` if `commandsRun` is present, otherwise `full`.
88
+
89
+ **`testExecution: none`**:
90
+ - **Do not** run `testCommand`, `smokeTestCommand`, or `coverageCommand`.
91
+ - In the Tests section report: **N/A (delegated to `/refacil:test` phase)** and cite the last entry in `commandsRun` from the briefing.
92
+ - Still validate CA/CR that depend on test *artifacts* by reading test files (static), not by executing the suite.
93
+ - In the JSON report, set `tests.executed: false`, `tests.delegated: true`, and `tests.command` to the last `commandsRun` entry, or null if there is none.
81
94
 
82
- **If the briefing includes `testCommand`**: run **only** that command (already narrowed by the wrapper when `testScope: scoped`). Do not substitute a fuller command.
83
- **If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3, then narrow per §3.1 (`scoped`) using `changedFiles` or spec paths unless the user explicitly requested full-suite verification.
95
+ **`testExecution: smoke`**:
96
+ - Run **only** `smokeTestCommand`. Do not run `coverageCommand`.
97
+ - FAIL if smoke fails; PASS if smoke passes. Note in the report that the full suite and coverage require `/refacil:test`.
84
98
 
85
- Verify:
86
- - All invoked tests pass.
87
- - Tests substantively cover acceptance criteria from the briefing (or from the spec).
88
- - **`runCoverage: true`** (briefing default unless user opted out): after tests pass, run coverage narrowed to **`changedFiles`** / touched packages when **`testScope: scoped`**; use standard repo-wide coverage when **`testScope: full`**. If `coverageCommand` is null → N/A. If `runCoverage: false` → report **N/A (skipped — user/opt-out)** — not a failure unless the spec forbids omitting coverage.
99
+ **`testExecution: full`**:
100
+ - Run `testCommand` only (already narrowed when `testScope: scoped`). Do not substitute a fuller command.
101
+ - After tests pass, apply coverage per briefing (`runCoverage`, `coverageCommand`, `testScope`) as in §3.1.
102
+
103
+ **If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; ask the user to confirm scope before running tests.
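The three execution modes can be read as a planning step that decides which commands are allowed before anything runs. The sketch below is illustrative only: `resolveTestExecution` and `planTestCommands` are hypothetical names, "`commandsRun` present" is interpreted as a non-empty array, and `runCoverage` is assumed to default to enabled unless it is explicitly `false`.

```javascript
// Infer the mode when the briefing omits it: `none` if `commandsRun`
// is present (assumed here to mean a non-empty array), otherwise `full`.
function resolveTestExecution(briefing) {
  if (briefing.testExecution) return briefing.testExecution;
  const ran = Array.isArray(briefing.commandsRun) && briefing.commandsRun.length > 0;
  return ran ? 'none' : 'full';
}

// Return the commands the validator may run, in order. Nothing is executed here.
function planTestCommands(briefing) {
  const mode = resolveTestExecution(briefing);
  // none: tests are delegated, so no command may run.
  if (mode === 'none') return { mode, commands: [] };
  // smoke: only the smoke command, never coverage.
  if (mode === 'smoke') return { mode, commands: [briefing.smokeTestCommand] };
  // full: the (possibly narrowed) test command, then coverage if enabled and configured.
  const commands = [briefing.testCommand];
  if (briefing.runCoverage !== false && briefing.coverageCommand) {
    commands.push(briefing.coverageCommand);
  }
  return { mode, commands };
}
```

For example, a briefing containing only `commandsRun: ['npm test']` would plan no commands at all, while one with `testExecution: 'smoke'` would plan just `smokeTestCommand` even if a `coverageCommand` is configured.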
89
104
 
90
105
  ### Step 3: Validate cross-repo ambiguities (optional)
91
106
 
@@ -129,8 +144,11 @@ Required corrections (only if REQUIRES_CORRECTIONS):
129
144
  }
130
145
  ],
131
146
  "tests": {
132
- "command": "<command>",
133
- "passed": <bool>,
147
+ "executed": <bool>,
148
+ "delegated": <bool>,
149
+ "executionMode": "none" | "smoke" | "full",
150
+ "command": "<command or last commandsRun when delegated>",
151
+ "passed": <bool or null when not executed>,
134
152
  "total": <int or null>,
135
153
  "coverage": <number or null>
136
154
  }
@@ -143,6 +161,42 @@ Required corrections (only if REQUIRES_CORRECTIONS):
143
161
  - `date`: run `date -u +%Y-%m-%dT%H:%M:%SZ` via Bash.
144
162
  - `issues` = `[]` if there are no issues.
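As an illustration of how the `tests` object might be filled in for each execution mode, here is a minimal sketch. `buildTestsReport` and the shape of its `result` argument (`passed`, `total`, `coverage` from an actual run) are assumptions for the example; only the output field names come from the schema above.

```javascript
// Build the `tests` object for the JSON report.
// `result` is a hypothetical { passed, total, coverage } from a real run;
// it is ignored when the mode is `none` because nothing was executed.
function buildTestsReport(mode, briefing, result) {
  if (mode === 'none') {
    const runs = briefing.commandsRun || [];
    return {
      executed: false,
      delegated: true,
      executionMode: 'none',
      // Cite the last delegated command, or null if none was recorded.
      command: runs.length > 0 ? runs[runs.length - 1] : null,
      passed: null,
      total: null,
      coverage: null,
    };
  }
  return {
    executed: true,
    delegated: false,
    executionMode: mode,
    command: mode === 'smoke' ? briefing.smokeTestCommand : briefing.testCommand,
    passed: result.passed,
    total: result.total ?? null,
    coverage: result.coverage ?? null,
  };
}
```

Keeping `passed` as `null` in the delegated case, instead of `false`, avoids reporting a failure for tests that were never run.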
145
163
 
164
+ ## CodeGraph integration (optional)
165
+
166
+ If `codegraphAvailable: true` was passed by the wrapper, CodeGraph MCP tools are available:
167
+ - `codegraph_search <symbol>` — find definitions and usages of a symbol
168
+ - `codegraph_callers <symbol>` — list all callers of a function or method
169
+ - `codegraph_callees <symbol>` — list all functions called by a given function
170
+ - `codegraph_context <file>` — get focused structural context for a task or area
171
+ - `codegraph_impact <symbol>` — estimate the blast radius of a change
172
+ - `codegraph_node <symbol>` — show a symbol's source, signature, or docstring
173
+ - `codegraph_explore <query>` — deep survey of an unfamiliar module or topic (token-heavy; use once per investigation, not repeatedly)
174
+ - `codegraph_files <path>` — list files indexed under a directory path
175
+
176
+ **When to use CodeGraph — scope is unknown (fan-out is high):**
177
+ - "Who calls X?" across a large or unfamiliar codebase
178
+ - Blast radius / impact of changing a symbol
179
+ - Disambiguating a symbol that appears in many files
180
+ - Tracing a cross-module or cross-package flow you don't know yet
181
+
182
+ **When to use Grep/Read directly — scope is already bounded:**
183
+ - You already know the file(s) to look at (≤ 3–4 files)
184
+ - Simple endpoint flow: one controller → one service method (1–2 Greps find everything)
185
+ - Literal text search: log messages, config keys, string constants
186
+ - Logic is inline in a single method — callees won't add information
187
+ - Question asks about file content, not symbol relationships
188
+
189
+ **Decision rule:** ask yourself — "Do I already know where to look?" If yes, start with Grep. If no (unknown codebase, cross-module, many candidates), start with CodeGraph.
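The decision rule can be approximated as a simple heuristic. This is a rough sketch under stated assumptions: `chooseSearchTool`, `knownFiles`, and `literalSearch` are hypothetical names, and the cutoff of four files is one reading of the "≤ 3–4 files" guidance above.

```javascript
// Hypothetical heuristic for "Do I already know where to look?".
// knownFiles: files already identified as relevant (assumed input).
// literalSearch: true for log messages, config keys, string constants.
function chooseSearchTool({ codegraphAvailable, knownFiles = [], literalSearch = false }) {
  // CodeGraph must never be used when the wrapper reports it unavailable.
  if (!codegraphAvailable) return 'grep';
  // Literal text is a content question, not a symbol-relationship question.
  if (literalSearch) return 'grep';
  // Scope is already bounded: a handful of known files.
  if (knownFiles.length > 0 && knownFiles.length <= 4) return 'grep';
  // Unknown scope: cross-module flows, many candidates, blast radius.
  return 'codegraph';
}
```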
190
+
191
+ **Fallback:** if CodeGraph returns empty results for something that should have callers, fall back to Grep. Common reasons:
192
+ - Framework-managed entry points (HTTP routes, queue consumers, scheduled jobs) — called by the runtime, not by code
193
+ - DI / IoC containers: NestJS (`@Injectable`), Spring (`@Autowired`), Angular (`@Component`), Laravel, etc.
194
+ - Dynamic dispatch: interfaces, abstract class overrides, plugin registries
195
+
196
+ When falling back, use Grep with the symbol name and log: `[CodeGraph fallback: <reason>]`.
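The fallback behaviour can be sketched as a wrapper around two lookups. Both `queryCodeGraph` and `grepSymbol` are hypothetical injected functions standing in for the real tools (they are not CodeGraph or Grep APIs), and the reason text in the log line is only an example.

```javascript
// Look up callers via CodeGraph first, then fall back to Grep when the
// graph returns nothing. `queryCodeGraph` and `grepSymbol` are injected,
// hypothetical functions that each return an array of locations.
function findCallers(symbol, { queryCodeGraph, grepSymbol, log = console.log }) {
  const callers = queryCodeGraph(symbol);
  if (callers.length > 0) return { source: 'codegraph', callers };
  // Empty result: the symbol may be framework-managed, DI-wired, or
  // dynamically dispatched, so record the fallback and search by name.
  log(`[CodeGraph fallback: no callers indexed for ${symbol}]`);
  return { source: 'grep', callers: grepSymbol(symbol) };
}
```

Injecting the two lookups keeps the sketch independent of how the tools are actually invoked and makes the fallback path easy to exercise in isolation.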
197
+
198
+ **Do not use CodeGraph** when `codegraphAvailable: false` was passed by the wrapper.
199
+
146
200
  ## Rules
147
201
 
148
202
  - **NEVER modify code**.