refacil-sdd-ai 5.3.2 → 5.3.4

@@ -29,7 +29,7 @@ If you prefer to continue here, provide:
29
29
  - mode: investigation (only analyze and propose hypotheses) or fix (implement with already-confirmed hypothesis)
30
30
  - description: <full bug description>
31
31
  - hypothesis: <confirmed root cause> (only for mode=fix)
32
- - testScope: scoped \| full (only for mode=fix; default scoped)
32
+ - testScope: scoped (mode=fix is ALWAYS scoped — rule 0; full regression is /refacil:test's job)
33
33
  ```
34
34
 
35
35
  **Do not proceed with reads or implementation until the scope is clear.**
@@ -163,7 +163,7 @@ Proposed fix for hypothesis #1:
163
163
 
164
164
  ## Fix mode
165
165
 
166
- The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user) + optional **`testScope`** (`scoped` \| `full`, default **`scoped`**).
166
+ The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user). Fix mode is **always scoped** (rule 0) — there is no full-suite option in `/refacil:bug`.
167
167
 
168
168
  ### Step 1: Implement the fix
169
169
 
@@ -205,14 +205,19 @@ Create `<projectRoot>/refacil-sdd/changes/<fix-name>/summary.md`:
205
205
 
206
206
  This file is mandatory for traceability and allows the `check-review` hook to detect the active change. The `.review-passed` will be created by `/refacil:review` upon approval.
207
207
 
208
- ### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1)
208
+ ### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1 — always scoped, rule 0)
209
209
 
210
- 1. Read **`testScope`** from wrapper (default **`scoped`** if omitted).
211
- 2. **`testScope: full`**: Resolve baseline from **`METHODOLOGY-CONTRACT.md §3`**, run **once unparsed** — **all tests** emitted by that command must pass.
212
- 3. **`testScope: scoped`** (default): Collect **`verificationTargets`** — every production/test file **you edited or added** during fix mode (**including** regression tests created this session).
213
- - Build **`scopedCommand`** by narrowing baseline §3 to cover only those roots (directories, `-p`/`-pl`, `--`/path suffixes — follow stack docs + **`AGENTS.md` / `.agents/testing.md`** when present — see §3.1 **Scoped command patterns**).
214
- - Run **`scopedCommand`**; everything it selects must pass. **Do not** upgrade to repo-wide invocation while `scoped` unless §3.1 says narrowing is unreliable — then run baseline **once**, prepend report line **WARN: scoped narrowing unavailable → full-suite fallback (heavy)**.
215
- 4. **`testsResult.command`** in JSON must quote the **literal** executed shell string (`scopedCommand` or baseline).
210
+ `/refacil:bug` does **not** pass through `/refacil:test`, so you validate the fix with your **own regression tests** (always in scope) — **never the full suite** (rule 0).
211
+
212
+ 1. Collect **`verificationTargets`** — every production/test file **you edited or added** during fix mode (**including** regression tests created this session).
213
+ 2. Derive the scoped command via the CLI (stack-agnostic, structurally bounded):
214
+ ```
215
+ refacil-sdd-ai sdd test-scope --files "<verificationTargets-csv>" --baseline "<§3 baseline>" --no-baseline-fallback
216
+ ```
217
+ `--no-baseline-fallback` guarantees the CLI **never** returns the full baseline: on fallback `testCommand` is **empty**.
218
+ 3. Run the returned `testCommand`; everything it selects must pass.
219
+ 4. **Fallback** (empty `testCommand` / `fallback: true`): run **only** the touched files that are themselves test files (your regression tests); if none exist, **SKIP** and prepend report line **WARN: no scopeable tests for touched files → verification deferred**. **Never** run the full repo/package baseline.
220
+ 5. **`testsResult.command`** in JSON must quote the **literal** executed shell string (or `null`/empty when SKIPPED).
216
221
 
217
222
  ### Report + JSON block (fix)
218
223
 
@@ -254,5 +259,5 @@ This file is mandatory for traceability and allows the `check-review` hook to de
254
259
  - In mode=investigation: follow diagnose loop discipline (reproduce, minimize, hypothesize, validate evidence) before proposing a fix.
255
260
  - In mode=fix: the fix must be MINIMAL. Never over-refactor.
256
261
  - Regression tests are MANDATORY in mode=fix.
257
- - **Scoped verification**: default **`testScope: scoped`** from wrapper — narrowed command in Step 4, not wholesale “run entire repo suite” unless `full`.
262
+ - **Scoped verification**: **always scoped** (rule 0) — narrowed command in Step 4 via `sdd test-scope --no-baseline-fallback`, never a wholesale “run entire repo suite”. Full regression belongs to `/refacil:test` or CI.
258
263
  - Use **concise** output mode by default.
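
The Step 4 decision flow in the debugger hunks above can be sketched as a small planner. This is a minimal illustration, not the package's code: it assumes the `--json` result shape (`testCommand`, `fallback`) listed in the CLI help, and `looksLikeTestFile` is a hypothetical naming heuristic standing in for whatever the agent uses to recognise its own regression tests.

```javascript
// Hypothetical heuristic (assumption): treats *.test.* / *.spec.* names and
// files under tests/, test/ or __tests__/ as test files.
function looksLikeTestFile(file) {
  return (
    /(\.|_|-)(test|spec)\.[A-Za-z]+$/i.test(file) ||
    /(^|\/)(tests?|__tests__)\//.test(file)
  );
}

// Decide what fix mode may execute, given the parsed `sdd test-scope --json`
// output and the verificationTargets collected in step 1.
function planFixVerification(scopeResult, touchedFiles) {
  const command = (scopeResult.testCommand || '').trim();

  // Normal path: the CLI produced a narrowed command.
  if (command && !scopeResult.fallback) {
    return { action: 'run', command, files: [], warning: null };
  }

  // Fallback path: only the touched files that are themselves tests.
  const touchedTests = touchedFiles.filter(looksLikeTestFile);
  if (touchedTests.length > 0) {
    return { action: 'run-touched-tests', command: null, files: touchedTests, warning: null };
  }

  // Nothing scopeable: skip and defer; the baseline is never an option here.
  return {
    action: 'skip',
    command: null,
    files: [],
    warning: 'WARN: no scopeable tests for touched files → verification deferred',
  };
}
```

The planner never returns the baseline, which mirrors the intent of `--no-baseline-fallback`: a caller that only follows the returned plan has no full-suite command to run.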
@@ -124,9 +124,9 @@ Follow **`METHODOLOGY-CONTRACT.md §3.1`**:
124
124
 
125
125
  2. **Derive a minimal scoped smoke command** (stack-agnostic — no hardcoded runners):
126
126
  ```
127
- refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"
127
+ refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback
128
128
  ```
129
- Use the resulting `testCommand` from the output.
129
+ Use the resulting `testCommand` from the output. The `--no-baseline-fallback` flag is **mandatory in apply**: on fallback the CLI returns an **empty** `testCommand` (never the full baseline), so you physically cannot run the whole suite — apply NEVER runs full regression. If `testCommand` is empty / `fallback: true`, go to step 4 (run touched test files only, else SKIP).
130
130
 
131
131
  3. **Run the resulting smoke command.**
132
132
 
@@ -36,9 +36,10 @@ If you prefer only the report (without applying fixes), respond with the explici
36
36
 
37
37
  **BEFORE reading any file or running any command, read this rule.**
38
38
 
39
- - **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `full` or `smoke`.
40
- - **If `testExecution: full`**: use `testCommand` from the briefing; **do not look up the command in `METHODOLOGY-CONTRACT.md`**. Respect `testScope`, `runCoverage`, and `coverageCommand`.
41
- - **If `testExecution: smoke`**: run **only** `smokeTestCommand` — no coverage.
39
+ - **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `smoke`.
40
+ - **Verify NEVER runs the full suite** (`METHODOLOGY-CONTRACT.md §3.1` rule 0): `/refacil:test` is the mandatory prior step that owns full regression. There is no `full` mode here.
41
+ - **If `testExecution: defer`**: do **not** run any tests: there is no current test evidence (CR-01), or the user asked to re-run the suite. Report tests as **N/A (pending `/refacil:test`)** and tell the user to run `/refacil:test` first.
42
+ - **If `testExecution: smoke`**: run **only** `smokeTestCommand` (already scoped via `--no-baseline-fallback`; if empty, SKIP) — no coverage, never the baseline.
42
43
  - **If the briefing includes `criteria`**: use it for verification — **do not re-read the specs** to extract the CA/CR again.
43
44
  - **If the briefing includes `changedFiles`**: focus the 3D verification on those files — do not do a global discovery.
44
45
  - Read ONLY the specific files needed to verify each CA/CR.
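
The three `testExecution` modes that verify accepts after this change (`none`, `smoke`, `defer`) can be sketched as a resolver. This is an illustrative reading of the rules in this file, not the agent's implementation. The briefing field names follow the diff; treating `commandsRun` as an array and mapping any unrecognised value (including the removed `full`) to `defer` are assumptions.

```javascript
// Resolve what verify should do with tests, given a briefing object.
// `executed` / `delegated` mirror the `tests.*` fields of the report JSON.
function resolveTestExecution(briefing = {}) {
  const commandsRun = Array.isArray(briefing.commandsRun) ? briefing.commandsRun : [];

  // Default inference: `none` when prior test evidence exists, else `defer`.
  const inferred = commandsRun.length > 0 ? 'none' : 'defer';
  const requested = briefing.testExecution || inferred;

  // Assumption: anything outside the three known modes is treated as `defer`.
  const mode = ['none', 'smoke', 'defer'].includes(requested) ? requested : 'defer';

  if (mode === 'smoke') {
    const command = (briefing.smokeTestCommand || '').trim();
    return command
      ? { mode, command, executed: true, delegated: false, note: null }
      : { mode, command: null, executed: false, delegated: false, note: 'SKIP: empty smokeTestCommand' };
  }

  if (mode === 'none') {
    // Nothing is run; the report cites the last command from the test phase.
    const last = commandsRun.length > 0 ? commandsRun[commandsRun.length - 1] : null;
    return { mode, command: last, executed: false, delegated: true, note: 'delegated to test phase' };
  }

  // defer: no tests are run; the user is pointed at /refacil:test.
  return { mode, command: null, executed: false, delegated: true, note: 'N/A (pending /refacil:test)' };
}
```

Only the `smoke` branch with a non-empty command ever yields something to execute, so no branch can produce a full-suite run.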
@@ -84,7 +85,7 @@ Produce a list of issues with severity `CRITICAL` / `WARNING` / `SUGGESTION`.
84
85
 
85
86
  ### Step 2: Verify tests (conditional — §3.2)
86
87
 
87
- Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `full`).
88
+ Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `defer` — never `full`; verify does not run the suite, rule 0).
88
89
 
89
90
  **`testExecution: none`**:
90
91
  - **Do not** run `testCommand`, `smokeTestCommand`, or `coverageCommand`.
@@ -93,14 +94,14 @@ Read `testExecution` from the briefing (default infer: `none` if `commandsRun` p
93
94
  - JSON `tests.executed: false`, `tests.delegated: true`, `tests.command` = last `commandsRun` or null.
94
95
 
95
96
  **`testExecution: smoke`**:
96
- - Run **only** `smokeTestCommand`. Do not run `coverageCommand`.
97
+ - Run **only** `smokeTestCommand` (scoped via `--no-baseline-fallback`; if empty/`fallback`, run only the touched test files, else SKIP). Do not run `coverageCommand`. **Never** run the baseline.
97
98
  - FAIL if smoke fails; PASS if smoke passes. Note in report that full suite/coverage requires `/refacil:test`.
98
99
 
99
- **`testExecution: full`**:
100
- - Run `testCommand` only (already narrowed when `testScope: scoped`). Do not substitute a fuller command.
101
- - After tests pass, apply coverage per briefing (`runCoverage`, `coverageCommand`, `testScope`) as in §3.1.
100
+ **`testExecution: defer`** (CR-01 — no test evidence, or user asked to re-run the suite):
101
+ - **Do not run any tests.** Verify never runs the full suite (rule 0); `/refacil:test` is the mandatory prior step.
102
+ - Report tests as **N/A (pending `/refacil:test`)**, set `tests.executed: false`, `tests.delegated: true`, and tell the user to run `/refacil:test` before verify.
102
103
 
103
- **If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; ask user to confirm scope before running tests.
104
+ **If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; default to `none`/`defer` and ask the user to run `/refacil:test` rather than running the suite here.
104
105
 
105
106
  ### Step 3: Validate cross-repo ambiguities (optional)
106
107
 
@@ -146,7 +147,7 @@ Required corrections (only if REQUIRES_CORRECTIONS):
146
147
  "tests": {
147
148
  "executed": <bool>,
148
149
  "delegated": <bool>,
149
- "executionMode": "none" | "smoke" | "full",
150
+ "executionMode": "none" | "smoke" | "defer",
150
151
  "command": "<command or last commandsRun when delegated>",
151
152
  "passed": <bool or null when not executed>,
152
153
  "total": <int or null>,
@@ -1021,6 +1021,10 @@ function cmdTestScope(argv, projectRoot) {
1021
1021
  const filesRaw = args.files || '';
1022
1022
  const stackHint = args.stack || undefined;
1023
1023
  const baselineCmd = args.baseline || '';
1024
+ // When set, fallback returns an EMPTY testCommand instead of the full baseline.
1025
+ // Used by /refacil:apply so the implementer can never run the whole suite even if
1026
+ // it only looks at the CLI output and ignores the contract's SKIP rule.
1027
+ const noBaselineFallback = args['no-baseline-fallback'] === true;
1024
1028
  // Use the already-resolved projectRoot from handleSdd (via findProjectRoot()) so
1025
1029
  // the CLI works correctly when invoked from a subdirectory within the monorepo.
1026
1030
  const root = projectRoot || process.cwd();
@@ -1031,7 +1035,7 @@ function cmdTestScope(argv, projectRoot) {
1031
1035
  : [];
1032
1036
 
1033
1037
  const { testScope } = require('../test-scope');
1034
- const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root });
1038
+ const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root, noBaselineFallback });
1035
1039
 
1036
1040
  if (wantJson) {
1037
1041
  process.stdout.write(JSON.stringify(result) + '\n');
@@ -1296,6 +1300,9 @@ function sddHelp() {
1296
1300
  --files <csv> Comma-separated source file paths to scope tests for
1297
1301
  [--stack <name>] Stack hint (node, python, go, rust, java, dotnet)
1298
1302
  [--baseline <cmd>] Fallback test command when no test files are found
1303
+ [--no-baseline-fallback] On fallback, return an EMPTY testCommand instead of the
1304
+ baseline (used by /refacil:apply so it can never run the
1305
+ whole suite). /refacil:test omits this flag.
1299
1306
  [--json] Output result as JSON (testCommand, files, fallback, fallbackReason)
1300
1307
  Always exits 0.
1301
1308
 
package/lib/spec-sync.js CHANGED
@@ -98,7 +98,13 @@ function parseCriteriaBlocks(markdown) {
98
98
  };
99
99
 
100
100
  for (const line of lines) {
101
- const m = line.match(/^##\s+((?:CA|CR)-\d+):\s*(.+)$/i);
101
+ // Tolerant heading match so archive's spec-sync never trips on cosmetic
102
+ // format variance the proposer/agents produce in the wild:
103
+ // - heading level: h2 (## CA-01) OR h3+ (### CA-A01) — level is ignored
104
+ // - criterion id: numeric (CA-01), feature-prefixed (CA-A01, CA-G01) or
105
+ // suffixed (CA-12b) — any [A-Za-z0-9] run after the CA-/CR- prefix
106
+ // The CA-/CR- token is the real signal; the heading level and id shape are not.
107
+ const m = line.match(/^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i);
102
108
  if (m) {
103
109
  pushCurrent();
104
110
  current = { id: m[1].toUpperCase(), title: m[2].trim(), lines: [] };
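
To see what the relaxed heading pattern accepts, the regex from the added line can be exercised on a few sample headings. The `parseHeading` wrapper and the sample headings are illustrative only; the pattern and the `toUpperCase()` / `trim()` handling are taken from the hunk above.

```javascript
// Pattern copied from the 5.3.4 side of the hunk.
const CRITERION_HEADING = /^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i;

// Hypothetical helper: returns the parsed id/title, or null when the line
// is not a criterion heading.
function parseHeading(line) {
  const m = line.match(CRITERION_HEADING);
  return m ? { id: m[1].toUpperCase(), title: m[2].trim() } : null;
}

const samples = [
  '## CA-01: Login works',        // numeric id at h2 (also matched by 5.3.2)
  '### ca-a01: Feature-prefixed', // h3 with a lowercase, feature-prefixed id
  '#### CR-12b: Suffixed id',     // h4 with a suffixed id
  '# CA-01: Top-level heading',   // h1 is still rejected
  '## CA-01 Missing colon',       // the colon after the id is still required
];

for (const line of samples) {
  console.log(JSON.stringify(line), '=>', JSON.stringify(parseHeading(line)));
}
```

Because the id is upper-cased after matching, `ca-a01` and `CR-12b` are stored as `CA-A01` and `CR-12B`.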
package/lib/test-scope.js CHANGED
@@ -27,6 +27,40 @@ const path = require('path');
27
27
 
28
28
  const KNOWN_STACKS = ['node', 'python', 'go', 'rust', 'java', 'dotnet'];
29
29
 
30
+ // ---------------------------------------------------------------------------
31
+ // Source-code file extensions per stack.
32
+ //
33
+ // A changed file whose extension is NOT a code extension for its stack cannot
34
+ // have a unit-test mapping (e.g. a skill/agent `.md` doc in a Node repo). Such
35
+ // files must be skipped during scoping: otherwise the loose basename match in
36
+ // findTestFilesByImport produces false positives against test files that merely
37
+ // MENTION the file's name as a string. Test files are themselves code, so this
38
+ // guard never drops a real test.
39
+ // ---------------------------------------------------------------------------
40
+
41
+ const CODE_EXTENSIONS = {
42
+ node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
43
+ python: ['.py'],
44
+ go: ['.go'],
45
+ rust: ['.rs'],
46
+ java: ['.java', '.kt', '.kts'],
47
+ dotnet: ['.cs'],
48
+ };
49
+
50
+ /**
51
+ * Returns true if the file's extension is a recognized source-code extension
52
+ * for the given stack. Unknown stacks return false.
53
+ *
54
+ * @param {string} filePath
55
+ * @param {string} stack
56
+ * @returns {boolean}
57
+ */
58
+ function isCodeFileForStack(filePath, stack) {
59
+ const exts = CODE_EXTENSIONS[stack];
60
+ if (!exts) return false;
61
+ return exts.includes(path.extname(filePath).toLowerCase());
62
+ }
63
+
30
64
  // ---------------------------------------------------------------------------
31
65
  // Planning-only file patterns — never justify a test run on their own.
32
66
  // ---------------------------------------------------------------------------
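
The extension guard added in this hunk can be tried in isolation. The table and function below are copied from the added lines; the sample paths are made up for illustration.

```javascript
const path = require('path');

// Copied from the hunk: source-code extensions per stack.
const CODE_EXTENSIONS = {
  node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
  python: ['.py'],
  go: ['.go'],
  rust: ['.rs'],
  java: ['.java', '.kt', '.kts'],
  dotnet: ['.cs'],
};

// True when the file's extension is a code extension for the stack;
// unknown stacks return false.
function isCodeFileForStack(filePath, stack) {
  const exts = CODE_EXTENSIONS[stack];
  if (!exts) return false;
  return exts.includes(path.extname(filePath).toLowerCase());
}

// Illustrative inputs (hypothetical paths).
console.log(isCodeFileForStack('skills/bug/SKILL.md', 'node')); // false: markdown doc
console.log(isCodeFileForStack('lib/test-scope.js', 'node'));   // true
console.log(isCodeFileForStack('src/Main.KT', 'java'));         // true: extension is lower-cased
console.log(isCodeFileForStack('main.go', 'cobol'));            // false: unknown stack
```

A file with no extension, such as `Makefile`, yields an empty `path.extname` and is therefore skipped for every stack.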
@@ -489,16 +523,24 @@ function buildScopedCommand(absTestFiles, detectedStack, moduleRoot, projectRoot
489
523
  * @param {string} opts.stack - stack hint (optional; auto-detected if omitted)
490
524
  * @param {string} opts.baseline - fallback test command (optional)
491
525
  * @param {string} opts.projectRoot - project root (optional; uses cwd if omitted)
526
+ * @param {boolean} opts.noBaselineFallback - when true, fallback returns an EMPTY
527
+ * testCommand instead of the full baseline. Used by phases that must NEVER run the
528
+ * whole suite (e.g. `/refacil:apply` smoke): the consumer physically never receives
529
+ * a full-suite command, so it cannot run it even if it ignores the contract rule.
530
+ * `/refacil:test` (which legitimately runs the full suite on fallback) omits this.
492
531
  * @returns {{ testCommand: string, files: string[], fallback: boolean, fallbackReason: string|null }}
493
532
  */
494
- function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
533
+ function testScope({ files = [], stack, baseline = '', projectRoot, noBaselineFallback = false } = {}) {
495
534
  const root = projectRoot || process.cwd();
496
535
  const base = baseline || '';
536
+ // What to emit as testCommand on every fallback path. In normal mode this is the
537
+ // baseline; in noBaselineFallback mode it is empty so apply can never run the suite.
538
+ const fallbackCommand = noBaselineFallback ? '' : base;
497
539
 
498
540
  // Fallback: empty files input
499
541
  if (!files || files.length === 0) {
500
542
  return {
501
- testCommand: base,
543
+ testCommand: fallbackCommand,
502
544
  files: [],
503
545
  fallback: true,
504
546
  fallbackReason: 'No source files provided — falling back to baseline.',
@@ -509,7 +551,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
509
551
  const sourceFiles = files.filter((f) => !isPlanningFile(f));
510
552
  if (sourceFiles.length === 0) {
511
553
  return {
512
- testCommand: base,
554
+ testCommand: fallbackCommand,
513
555
  files: [],
514
556
  fallback: true,
515
557
  fallbackReason: 'All provided files are planning-only (markdown/SDD artifacts) — falling back to baseline.',
@@ -523,7 +565,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
523
565
  const stackHintUnknown = stack && !KNOWN_STACKS.includes(stack);
524
566
  if (stackHintUnknown) {
525
567
  return {
526
- testCommand: base,
568
+ testCommand: fallbackCommand,
527
569
  files: [],
528
570
  fallback: true,
529
571
  fallbackReason: 'Stack could not be determined — falling back to baseline.',
@@ -556,6 +598,13 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
556
598
  continue;
557
599
  }
558
600
 
601
+ // Skip non-code files (e.g. markdown skill/agent docs in a Node repo): they
602
+ // cannot map to a unit test, and a loose basename match would produce false
603
+ // positives against tests that merely mention the file's name as a string.
604
+ if (!isCodeFileForStack(absSource, fileStack)) {
605
+ continue;
606
+ }
607
+
559
608
  if (!byModule.has(moduleRoot)) {
560
609
  byModule.set(moduleRoot, { stack: fileStack, testFiles: new Set() });
561
610
  }
@@ -598,7 +647,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
598
647
 
599
648
  if (!anyKnownStack && !stackHintValid) {
600
649
  return {
601
- testCommand: base,
650
+ testCommand: fallbackCommand,
602
651
  files: [],
603
652
  fallback: true,
604
653
  fallbackReason: 'Stack could not be determined — falling back to baseline.',
@@ -606,7 +655,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
606
655
  }
607
656
 
608
657
  return {
609
- testCommand: base,
658
+ testCommand: fallbackCommand,
610
659
  files: [],
611
660
  fallback: true,
612
661
  fallbackReason: 'No test files found for the given source files — falling back to baseline.',
@@ -710,4 +759,5 @@ module.exports = {
710
759
  findModuleRoot,
711
760
  isTestFile,
712
761
  affectedComponents,
762
+ isCodeFileForStack,
713
763
  };
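
Every fallback return in `testScope` now emits the same `fallbackCommand`. The effect of the flag can be reduced to the following sketch; `fallbackResult` is a hypothetical helper written for this illustration, and only the ternary and the result shape come from the hunks above.

```javascript
// Hypothetical condensation of the fallback paths in testScope().
function fallbackResult(reason, { baseline = '', noBaselineFallback = false } = {}) {
  // Same expression as in the diff: empty command when the flag is set.
  const fallbackCommand = noBaselineFallback ? '' : baseline;
  return {
    testCommand: fallbackCommand,
    files: [],
    fallback: true,
    fallbackReason: reason,
  };
}

// Without the flag: the caller gets the baseline back (the /refacil:test path).
console.log(fallbackResult('no test files found', { baseline: 'npm test' }));

// With the flag: the command is empty (the apply / bug / verify path).
console.log(fallbackResult('no test files found', { baseline: 'npm test', noBaselineFallback: true }));
```

As far as the hunks show, the `fallbackReason` strings were not changed, so they still read "falling back to baseline" even when the flag suppresses the baseline; consumers should probably key off `testCommand` and `fallback` rather than the reason text.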
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "refacil-sdd-ai",
3
- "version": "5.3.2",
3
+ "version": "5.3.4",
4
4
  "description": "SDD-AI: Specification-Driven Development with AI — development methodology using AI with Claude Code, Cursor, OpenCode and Codex",
5
5
  "bin": {
6
6
  "refacil-sdd-ai": "./bin/cli.js"
@@ -134,19 +134,18 @@ Apply-specific annotation: use `fix/<ID>` as the branch prefix (e.g. `fix/SEGINF
134
134
 
135
135
  ### Step 5: Delegate implementation to the refacil-debugger sub-agent (mode: fix)
136
136
 
137
- **`testScope` for fix mode**: default **`scoped`**. Parse `$ARGUMENTS` **and** the user message invoking this skill for whole-repo regression (same tokens as apply: **`full`**, **`all tests`**, **`whole suite`**, **`suite completa`**). Pass **`testScope: full`** only when explicitly requested.
137
+ **Test execution for fix mode is ALWAYS scoped** (`METHODOLOGY-CONTRACT.md §3.1` rule 0). `/refacil:bug` does not pass through `/refacil:test`, so the debugger validates the fix with its own regression tests + touched files — **never the full suite**. If the user explicitly asks for whole-repo regression (`full`, `suite completa`, …), do **not** widen this phase: tell them to run `/refacil:test full` or rely on CI after the fix.
138
138
 
139
139
  Invoke the `refacil-debugger` sub-agent passing it:
140
140
  - `mode: fix`
141
141
  - `description`: complete bug description.
142
142
  - `hypothesis`: root cause confirmed by the user in Step 3.
143
- - `testScope`: `scoped` \| `full` — from the rule above (default **`scoped`**).
144
143
 
145
144
  The sub-agent:
146
145
  - Implements the minimal and focused fix.
147
146
  - Generates regression tests (reproduces the bug + verifies the fix + normal-path assertions where warranted).
148
147
  - Creates `refacil-sdd/changes/fix-<name>/summary.md` with traceability. This `fix-*` folder is the approved operational exception to the regular proposal artifacts: it does not need `proposal.md`, `design.md`, `tasks.md`, or `specs.md` to be archived later, but it must include a substantive `summary.md`, regression test evidence from the debugger run, and an approved review before archive.
149
- - Runs **`testCommand`** per **`METHODOLOGY-CONTRACT.md §3.1`** (narrowed when `scoped`; full baseline only when `full` or narrow fallback warns).
148
+ - Runs a **scoped `testCommand`** derived via `sdd test-scope --no-baseline-fallback` (`METHODOLOGY-CONTRACT.md §3.1` rule 0). On fallback it runs only touched test files, else SKIPs — **never** the full baseline.
150
149
  - Returns the report fenced as ` ```refacil-debug-fix `.
151
150
 
152
151
  ### Step 6: Present result and next step
@@ -49,13 +49,15 @@ Coverage (if applicable): detect the project command at the component root (`tes
49
49
 
50
50
  **Rules**
51
51
 
52
+ 0. **Only `/refacil:test` may run the full/component baseline** (once per cycle, component-bounded). **Every other phase that executes tests** — `/refacil:apply` (implementer smoke), `/refacil:bug` (debugger fix), `/refacil:verify` (smoke after corrections) — derives its command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty `testCommand` on fallback** (never the baseline). On fallback those phases run **only the touched files that are themselves test files**; if none, they **SKIP** and defer to `/refacil:test`. The "unreliable scope → run baseline once" escape (Scoped command patterns, below) applies **exclusively to `/refacil:test`**. This is a **structural** guarantee, not just a convention: the CLI never hands a full-suite command to a non-test phase, so an agent that only reads the CLI output cannot run the whole suite.
53
+
52
54
  1. **`testScope: scoped`** (default): sub-agents run tests **only** for artifacts tied to the current change — never invoke the §3 baseline in **full-repo / full-suite** form without narrowing (paths, packages, filters, patterns), except the explicit fallbacks below.
53
55
  2. **`testScope: full`**: **on-demand only** — user explicitly requests whole-suite regression in **`/refacil:test`** (or `/refacil:verify`) arguments (e.g. `full`, `all tests`, `whole suite`, `suite completa`). Resolve the §3 baseline command language-agnostically at each **affected component root** and run it from that component dir (`cd <component> && <baseline>`). Never run all monorepo packages — only the component(s) whose files changed. If multiple components are affected, run each in sequence. Coverage = component-wide (not repo-wide).
54
56
  3. **`runCoverage: true`** (default): after scoped tests pass, run coverage **narrowed to the change** — instrument/collect only for **`filesToTest`**, **`changedFiles`**, and companion test/spec paths tied to those modules (examples: `--cov=pkg/sub`, Jest `--collectCoverageFrom` globs limited to touched trees, Gradle/JaCoCo scoped modules). If the toolchain cannot narrow, report **N/A** plus a WARNING; do **not** silently widen to repo-wide coverage while `testScope` remains `scoped`.
55
57
  4. **`runCoverage: false`**: skip coverage entirely — only when the user **explicitly** opts out (`no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, etc.) or the project defines **no** coverage command under §3.
56
58
  5. **`runCoverage: true` + `testScope: full`**: run the project coverage command **after** the full suite passes, using the repo’s usual global/module coverage behavior (heavy — intended only when the user requested `full`).
57
- 6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"` and executes the returned smoke command. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
58
- 7. **`/refacil:bug` / debugger `mode=fix`**: debugger defaults to **`scoped`**, narrows §3 baseline to **`filesModified` ∪ new/updated regression test files** unless the wrapper passed **`testScope: full`**.
59
+ 6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback` and executes the returned smoke command. The **`--no-baseline-fallback`** flag is the **structural guarantee** that apply never runs the whole suite: on fallback the CLI returns an **empty `testCommand`** (not the baseline), so even an agent that only reads the CLI output cannot run full regression. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
60
+ 7. **`/refacil:bug` / debugger `mode=fix`**: debugger is **always `scoped`**: it derives its command via `refacil-sdd-ai sdd test-scope --files "<filesModified ∪ new/updated regression test files>" --baseline "<§3 baseline>" --no-baseline-fallback` and runs the returned command. Because `/refacil:bug` does **not** pass through `/refacil:test`, it validates the fix with its **own regression tests** (which are touched test files → always in scope), never the full suite. On fallback (empty `testCommand`): run only the touched test files; if none, **SKIP** and add a LOW issue deferring to `/refacil:test`. The debugger **never** runs the full repo/package baseline.
59
61
  8. **Re-run / fix-loop (pass-2)**: when iterating on failing tests, run **only the previously-failing test files** — not the entire component suite. This keeps fix loops fast and bounded.
60
62
 
61
63
  **Scoped command patterns** (language-agnostic — sub-agent reads `AGENTS.md`, build config, and tool docs; run from the correct module/root):
@@ -63,9 +65,9 @@ Coverage (if applicable): detect the project command at the component root (`tes
63
65
  - Pass **explicit test paths**, **packages**, **classes**, or **filters** accepted by that stack (examples: Maven ` -Dtest=…`, Gradle `--tests …`, pytest file paths, `go test ./pkg/…`, `cargo test -p pkg`, .NET solution filter, Ruby `bundle exec rspec path`, JS package scripts with paths after `--`).
64
66
  - Prefer files **produced or updated in this session**; until they exist, use the narrowest supported pattern (basename, substring, regex) derived from `filesToTest` / `changedFiles`, per runner docs.
65
67
  - **Scoped coverage**: combine the same narrowing with coverage flags/includes that limit **report collection** to touched sources (runner-specific); exclude unrelated packages by default when `testScope: scoped`.
66
- - **Unreliable scope**: if narrowing cannot be done safely, run the baseline §3 command **once**, report a brief WARNING that the run may be heavy, and suggest CI or **`/refacil:test ... full`** for full regression.
68
+ - **Unreliable scope** (**`/refacil:test` only** — see rule 0): if narrowing cannot be done safely, `/refacil:test` may run the baseline §3 command **once**, report a brief WARNING that the run may be heavy, and suggest CI or **`/refacil:test ... full`** for full regression. **Non-test phases never take this escape** — they use `--no-baseline-fallback` and SKIP instead.
67
69
 
68
- **Verify (when `testExecution: full`)**: Prefer `commandsRun` from `get-memory` as reference only when re-running; else derive scoped targets from `changedFiles` and/or `git diff --name-only`, using **project test naming and layout** (`AGENTS.md`, test config): e.g. co-located `*Spec.*` / `*Test.*`, `tests/`, language-specific suffices — not a fixed extension.
70
+ **Verify (test evidence)**: verify reads `commandsRun` from `get-memory` as the source of truth for the test run (produced by `/refacil:test`). Verify itself only ever runs a **smoke** (scoped via `--no-baseline-fallback`) after corrections, deriving targets from `correctionTouchedFiles` using **project test naming and layout** (`AGENTS.md`, test config): co-located `*Spec.*` / `*Test.*`, `tests/`, language-specific suffixes. If there is no test evidence, verify **defers to `/refacil:test`** (rule 0); it does not run the suite itself.
69
71
 
70
72
  ### §3.2 — Phase ownership (test execution)
71
73
 
@@ -84,12 +86,12 @@ Coverage (if applicable): detect the project command at the component root (`tes
84
86
  | Value | When | Validator behavior |
85
87
  |-------|------|-------------------|
86
88
  | `none` | Default if `memory.lastStep` is `test` (or later) and `commandsRun` is non-empty; user did not force re-run | **Do not** run `testCommand` or `coverageCommand`. Tests section = **delegated to test phase**; cite last `commandsRun`. |
87
- | `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles`. **No** `coverageCommand`. |
88
- | `full` | User tokens (`full`, `re-run tests`, `run tests`, …) **or** CR-01 (no test memory) | Same as §3.1: `testCommand` + optional narrowed/full coverage per `testScope` / `runCoverage`. |
89
+ | `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles`, derived via `sdd test-scope … --no-baseline-fallback` (empty → SKIP). **No** `coverageCommand`. **Never** the full suite. |
90
+ | `defer` | CR-01 (no test memory) **or** user asks to re-run the suite in verify | **Verify never runs the full suite** (rule 0; `/refacil:test` is the mandatory prior step). STOP and tell the user: *"No current test evidence; run `/refacil:test` before verify."* Report tests as **N/A (pending `/refacil:test`)**. |
89
91
 
90
92
  **Smoke definition**: the smallest test invocation that exercises files touched by a **correction** (not the whole change). Derive companion paths from project layout (`*Spec*`, `*Test*`, `tests/`, etc.). Smoke **does not** satisfy coverage gates or replace `/refacil:test`.
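
The smoke definition above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names `companion_test_candidates` and `smoke_targets` and the fixed suffix list are hypothetical, and a real project would take its naming rules from `AGENTS.md` and its test config.

```python
from pathlib import PurePosixPath

# Hypothetical suffix conventions (assumption); real projects define their own.
SUFFIXES = ("Spec", "Test")


def companion_test_candidates(touched_file: str) -> list[str]:
    """Return candidate companion test paths for one correction-touched file."""
    path = PurePosixPath(touched_file)
    stem, ext = path.stem, path.suffix
    # A touched file that is already a test is its own smoke target.
    if stem.endswith(SUFFIXES):
        return [str(path)]
    # Co-located candidates: FooSpec.ext and FooTest.ext next to Foo.ext.
    candidates = [str(path.with_name(f"{stem}{s}{ext}")) for s in SUFFIXES]
    # One candidate under a top-level tests/ directory.
    candidates.append(str(PurePosixPath("tests") / f"{stem}Test{ext}"))
    return candidates


def smoke_targets(correction_touched_files, existing_files):
    """Keep only candidates that exist. An empty list means SKIP, never the full suite."""
    existing = set(existing_files)
    targets = []
    for touched in correction_touched_files:
        for candidate in companion_test_candidates(touched):
            if candidate in existing and candidate not in targets:
                targets.append(candidate)
    return targets
```

The important property is the empty case: when no companion test exists, the result is an empty list and the caller skips the smoke rather than widening the scope.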
91
93
 
92
- **After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files. **Never** use `full` in autofix re-verify unless the user explicitly requested it in the same invocation.
94
+ **After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files (scoped via `--no-baseline-fallback`). **Never** run the full suite in autofix re-verify; if full regression is wanted, it is `/refacil:test`'s job (rule 0).
93
95
 
94
96
  **Review checklist “tests pass”**: PASS when test files exist for the diff, `memory.criteriaRun` covers relevant CA/CR, and static review finds no obvious breakage — **without** running the §3 baseline via Bash unless the user explicitly asked.
95
97
 
@@ -25,12 +25,9 @@ Determine the scope before invoking the sub-agent. Prioritize in this order:
25
25
 
26
26
  - **Default**: `testExecution: none` when `get-memory` has `commandsRun` and `lastStep` is `test` (or later) — verify validates CA/CR **without** re-running the test pipeline.
27
27
 
28
- - **`testExecution: full`** if the user explicitly asked to re-run tests (`full`, `all tests`, `re-run`, `run tests`, `ejecutar tests`, `whole suite`, `suite completa`, `todas`) then also set `testScope` / `runCoverage` like **`/refacil:test`**:
29
- - **`testScope: full`** for whole-suite tokens above.
30
- - **`runCoverage: false`** for `no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, `quick`, `solo tests`.
31
- - **`full` + `no coverage`**: `testScope: full`, `runCoverage: false`.
28
+ - **`testExecution: defer`** if the user explicitly asked to re-run the suite (`full`, `all tests`, `re-run`, `run tests`, `ejecutar tests`, `whole suite`, `suite completa`, `todas`): verify does **not** run the suite (rule 0 — that is `/refacil:test`'s job). Tell the user to run `/refacil:test … full` and continue verify with the resulting evidence.
32
29
 
33
- - **No test memory** (`commandsRun` empty): emit WARNING, set `testExecution: full` (CR-01) unless only `changedFiles` allow a minimal scoped run.
30
+ - **No test memory** (`commandsRun` empty): emit WARNING and set `testExecution: defer` (CR-01). `/refacil:test` is the mandatory prior step; verify reports tests as N/A pending `/refacil:test` instead of running them itself.
34
31
 
35
32
  Do not invoke the sub-agent with ambiguous scope.
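
As a rough sketch of the scope rules above (new version), the decision reduces to three outcomes. The function name `resolve_test_execution`, its parameters, and the choice to check corrections first are assumptions made for illustration; only the values `none`, `smoke`, and `defer` and their triggering conditions come from the text.

```python
def resolve_test_execution(commands_run, last_step, user_requested_rerun,
                           after_corrections=False):
    """Pick the testExecution value for verify. It never returns 'full'."""
    # Steps at which test evidence is considered recorded: test, or later.
    steps_with_evidence = {"test", "verify", "review"}

    # Assumption: a re-verify after Step 5 corrections is checked first.
    if after_corrections:
        return "smoke"
    # The user asked to re-run the suite: verify defers to /refacil:test.
    if user_requested_rerun:
        return "defer"
    # Test memory exists: validate CA/CR without re-running anything.
    if commands_run and last_step in steps_with_evidence:
        return "none"
    # No test evidence (CR-01): defer instead of running the suite.
    return "defer"
```

For example, `resolve_test_execution([], None, False)` yields `"defer"`, which corresponds to the "No test memory" bullet.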
36
33
 
@@ -53,17 +50,16 @@ Before invoking the sub-agent, extract the context that the validator would othe
53
50
 
54
51
  2. **Cross-skill memory** — when `changeName` is known, run `refacil-sdd-ai sdd get-memory <changeName> --json`. Parse `commandsRun`, `criteriaRun`, and `lastStep`. If the output is `{}` or the command fails, omit those fields — do not block verification (CR-04).
55
52
 
56
- 3. **Resolve `testExecution`** (§3.2) from Step 0 and memory:
57
- - User forced re-run → `testExecution: full`.
58
- - `commandsRun` non-empty and `lastStep` is `test` (or `verify`/`review` after test) and user did **not** force re-run → `testExecution: none`.
59
- - Otherwise → `testExecution: full` with WARNING (no test phase recorded).
53
+ 3. **Resolve `testExecution`** (§3.2; verify **never** runs the full suite, rule 0):
54
+ - `commandsRun` non-empty and `lastStep` is `test` (or `verify`/`review` after test) and user did **not** request a re-run → `testExecution: none`.
55
+ - Re-verify after Step 5 corrections → `testExecution: smoke`.
56
+ - No test evidence (CR-01), or the user asked to re-run the suite → `testExecution: defer` (verify stops and tells the user to run `/refacil:test` first — it does **not** run the suite itself).
60
57
 
61
- 4. **Test commands** — only when `testExecution` is `full` or `smoke`:
62
- - **`full`**: follow §3.1: set `testScope` and `runCoverage` from Step 0; build `testCommand` (scoped from `changedFiles` or baseline if `full`); set `coverageCommand` when `runCoverage: true`.
63
- - **`smoke`**: build `smokeTestCommand` for companion tests of `correctionTouchedFiles` only; `runCoverage: false`, `coverageCommand: null`.
64
- - **`none`**: omit `testCommand` and `coverageCommand`; set `testsDelegatedFrom: test` and include `commandsRun` for the report.
58
+ 4. **Test commands** — only when `testExecution: smoke`:
59
+ - **`smoke`**: build `smokeTestCommand` for companion tests of `correctionTouchedFiles` **by calling** `refacil-sdd-ai sdd test-scope --files "<correctionTouchedFiles-csv>" --baseline "<§3 baseline>" --no-baseline-fallback` (structurally bounded: empty `testCommand` on fallback). `runCoverage: false`, `coverageCommand: null`.
60
+ - **`none`** / **`defer`**: omit `testCommand`/`smokeTestCommand` and `coverageCommand`; for `none` set `testsDelegatedFrom: test` and include `commandsRun`; for `defer` instruct the validator to report N/A pending `/refacil:test`.
65
61
 
66
- 5. **Coverage command** — only when `testExecution: full` and `runCoverage: true`; otherwise `coverageCommand: null`.
62
+ 5. **Coverage command** — verify never runs coverage (that is `/refacil:test`'s job); always `coverageCommand: null`.
67
63
 
68
64
  6. **CA/CR criteria** — if there is an active change, read the specification in `refacil-sdd/changes/<changeName>/`:
69
65
  - `specs.md` if it exists, and/or files under `specs/` (recursively).
@@ -75,12 +71,11 @@ Build the BRIEFING block:
75
71
  ```
76
72
  BRIEFING:
77
73
  changeName: <name or null if scope=git-diff>
78
- testExecution: none | smoke | full
79
- testCommand: <required when full; omit when none>
80
- smokeTestCommand: <required when smoke; omit otherwise>
81
- testScope: scoped | full
82
- runCoverage: true | false
83
- coverageCommand: <project coverage entrypoint or null when full+runCoverage>
74
+ testExecution: none | smoke | defer # never full — verify does not run the suite (rule 0)
75
+ smokeTestCommand: <required when smoke (scoped via --no-baseline-fallback); omit otherwise>
76
+ testScope: scoped
77
+ runCoverage: false
78
+ coverageCommand: null # verify never runs coverage
84
79
  testsDelegatedFrom: test | null
85
80
  correctionTouchedFiles: [...] # only on re-verify after Step 5 corrections
86
81
  criteria:
@@ -102,7 +97,7 @@ Invoke `refacil-validator` passing it the BRIEFING from the previous step.
102
97
 
103
98
  The sub-agent:
104
99
  - Applies **`testExecution`** from the briefing (§3.2) — **does not** run tests when `none`.
105
- - When `full`, uses `testCommand` / coverage per §3.1; when `smoke`, runs only `smokeTestCommand` (no coverage).
100
+ - When `smoke`, runs only `smokeTestCommand` (scoped via `--no-baseline-fallback`, no coverage); when `defer`, runs no tests and reports N/A pending `/refacil:test`. Verify never runs the full suite (rule 0).
106
101
  - Uses `criteria` from the briefing for verification (without re-reading specs from scratch).
107
102
  - Uses `changedFiles` to focus the 3D verification on those files.
108
103
  - Applies the **3D framework (Completeness/Correctness/Coherence)** per **`METHODOLOGY-CONTRACT.md §3C — 3C Criterion`** — including the severity table and graceful degradation rule.
@@ -192,7 +187,7 @@ If the command fails, continue silently — it must not block the flow.
192
187
  ```
193
188
  Corrections applied. Run /refacil:test before the next full verify to refresh the test suite.
194
189
  ```
195
- **Never** set `testExecution: full` in autofix re-verify unless the user explicitly requested re-run in this invocation.
190
+ **Never** run the full suite in autofix re-verify: only `smoke` (scoped) or `none`/`defer` (rule 0).
196
191
  5. Maximum **2 rounds** of automatic correction. If issues persist, list them for manual correction.
197
192
 
198
193
  **If the user does not accept:** list the issues for manual correction. Suggest `/refacil:test` then `/refacil:verify`.
@@ -200,7 +195,7 @@ If the command fails, continue silently — it must not block the flow.
200
195
  ## Rules
201
196
 
202
197
  - **Always build the briefing (Step 1) before delegating** — reduces the sub-agent tool calls.
203
- - **Defaults**: `testExecution: none` when test memory exists; **`testExecution: full`** only when Step 0 forces re-run or CR-01 applies. Smoke only after corrections; never full suite in autofix rounds.
198
+ - **Defaults**: `testExecution: none` when test memory exists; `smoke` (scoped) after corrections; `defer` when there is no evidence or the user wants a re-run (verify never runs the full suite; rule 0). `/refacil:test` owns full regression.
204
199
  - **Always delegate to the sub-agent** for the analysis. Do not replicate spec reading or test execution logic here.
205
200
  - **Dotfiles in `refacil-sdd/changes/`**: never assert absence of `.review-passed` without `-a`; see §8.
206
201
  - **Corrections are ONLY applied by this wrapper** (Step 5), after explicit approval.
@@ -2,6 +2,7 @@
2
2
 
3
3
  These rules align with **`METHODOLOGY-CONTRACT.md` §3–§3.1** shipped with SDD-AI (`refacil-prereqs` in your skills install). **Concrete baseline and narrowed commands for this repo** belong in markdown **below** this marked block (not between the `refacil-sdd-ai:testing-policy` markers) so `check-update` can refresh policy text without erasing your commands.
4
4
 
5
- - **Default: scoped runs** — For `/refacil:apply`, `/refacil:bug` (fix), `/refacil:test`, and `/refacil:verify`, narrow the test runner to **packages/paths/modules touched by the change** (paths after `--`, `-p`/`-pl`, `-Dtest=…`, `pytest` paths, `go test ./…`, workspace filters, etc.). Prefer the **smallest** scope that still covers the diff. Avoid monorepo root commands that fan out to **all** workspaces unless the change truly spans them.
6
- - **Full suite**: Only when the developer **explicitly** asks (`full`, `whole suite`, `suite completa`, …), in **CI / pre-merge**, or when narrowing is **unsafe** (then run baseline **once** with a clear WARN). Full runs cost more CPU/RAM.
5
+ - **Only `/refacil:test` runs the full suite** — and only the **affected component's** suite, once per cycle. Every other phase that runs tests (`/refacil:apply`, `/refacil:bug` fix, `/refacil:verify` smoke) does **scoped runs** only — it narrows the runner to **packages/paths/modules touched by the change** (paths after `--`, `-p`/`-pl`, `-Dtest=…`, `pytest` paths, `go test ./…`, workspace filters, etc.). These phases derive the command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty command on fallback** so they **physically cannot** run the whole suite. On fallback they run only touched test files, else SKIP and defer to `/refacil:test`.
6
+ - **`/refacil:verify` never runs the full suite**: `/refacil:test` is its mandatory prior step. If there is no test evidence, verify defers to `/refacil:test` instead of running it.
7
+ - **Full suite** — Only `/refacil:test` (component-bounded, once) or **CI / pre-merge**. Non-test phases never run it, even on unreliable scope (they SKIP). Full runs cost more CPU/RAM.
7
8
  - **Tests to add or change** — Keep them **next to** the behavior under change (follow this repo’s layout). Do not run unrelated suites “to be safe”.
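
The fallback behavior described in the new policy bullets can be sketched as a small decision function. This is illustrative only: `plan_scoped_run` and its return shape are hypothetical, and it assumes the caller has already extracted the derived command string from the output of `refacil-sdd-ai sdd test-scope … --no-baseline-fallback` (the output format is not shown in this diff).

```python
def plan_scoped_run(derived_command, touched_test_files=()):
    """Decide what a non-test phase may run. The full suite is never an option."""
    command = (derived_command or "").strip()
    # A non-empty derived command is already narrowed to the change.
    if command:
        return ("run", command)
    # Empty command means fallback: run only test files touched by the change.
    if touched_test_files:
        return ("run-touched-tests", list(touched_test_files))
    # Nothing scoped to run: SKIP and leave regression to /refacil:test.
    return ("skip", "defer to /refacil:test")
```

Because there is no branch that returns a baseline command, a phase built this way cannot widen to the whole suite even when scoping is unreliable.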