npm - refacil-sdd-ai - Versions diffs - 5.3.2 → 5.3.4 - Mend

refacil-sdd-ai 5.3.2 → 5.3.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/agents/debugger.md +15 -10
package/agents/implementer.md +2 -2
package/agents/validator.md +11 -10
package/lib/commands/sdd.js +8 -1
package/lib/spec-sync.js +7 -1
package/lib/test-scope.js +56 -6
package/package.json +1 -1
package/skills/bug/SKILL.md +2 -3
package/skills/prereqs/METHODOLOGY-CONTRACT.md +9 -7
package/skills/verify/SKILL.md +18 -23
package/templates/testing-policy.md +3 -2

package/agents/debugger.md CHANGED

@@ -29,7 +29,7 @@ If you prefer to continue here, provide:
   - mode: investigation (only analyze and propose hypotheses) or fix (implement with already-confirmed hypothesis)
   - description: <full bug description>
   - hypothesis: <confirmed root cause> (only for mode=fix)
-  - testScope: scoped \| full (only for mode=fix; default scoped)
+  - testScope: scoped (mode=fix is ALWAYS scoped — rule 0; full regression is /refacil:test's job)
 ```
 **Do not proceed with reads or implementation until the scope is clear.**
@@ -163,7 +163,7 @@ Proposed fix for hypothesis #1:
 ## Fix mode
-The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user) + optional **`testScope`** (`scoped` \| `full`, default **`scoped`**).
+The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user). Fix mode is **always scoped** (rule 0) — there is no full-suite option in `/refacil:bug`.
 ### Step 1: Implement the fix
@@ -205,14 +205,19 @@ Create `<projectRoot>/refacil-sdd/changes/<fix-name>/summary.md`:
 This file is mandatory for traceability and allows the `check-review` hook to detect the active change. The `.review-passed` will be created by `/refacil:review` upon approval.
-### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1)
+### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1 — always scoped, rule 0)
-1. Read **`testScope`** from wrapper (default **`scoped`** if omitted).
-2. **`testScope: full`**: Resolve baseline from **`METHODOLOGY-CONTRACT.md §3`**, run **once unparsed** — **all tests** emitted by that command must pass.
-3. **`testScope: scoped`** (default): Collect **`verificationTargets`** — every production/test file **you edited or added** during fix mode (**including** regression tests created this session).
-   - Build **`scopedCommand`** by narrowing baseline §3 to cover only those roots (directories, `-p`/`-pl`, `--`/path suffixes — follow stack docs + **`AGENTS.md` / `.agents/testing.md`** when present — see §3.1 **Scoped command patterns**).
-   - Run **`scopedCommand`**; everything it selects must pass. **Do not** upgrade to repo-wide invocation while `scoped` unless §3.1 says narrowing is unreliable — then run baseline **once**, prepend report line **WARN: scoped narrowing unavailable → full-suite fallback (heavy)**.
-4. **`testsResult.command`** in JSON must quote the **literal** executed shell string (`scopedCommand` or baseline).
+`/refacil:bug` does **not** pass through `/refacil:test`, so you validate the fix with your **own regression tests** (always in scope) — **never the full suite** (rule 0).
+1. Collect **`verificationTargets`** — every production/test file **you edited or added** during fix mode (**including** regression tests created this session).
+2. Derive the scoped command via the CLI (stack-agnostic, structurally bounded):
+   ```
+   refacil-sdd-ai sdd test-scope --files "<verificationTargets-csv>" --baseline "<§3 baseline>" --no-baseline-fallback
+   ```
+   `--no-baseline-fallback` guarantees the CLI **never** returns the full baseline: on fallback `testCommand` is **empty**.
+3. Run the returned `testCommand`; everything it selects must pass.
+4. **Fallback** (empty `testCommand` / `fallback: true`): run **only** the touched files that are themselves test files (your regression tests); if none exist, **SKIP** and prepend report line **WARN: no scopeable tests for touched files → verification deferred**. **Never** run the full repo/package baseline.
+5. **`testsResult.command`** in JSON must quote the **literal** executed shell string (or `null`/empty when SKIPPED).
 ### Report + JSON block (fix)
@@ -254,5 +259,5 @@ This file is mandatory for traceability and allows the `check-review` hook to de
 - In mode=investigation: follow diagnose loop discipline (reproduce, minimize, hypothesize, validate evidence) before proposing a fix.
 - In mode=fix: the fix must be MINIMAL. Never over-refactor.
 - Regression tests are MANDATORY in mode=fix.
-- **Scoped verification**: default **`testScope: scoped`** from wrapper — narrowed command in Step 4, not wholesale “run entire repo suite” unless `full`.
+- **Scoped verification**: **always scoped** (rule 0) — narrowed command in Step 4 via `sdd test-scope --no-baseline-fallback`, never a wholesale “run entire repo suite”. Full regression belongs to `/refacil:test` or CI.
 - Use **concise** output mode by default.
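
The Step 4 fallback rule in this hunk can be sketched as a small decision function. This is only an illustration, assuming the `--json` output fields listed in the CLI help (`testCommand`, `files`, `fallback`, `fallbackReason`). `planVerification` and `looksLikeTestFile` are hypothetical names, not part of the package, and the test-file heuristic is an assumption.

```javascript
// Hypothetical heuristic (assumption): *.test.* / *.spec.* files and anything
// under a tests/ or __tests__/ directory count as test files. The package
// exports its own isTestFile; this stand-in is for illustration only.
function looksLikeTestFile(file) {
  return /\.(test|spec)\.[^/.]+$/i.test(file) || /(^|\/)(tests|__tests__)\//.test(file);
}

// Decide what a non-test phase may run, given the parsed test-scope result
// and the files touched during the fix (verificationTargets).
function planVerification(result, verificationTargets) {
  // Normal path: the CLI returned a scoped command.
  if (!result.fallback && result.testCommand) {
    return { action: 'run', command: result.testCommand, warning: null };
  }
  // Fallback path: testCommand is empty, so only touched test files may run.
  const touchedTests = verificationTargets.filter(looksLikeTestFile);
  if (touchedTests.length > 0) {
    return { action: 'run-touched-tests', files: touchedTests, warning: null };
  }
  // Nothing scopeable: skip and defer, never widen to the baseline.
  return {
    action: 'skip',
    command: null,
    warning: 'WARN: no scopeable tests for touched files → verification deferred',
  };
}

const fallbackResult = { testCommand: '', files: [], fallback: true, fallbackReason: 'example' };
console.log(planVerification(fallbackResult, ['src/user.js', 'src/user.test.js']).action);
console.log(planVerification(fallbackResult, ['src/user.js']).action);
```

No branch ever returns the baseline command, which mirrors the structural guarantee the hunk describes.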

package/agents/implementer.md CHANGED

@@ -124,9 +124,9 @@ Follow **`METHODOLOGY-CONTRACT.md §3.1`**:
 2. **Derive a minimal scoped smoke command** (stack-agnostic — no hardcoded runners):
    ```
-   refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"
+   refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback
    ```
-   Use the resulting `testCommand` from the output.
+   Use the resulting `testCommand` from the output. The `--no-baseline-fallback` flag is **mandatory in apply**: on fallback the CLI returns an **empty** `testCommand` (never the full baseline), so you physically cannot run the whole suite — apply NEVER runs full regression. If `testCommand` is empty / `fallback: true`, go to step 4 (run touched test files only, else SKIP).
 3. **Run the resulting smoke command.**

package/agents/validator.md CHANGED

@@ -36,9 +36,10 @@ If you prefer only the report (without applying fixes), respond with the explici
 **BEFORE reading any file or running any command, read this rule.**
-- **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `full` or `smoke`.
-- **If `testExecution: full`**: use `testCommand` from the briefing — **do not look up the command in `METHODOLOGY-CONTRACT.md`**. Respect `testScope`, `runCoverage`, and `coverageCommand`.
-- **If `testExecution: smoke`**: run **only** `smokeTestCommand` — no coverage.
+- **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `smoke`.
+- **Verify NEVER runs the full suite** (`METHODOLOGY-CONTRACT.md §3.1` rule 0): `/refacil:test` is the mandatory prior step that owns full regression. There is no `full` mode here.
+- **If `testExecution: defer`**: do **not** run any tests — there is no current test evidence (CR-01) or the user asked to re-run the suite. Report tests as **N/A (pending `/refacil:test`)** and tell the user to run `/refacil:test` first.
+- **If `testExecution: smoke`**: run **only** `smokeTestCommand` (already scoped via `--no-baseline-fallback`; if empty, SKIP) — no coverage, never the baseline.
 - **If the briefing includes `criteria`**: use it for verification — **do not re-read the specs** to extract the CA/CR again.
 - **If the briefing includes `changedFiles`**: focus the 3D verification on those files — do not do a global discovery.
 - Read ONLY the specific files needed to verify each CA/CR.
@@ -84,7 +85,7 @@ Produce a list of issues with severity `CRITICAL` / `WARNING` / `SUGGESTION`.
 ### Step 2: Verify tests (conditional — §3.2)
-Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `full`).
+Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `defer` — never `full`; verify does not run the suite, rule 0).
 **`testExecution: none`**:
 - **Do not** run `testCommand`, `smokeTestCommand`, or `coverageCommand`.
@@ -93,14 +94,14 @@ Read `testExecution` from the briefing (default infer: `none` if `commandsRun` p
 - JSON `tests.executed: false`, `tests.delegated: true`, `tests.command` = last `commandsRun` or null.
 **`testExecution: smoke`**:
-- Run **only** `smokeTestCommand`. Do not run `coverageCommand`.
+- Run **only** `smokeTestCommand` (scoped via `--no-baseline-fallback`; if empty/`fallback`, run only the touched test files, else SKIP). Do not run `coverageCommand`. **Never** run the baseline.
 - FAIL if smoke fails; PASS if smoke passes. Note in report that full suite/coverage requires `/refacil:test`.
-**`testExecution: full`**:
-- Run `testCommand` only (already narrowed when `testScope: scoped`). Do not substitute a fuller command.
-- After tests pass, apply coverage per briefing (`runCoverage`, `coverageCommand`, `testScope`) as in §3.1.
+**`testExecution: defer`** (CR-01 — no test evidence, or user asked to re-run the suite):
+- **Do not run any tests.** Verify never runs the full suite (rule 0); `/refacil:test` is the mandatory prior step.
+- Report tests as **N/A (pending `/refacil:test`)**, set `tests.executed: false`, `tests.delegated: true`, and tell the user to run `/refacil:test` before verify.
-**If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; ask user to confirm scope before running tests.
+**If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; default to `none`/`defer` and ask the user to run `/refacil:test` rather than running the suite here.
 ### Step 3: Validate cross-repo ambiguities (optional)
@@ -146,7 +147,7 @@ Required corrections (only if REQUIRES_CORRECTIONS):
   "tests": {
     "executed": <bool>,
     "delegated": <bool>,
-    "executionMode": "none" | "smoke" | "full",
+    "executionMode": "none" | "smoke" | "defer",
     "command": "<command or last commandsRun when delegated>",
     "passed": <bool or null when not executed>,
     "total": <int or null>,

package/lib/commands/sdd.js CHANGED

@@ -1021,6 +1021,10 @@ function cmdTestScope(argv, projectRoot) {
   const filesRaw = args.files || '';
   const stackHint = args.stack || undefined;
   const baselineCmd = args.baseline || '';
+  // When set, fallback returns an EMPTY testCommand instead of the full baseline.
+  // Used by /refacil:apply so the implementer can never run the whole suite even if
+  // it only looks at the CLI output and ignores the contract's SKIP rule.
+  const noBaselineFallback = args['no-baseline-fallback'] === true;
   // Use the already-resolved projectRoot from handleSdd (via findProjectRoot()) so
   // the CLI works correctly when invoked from a subdirectory within the monorepo.
   const root = projectRoot || process.cwd();
@@ -1031,7 +1035,7 @@ function cmdTestScope(argv, projectRoot) {
     : [];
   const { testScope } = require('../test-scope');
-  const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root });
+  const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root, noBaselineFallback });
   if (wantJson) {
     process.stdout.write(JSON.stringify(result) + '\n');
@@ -1296,6 +1300,9 @@ function sddHelp() {
       --files <csv>                      Comma-separated source file paths to scope tests for
       [--stack <name>]                   Stack hint (node, python, go, rust, java, dotnet)
       [--baseline <cmd>]                 Fallback test command when no test files are found
+      [--no-baseline-fallback]           On fallback, return an EMPTY testCommand instead of the
+                                         baseline (used by /refacil:apply so it can never run the
+                                         whole suite). /refacil:test omits this flag.
       [--json]                           Output result as JSON (testCommand, files, fallback, fallbackReason)
                                          Always exits 0.

package/lib/spec-sync.js CHANGED

@@ -98,7 +98,13 @@ function parseCriteriaBlocks(markdown) {
   };
   for (const line of lines) {
-    const m = line.match(/^##\s+((?:CA|CR)-\d+):\s*(.+)$/i);
+    // Tolerant heading match so archive's spec-sync never trips on cosmetic
+    // format variance the proposer/agents produce in the wild:
+    //   - heading level: h2 (## CA-01) OR h3+ (### CA-A01) — level is ignored
+    //   - criterion id: numeric (CA-01), feature-prefixed (CA-A01, CA-G01) or
+    //     suffixed (CA-12b) — any [A-Za-z0-9] run after the CA-/CR- prefix
+    // The CA-/CR- token is the real signal; the heading level and id shape are not.
+    const m = line.match(/^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i);
     if (m) {
       pushCurrent();
       current = { id: m[1].toUpperCase(), title: m[2].trim(), lines: [] };
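
To see what the tolerant heading match changes, the two patterns from this hunk can be run side by side. The sample headings and the `matchId` helper are made up for illustration; only the two regular expressions come from the diff.

```javascript
// Patterns copied from the hunk above (removed line vs. added line).
const oldPattern = /^##\s+((?:CA|CR)-\d+):\s*(.+)$/i;
const newPattern = /^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i;

// Hypothetical helper: returns the upper-cased criterion id, or null when the
// line is not recognized as a criterion heading.
function matchId(pattern, line) {
  const m = line.match(pattern);
  return m ? m[1].toUpperCase() : null;
}

// Illustrative sample headings (not taken from the package).
const samples = [
  '## CA-01: Login works',
  '### CA-A01: Export report',
  '## CR-12b: Reject expired token',
  '# CA-01: Top-level heading',
  '## CA-01 Missing colon',
];

for (const line of samples) {
  console.log(JSON.stringify(line), 'old:', matchId(oldPattern, line), 'new:', matchId(newPattern, line));
}
```

Under these samples, only the first heading matches the old pattern, while the new pattern also accepts the h3 heading and the suffixed id. An h1 heading and a heading without the colon are rejected by both.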

package/lib/test-scope.js CHANGED

@@ -27,6 +27,40 @@ const path = require('path');
 const KNOWN_STACKS = ['node', 'python', 'go', 'rust', 'java', 'dotnet'];
+// ---------------------------------------------------------------------------
+// Source-code file extensions per stack.
+//
+// A changed file whose extension is NOT a code extension for its stack cannot
+// have a unit-test mapping (e.g. a skill/agent `.md` doc in a Node repo). Such
+// files must be skipped during scoping: otherwise the loose basename match in
+// findTestFilesByImport produces false positives against test files that merely
+// MENTION the file's name as a string. Test files are themselves code, so this
+// guard never drops a real test.
+// ---------------------------------------------------------------------------
+const CODE_EXTENSIONS = {
+  node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
+  python: ['.py'],
+  go: ['.go'],
+  rust: ['.rs'],
+  java: ['.java', '.kt', '.kts'],
+  dotnet: ['.cs'],
+};
+/**
+ * Returns true if the file's extension is a recognized source-code extension
+ * for the given stack. Unknown stacks return false.
+ *
+ * @param {string} filePath
+ * @param {string} stack
+ * @returns {boolean}
+ */
+function isCodeFileForStack(filePath, stack) {
+  const exts = CODE_EXTENSIONS[stack];
+  if (!exts) return false;
+  return exts.includes(path.extname(filePath).toLowerCase());
+}
 // ---------------------------------------------------------------------------
 // Planning-only file patterns — never justify a test run on their own.
 // ---------------------------------------------------------------------------
@@ -489,16 +523,24 @@ function buildScopedCommand(absTestFiles, detectedStack, moduleRoot, projectRoot
  * @param {string}   opts.stack        - stack hint (optional; auto-detected if omitted)
  * @param {string}   opts.baseline     - fallback test command (optional)
  * @param {string}   opts.projectRoot  - project root (optional; uses cwd if omitted)
+ * @param {boolean}  opts.noBaselineFallback - when true, fallback returns an EMPTY
+ *   testCommand instead of the full baseline. Used by phases that must NEVER run the
+ *   whole suite (e.g. `/refacil:apply` smoke): the consumer physically never receives
+ *   a full-suite command, so it cannot run it even if it ignores the contract rule.
+ *   `/refacil:test` (which legitimately runs the full suite on fallback) omits this.
  * @returns {{ testCommand: string, files: string[], fallback: boolean, fallbackReason: string|null }}
  */
-function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
+function testScope({ files = [], stack, baseline = '', projectRoot, noBaselineFallback = false } = {}) {
   const root = projectRoot || process.cwd();
   const base = baseline || '';
+  // What to emit as testCommand on every fallback path. In normal mode this is the
+  // baseline; in noBaselineFallback mode it is empty so apply can never run the suite.
+  const fallbackCommand = noBaselineFallback ? '' : base;
   // Fallback: empty files input
   if (!files || files.length === 0) {
     return {
-      testCommand: base,
+      testCommand: fallbackCommand,
       files: [],
       fallback: true,
       fallbackReason: 'No source files provided — falling back to baseline.',
@@ -509,7 +551,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
   const sourceFiles = files.filter((f) => !isPlanningFile(f));
   if (sourceFiles.length === 0) {
     return {
-      testCommand: base,
+      testCommand: fallbackCommand,
       files: [],
       fallback: true,
       fallbackReason: 'All provided files are planning-only (markdown/SDD artifacts) — falling back to baseline.',
@@ -523,7 +565,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
   const stackHintUnknown = stack && !KNOWN_STACKS.includes(stack);
   if (stackHintUnknown) {
     return {
-      testCommand: base,
+      testCommand: fallbackCommand,
       files: [],
       fallback: true,
       fallbackReason: 'Stack could not be determined — falling back to baseline.',
@@ -556,6 +598,13 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
       continue;
     }
+    // Skip non-code files (e.g. markdown skill/agent docs in a Node repo): they
+    // cannot map to a unit test, and a loose basename match would produce false
+    // positives against tests that merely mention the file's name as a string.
+    if (!isCodeFileForStack(absSource, fileStack)) {
+      continue;
+    }
     if (!byModule.has(moduleRoot)) {
       byModule.set(moduleRoot, { stack: fileStack, testFiles: new Set() });
     }
@@ -598,7 +647,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
     if (!anyKnownStack && !stackHintValid) {
       return {
-        testCommand: base,
+        testCommand: fallbackCommand,
         files: [],
         fallback: true,
         fallbackReason: 'Stack could not be determined — falling back to baseline.',
@@ -606,7 +655,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
     }
     return {
-      testCommand: base,
+      testCommand: fallbackCommand,
       files: [],
       fallback: true,
       fallbackReason: 'No test files found for the given source files — falling back to baseline.',
@@ -710,4 +759,5 @@ module.exports = {
   findModuleRoot,
   isTestFile,
   affectedComponents,
+  isCodeFileForStack,
 };
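
The two behaviours added in this file can be exercised in isolation. In the sketch below, `CODE_EXTENSIONS` and `isCodeFileForStack` are copied from the hunks above. `emptyInputFallback` is a hypothetical reduction of `testScope` that reproduces only the first fallback branch (empty `files`), so it is not the package's function, and the `npm test` baseline is a made-up value.

```javascript
const path = require('path');

// Copied from the diff: source-code extensions per stack.
const CODE_EXTENSIONS = {
  node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
  python: ['.py'],
  go: ['.go'],
  rust: ['.rs'],
  java: ['.java', '.kt', '.kts'],
  dotnet: ['.cs'],
};

// Copied from the diff: true only for recognized code extensions of the stack.
function isCodeFileForStack(filePath, stack) {
  const exts = CODE_EXTENSIONS[stack];
  if (!exts) return false;
  return exts.includes(path.extname(filePath).toLowerCase());
}

// Hypothetical reduction of testScope: only the "no files provided" branch.
function emptyInputFallback({ baseline = '', noBaselineFallback = false } = {}) {
  // Same expression as in the diff: empty command when the flag is set.
  const fallbackCommand = noBaselineFallback ? '' : baseline;
  return { testCommand: fallbackCommand, files: [], fallback: true };
}

console.log(isCodeFileForStack('skills/verify/SKILL.md', 'node')); // markdown doc is skipped
console.log(isCodeFileForStack('lib/test-scope.js', 'node'));
console.log(emptyInputFallback({ baseline: 'npm test' }).testCommand);
console.log(emptyInputFallback({ baseline: 'npm test', noBaselineFallback: true }).testCommand);
```

With the flag set, the caller receives an empty string and `fallback: true`, so it has to branch on those fields rather than execute whatever command comes back.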

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "refacil-sdd-ai",
-  "version": "5.3.2",
+  "version": "5.3.4",
   "description": "SDD-AI: Specification-Driven Development with AI — development methodology using AI with Claude Code, Cursor, OpenCode and Codex",
   "bin": {
     "refacil-sdd-ai": "./bin/cli.js"

package/skills/bug/SKILL.md CHANGED

@@ -134,19 +134,18 @@ Apply-specific annotation: use `fix/<ID>` as the branch prefix (e.g. `fix/SEGINF
 ### Step 5: Delegate implementation to the refacil-debugger sub-agent (mode: fix)
-**`testScope` for fix mode** — default **`scoped`**. Parse `$ARGUMENTS` **and** the user message invoking this skill for whole-repo regression (same tokens as apply: **`full`**, **`all tests`**, **`whole suite`**, **`suite completa`**). Pass **`testScope: full`** only when explicitly requested.
+**Test execution for fix mode is ALWAYS scoped** (`METHODOLOGY-CONTRACT.md §3.1` rule 0). `/refacil:bug` does not pass through `/refacil:test`, so the debugger validates the fix with its own regression tests + touched files — **never the full suite**. If the user explicitly asks for whole-repo regression (`full`, `suite completa`, …), do **not** widen this phase: tell them to run `/refacil:test … full` or rely on CI after the fix.
 Invoke the `refacil-debugger` sub-agent passing it:
 - `mode: fix`
 - `description`: complete bug description.
 - `hypothesis`: root cause confirmed by the user in Step 3.
-- `testScope`: `scoped` \| `full` — from the rule above (default **`scoped`**).
 The sub-agent:
 - Implements the minimal and focused fix.
 - Generates regression tests (reproduces the bug + verifies the fix + normal-path assertions where warranted).
 - Creates `refacil-sdd/changes/fix-<name>/summary.md` with traceability. This `fix-*` folder is the approved operational exception to the regular proposal artifacts: it does not need `proposal.md`, `design.md`, `tasks.md`, or `specs.md` to be archived later, but it must include a substantive `summary.md`, regression test evidence from the debugger run, and an approved review before archive.
-- Runs **`testCommand`** per **`METHODOLOGY-CONTRACT.md §3.1`** (narrowed when `scoped`; full baseline only when `full` or narrow fallback warns).
+- Runs a **scoped `testCommand`** derived via `sdd test-scope --no-baseline-fallback` (`METHODOLOGY-CONTRACT.md §3.1` rule 0). On fallback it runs only touched test files, else SKIPs — **never** the full baseline.
 - Returns the report fenced as ` ```refacil-debug-fix `.
 ### Step 6: Present result and next step

package/skills/prereqs/METHODOLOGY-CONTRACT.md CHANGED

@@ -49,13 +49,15 @@ Coverage (if applicable): detect the project command at the component root (`tes
 **Rules**
+0. **Only `/refacil:test` may run the full/component baseline** (once per cycle, component-bounded). **Every other phase that executes tests** — `/refacil:apply` (implementer smoke), `/refacil:bug` (debugger fix), `/refacil:verify` (smoke after corrections) — derives its command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty `testCommand` on fallback** (never the baseline). On fallback those phases run **only the touched files that are themselves test files**; if none, they **SKIP** and defer to `/refacil:test`. The "unreliable scope → run baseline once" escape (Scoped command patterns, below) applies **exclusively to `/refacil:test`**. This is a **structural** guarantee, not just a convention: the CLI never hands a full-suite command to a non-test phase, so an agent that only reads the CLI output cannot run the whole suite.
 1. **`testScope: scoped`** (default): sub-agents run tests **only** for artifacts tied to the current change — never invoke the §3 baseline in **full-repo / full-suite** form without narrowing (paths, packages, filters, patterns), except the explicit fallbacks below.
 2. **`testScope: full`**: **on-demand only** — user explicitly requests whole-suite regression in **`/refacil:test`** (or `/refacil:verify`) arguments (e.g. `full`, `all tests`, `whole suite`, `suite completa`). Resolve the §3 baseline command language-agnostically at each **affected component root** and run it from that component dir (`cd <component> && <baseline>`). Never run all monorepo packages — only the component(s) whose files changed. If multiple components are affected, run each in sequence. Coverage = component-wide (not repo-wide).
 3. **`runCoverage: true`** (default): after scoped tests pass, run coverage **narrowed to the change** — instrument/collect only for **`filesToTest`**, **`changedFiles`**, and companion test/spec paths tied to those modules (examples: `--cov=pkg/sub`, Jest `--collectCoverageFrom` globs limited to touched trees, Gradle/JaCoCo scoped modules). If the toolchain cannot narrow, report **N/A** plus a WARNING; do **not** silently widen to repo-wide coverage while `testScope` remains `scoped`.
 4. **`runCoverage: false`**: skip coverage entirely — only when the user **explicitly** opts out (`no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, etc.) or the project defines **no** coverage command under §3.
 5. **`runCoverage: true` + `testScope: full`**: run the project coverage command **after** the full suite passes, using the repo’s usual global/module coverage behavior (heavy — intended only when the user requested `full`).
-6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"` and executes the returned smoke command. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
-7. **`/refacil:bug` / debugger `mode=fix`**: debugger defaults to **`scoped`**, narrows §3 baseline to **`filesModified` ∪ new/updated regression test files** unless the wrapper passed **`testScope: full`**.
+6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback` and executes the returned smoke command. The **`--no-baseline-fallback`** flag is the **structural guarantee** that apply never runs the whole suite: on fallback the CLI returns an **empty `testCommand`** (not the baseline), so even an agent that only reads the CLI output cannot run full regression. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
+7. **`/refacil:bug` / debugger `mode=fix`**: debugger is **always `scoped`** — it derives its command via `refacil-sdd-ai sdd test-scope --files "<filesModified ∪ new/updated regression test files>" --baseline "<§3 baseline>" --no-baseline-fallback` and runs the returned command. Because `/refacil:bug` does **not** pass through `/refacil:test`, it validates the fix with its **own regression tests** (which are touched test files → always in scope), never the full suite. On fallback (empty `testCommand`): run only the touched test files; if none, **SKIP** and add a LOW issue deferring to `/refacil:test`. The debugger **never** runs the full repo/package baseline.
 8. **Re-run / fix-loop (pass-2)**: when iterating on failing tests, run **only the previously-failing test files** — not the entire component suite. This keeps fix loops fast and bounded.
 **Scoped command patterns** (language-agnostic — sub-agent reads `AGENTS.md`, build config, and tool docs; run from the correct module/root):
@@ -63,9 +65,9 @@ Coverage (if applicable): detect the project command at the component root (`tes
 - Pass **explicit test paths**, **packages**, **classes**, or **filters** accepted by that stack (examples: Maven ` -Dtest=…`, Gradle `--tests …`, pytest file paths, `go test ./pkg/…`, `cargo test -p pkg`, .NET solution filter, Ruby `bundle exec rspec path`, JS package scripts with paths after `--`).
 - Prefer files **produced or updated in this session**; until they exist, use the narrowest supported pattern (basename, substring, regex) derived from `filesToTest` / `changedFiles`, per runner docs.
 - **Scoped coverage**: combine the same narrowing with coverage flags/includes that limit **report collection** to touched sources (runner-specific); exclude unrelated packages by default when `testScope: scoped`.
-- **Unreliable scope**: if narrowing cannot be done safely, run the baseline §3 command **once**, report a brief WARNING that the run may be heavy, and suggest CI or **`/refacil:test ... full`** for full regression.
+- **Unreliable scope** (**`/refacil:test` only** — see rule 0): if narrowing cannot be done safely, `/refacil:test` may run the baseline §3 command **once**, report a brief WARNING that the run may be heavy, and suggest CI or **`/refacil:test ... full`** for full regression. **Non-test phases never take this escape** — they use `--no-baseline-fallback` and SKIP instead.
-**Verify (when `testExecution: full`)**: Prefer `commandsRun` from `get-memory` as reference only when re-running; else derive scoped targets from `changedFiles` and/or `git diff --name-only`, using **project test naming and layout** (`AGENTS.md`, test config): e.g. co-located `*Spec.*` / `*Test.*`, `tests/`, language-specific suffices — not a fixed extension.
+**Verify (test evidence)**: verify reads `commandsRun` from `get-memory` as the source of truth for the test run (produced by `/refacil:test`). Verify itself only ever runs a **smoke** (scoped via `--no-baseline-fallback`) after corrections, deriving targets from `correctionTouchedFiles` using **project test naming and layout** (`AGENTS.md`, test config): co-located `*Spec.*` / `*Test.*`, `tests/`, language-specific suffixes. If there is no test evidence, verify **defers to `/refacil:test`** (rule 0) — it does not run the suite itself.
 ### §3.2 — Phase ownership (test execution)
@@ -84,12 +86,12 @@ Coverage (if applicable): detect the project command at the component root (`tes
 | Value | When | Validator behavior |
 |-------|------|-------------------|
 | `none` | Default if `memory.lastStep` is `test` (or later) and `commandsRun` is non-empty; user did not force re-run | **Do not** run `testCommand` or `coverageCommand`. Tests section = **delegated to test phase**; cite last `commandsRun`. |
-| `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles`. **No** `coverageCommand`. |
-| `full` | User tokens (`full`, `re-run tests`, `run tests`, …) **or** CR-01 (no test memory) | Same as §3.1: `testCommand` + optional narrowed/full coverage per `testScope` / `runCoverage`. |
+| `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles`, derived via `sdd test-scope … --no-baseline-fallback` (empty → SKIP). **No** `coverageCommand`. **Never** the full suite. |
+| `defer` | CR-01 (no test memory) **or** user asks to re-run the suite in verify | **Verify never runs the full suite** (rule 0; `/refacil:test` is the mandatory prior step). STOP and tell the user: *"No current test evidence — run `/refacil:test` before verify."* Report tests as **N/A (pending `/refacil:test`)**. |
 **Smoke definition**: the smallest test invocation that exercises files touched by a **correction** (not the whole change). Derive companion paths from project layout (`*Spec*`, `*Test*`, `tests/`, etc.). Smoke **does not** satisfy coverage gates or replace `/refacil:test`.
-**After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files. **Never** use `full` in autofix re-verify unless the user explicitly requested it in the same invocation.
+**After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files (scoped via `--no-baseline-fallback`). **Never** run the full suite in autofix re-verify — if full regression is wanted, it is `/refacil:test`'s job (rule 0).
 **Review checklist “tests pass”**: PASS when test files exist for the diff, `memory.criteriaRun` covers relevant CA/CR, and static review finds no obvious breakage — **without** running the §3 baseline via Bash unless the user explicitly asked.
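
The mode table in this hunk can be read as a small dispatch on `testExecution`. The following is a minimal sketch of that reading, not code from the package: the function name `testsSection`, the briefing object shape, and the exact report strings are assumptions made for illustration.

```javascript
// Illustrative sketch only: `testsSection` and the briefing object shape are
// hypothetical and do not come from the package source.
function testsSection(briefing) {
  const { testExecution, commandsRun = [], smokeTestCommand = "" } = briefing;
  switch (testExecution) {
    case "none": {
      // Tests were already run by /refacil:test; cite the last recorded command.
      const last = commandsRun[commandsRun.length - 1];
      return { run: null, report: `delegated to test phase (last command: ${last})` };
    }
    case "smoke":
      // An empty scoped command means SKIP; there is no fallback to the full suite.
      return smokeTestCommand.trim() === ""
        ? { run: null, report: "SKIP (no companion tests for corrected files)" }
        : { run: smokeTestCommand, report: "smoke run on correction files, no coverage" };
    case "defer":
      // No test evidence, or the user asked for a re-run: verify runs nothing.
      return { run: null, report: "N/A (pending /refacil:test)" };
    default:
      // "full" is no longer a valid verify mode after this change.
      throw new Error(`unsupported testExecution: ${testExecution}`);
  }
}
```

In this reading, passing the removed `full` mode is treated as an error rather than silently mapped to `defer`; that choice is an assumption of the sketch.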

package/skills/verify/SKILL.md CHANGED

@@ -25,12 +25,9 @@ Determine the scope before invoking the sub-agent. Prioritize in this order:
 - **Default**: `testExecution: none` when `get-memory` has `commandsRun` and `lastStep` is `test` (or later) — verify validates CA/CR **without** re-running the test pipeline.
-- **`testExecution: full`** if the user explicitly asked to re-run tests (`full`, `all tests`, `re-run`, `run tests`, `ejecutar tests`, `whole suite`, `suite completa`, `todas`) — then also set `testScope` / `runCoverage` like **`/refacil:test`**:
-  - **`testScope: full`** for whole-suite tokens above.
-  - **`runCoverage: false`** for `no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, `quick`, `solo tests`.
-  - **`full` + `no coverage`**: `testScope: full`, `runCoverage: false`.
+- **`testExecution: defer`** if the user explicitly asked to re-run the suite (`full`, `all tests`, `re-run`, `run tests`, `ejecutar tests`, `whole suite`, `suite completa`, `todas`): verify does **not** run the suite (rule 0 — that is `/refacil:test`'s job). Tell the user to run `/refacil:test … full` and continue verify with the resulting evidence.
-- **No test memory** (`commandsRun` empty): emit WARNING, set `testExecution: full` (CR-01) unless only `changedFiles` allow a minimal scoped run.
+- **No test memory** (`commandsRun` empty): emit WARNING and set `testExecution: defer` (CR-01) — `/refacil:test` is the mandatory prior step; verify reports tests as N/A pending `/refacil:test` instead of running them itself.
 Do not invoke the sub-agent with ambiguous scope.
@@ -53,17 +50,16 @@ Before invoking the sub-agent, extract the context that the validator would othe
 2. **Cross-skill memory** — when `changeName` is known, run `refacil-sdd-ai sdd get-memory <changeName> --json`. Parse `commandsRun`, `criteriaRun`, and `lastStep`. If the output is `{}` or the command fails, omit those fields — do not block verification (CR-04).
-3. **Resolve `testExecution`** (§3.2) from Step 0 and memory:
-   - User forced re-run → `testExecution: full`.
-   - `commandsRun` non-empty and `lastStep` is `test` (or `verify`/`review` after test) and user did **not** force re-run → `testExecution: none`.
-   - Otherwise → `testExecution: full` with WARNING (no test phase recorded).
+3. **Resolve `testExecution`** (§3.2 — verify **never** runs the full suite, rule 0):
+   - `commandsRun` non-empty and `lastStep` is `test` (or `verify`/`review` after test) and user did **not** request a re-run → `testExecution: none`.
+   - Re-verify after Step 5 corrections → `testExecution: smoke`.
+   - No test evidence (CR-01), or the user asked to re-run the suite → `testExecution: defer` (verify stops and tells the user to run `/refacil:test` first — it does **not** run the suite itself).
-4. **Test commands** — only when `testExecution` is `full` or `smoke`:
-   - **`full`**: follow §3.1 — set `testScope` and `runCoverage` from Step 0; build `testCommand` (scoped from `changedFiles` or baseline if `full`); set `coverageCommand` when `runCoverage: true`.
-   - **`smoke`**: build `smokeTestCommand` for companion tests of `correctionTouchedFiles` only; `runCoverage: false`, `coverageCommand: null`.
-   - **`none`**: omit `testCommand` and `coverageCommand`; set `testsDelegatedFrom: test` and include `commandsRun` for the report.
+4. **Test commands** — only when `testExecution: smoke`:
+   - **`smoke`**: build `smokeTestCommand` for companion tests of `correctionTouchedFiles` **by calling** `refacil-sdd-ai sdd test-scope --files "<correctionTouchedFiles-csv>" --baseline "<§3 baseline>" --no-baseline-fallback` (structurally bounded — empty `testCommand` on fallback). `runCoverage: false`, `coverageCommand: null`.
+   - **`none`** / **`defer`**: omit `testCommand`/`smokeTestCommand` and `coverageCommand`; for `none` set `testsDelegatedFrom: test` and include `commandsRun`; for `defer` instruct the validator to report N/A pending `/refacil:test`.
-5. **Coverage command** — only when `testExecution: full` and `runCoverage: true`; otherwise `coverageCommand: null`.
+5. **Coverage command** — verify never runs coverage (that is `/refacil:test`'s job); always `coverageCommand: null`.
 6. **CA/CR criteria** — if there is an active change, read the specification in `refacil-sdd/changes/<changeName>/`:
    - `specs.md` if it exists, and/or files under `specs/` (recursively).
@@ -75,12 +71,11 @@ Build the BRIEFING block:
 ```
 BRIEFING:
 changeName: <name or null if scope=git-diff>
-testExecution: none | smoke | full
-testCommand: <required when full; omit when none>
-smokeTestCommand: <required when smoke; omit otherwise>
-testScope: scoped | full
-runCoverage: true | false
-coverageCommand: <project coverage entrypoint or null when full+runCoverage>
+testExecution: none | smoke | defer    # never full — verify does not run the suite (rule 0)
+smokeTestCommand: <required when smoke (scoped via --no-baseline-fallback); omit otherwise>
+testScope: scoped
+runCoverage: false
+coverageCommand: null                  # verify never runs coverage
 testsDelegatedFrom: test | null
 correctionTouchedFiles: [...]   # only on re-verify after Step 5 corrections
 criteria:
@@ -102,7 +97,7 @@ Invoke `refacil-validator` passing it the BRIEFING from the previous step.
 The sub-agent:
 - Applies **`testExecution`** from the briefing (§3.2) — **does not** run tests when `none`.
-- When `full`, uses `testCommand` / coverage per §3.1; when `smoke`, runs only `smokeTestCommand` (no coverage).
+- When `smoke`, runs only `smokeTestCommand` (scoped via `--no-baseline-fallback`, no coverage); when `defer`, runs no tests and reports N/A pending `/refacil:test`. Verify never runs the full suite (rule 0).
 - Uses `criteria` from the briefing for verification (without re-reading specs from scratch).
 - Uses `changedFiles` to focus the 3D verification on those files.
 - Applies the **3D framework (Completeness/Correctness/Coherence)** per **`METHODOLOGY-CONTRACT.md §3C — 3C Criterion`** — including the severity table and graceful degradation rule.
@@ -192,7 +187,7 @@ If the command fails, continue silently — it must not block the flow.
    ```
    Corrections applied. Run /refacil:test before the next full verify to refresh the test suite.
    ```
-   **Never** set `testExecution: full` in autofix re-verify unless the user explicitly requested re-run in this invocation.
+   **Never** run the full suite in autofix re-verify — only `smoke` (scoped) or `none`/`defer` (rule 0).
 5. Maximum **2 rounds** of automatic correction. If issues persist, list them for manual correction.
 **If the user does not accept:** list the issues for manual correction. Suggest `/refacil:test` then `/refacil:verify`.
@@ -200,7 +195,7 @@ If the command fails, continue silently — it must not block the flow.
 ## Rules
 - **Always build the briefing (Step 1) before delegating** — reduces the sub-agent tool calls.
-- **Defaults**: `testExecution: none` when test memory exists; **`testExecution: full`** only when Step 0 forces re-run or CR-01 applies. Smoke only after corrections; never full suite in autofix rounds.
+- **Defaults**: `testExecution: none` when test memory exists; `smoke` (scoped) after corrections; `defer` when there is no evidence or the user wants a re-run (verify never runs the full suite — rule 0). `/refacil:test` owns full regression.
 - **Always delegate to the sub-agent** for the analysis. Do not replicate spec reading or test execution logic here.
 - **Dotfiles in `refacil-sdd/changes/`**: never assert absence of `.review-passed` without `-a`; see §8.
 - **Corrections are ONLY applied by this wrapper** (Step 5), after explicit approval.
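
The new Step 1.3 resolution and the fixed BRIEFING fields in this file can be sketched as two small functions. This is a hedged illustration under stated assumptions: `resolveTestExecution`, `buildBriefing`, and the `userRequestedRerun` flag are hypothetical names, and the precedence given to `smoke` when corrections are present is an assumption, since the diff does not state an order between the rules.

```javascript
// Illustrative sketch only: function names, the input shape, and the rule
// precedence are assumptions, not the package's implementation.
function resolveTestExecution(input = {}) {
  const {
    commandsRun = [],
    lastStep = null,
    userRequestedRerun = false,
    correctionTouchedFiles = [],
  } = input;

  // Re-verify after Step 5 corrections: companion tests only.
  if (correctionTouchedFiles.length > 0) return "smoke";

  // Test memory exists and the user did not ask for a re-run.
  const hasEvidence =
    commandsRun.length > 0 && ["test", "verify", "review"].includes(lastStep);
  if (hasEvidence && !userRequestedRerun) return "none";

  // CR-01 (no evidence) or an explicit re-run request: verify stops and
  // points the user to /refacil:test instead of running the suite.
  return "defer";
}

function buildBriefing(changeName, input = {}, smokeTestCommand = "") {
  const testExecution = resolveTestExecution(input);
  return {
    changeName,
    testExecution,
    // Only meaningful for smoke; omitted otherwise.
    smokeTestCommand: testExecution === "smoke" ? smokeTestCommand : undefined,
    testScope: "scoped",
    runCoverage: false,
    coverageCommand: null, // verify never runs coverage
    testsDelegatedFrom: testExecution === "none" ? "test" : null,
    correctionTouchedFiles: input.correctionTouchedFiles || [],
  };
}
```

Note that no branch returns `full`: in this sketch the only way to get a whole-suite run is outside verify, via `/refacil:test`.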

package/templates/testing-policy.md CHANGED

@@ -2,6 +2,7 @@
 These rules align with **`METHODOLOGY-CONTRACT.md` §3–§3.1** shipped with SDD-AI (`refacil-prereqs` in your skills install). **Concrete baseline and narrowed commands for this repo** belong in markdown **below** this marked block (not between the `refacil-sdd-ai:testing-policy` markers) so `check-update` can refresh policy text without erasing your commands.
-- **Default: scoped runs** — For `/refacil:apply`, `/refacil:bug` (fix), `/refacil:test`, and `/refacil:verify`, narrow the test runner to **packages/paths/modules touched by the change** (paths after `--`, `-p`/`-pl`, `-Dtest=…`, `pytest` paths, `go test ./…`, workspace filters, etc.). Prefer the **smallest** scope that still covers the diff. Avoid monorepo root commands that fan out to **all** workspaces unless the change truly spans them.
-- **Full suite** — Only when the developer **explicitly** asks (`full`, `whole suite`, `suite completa`, …), in **CI / pre-merge**, or when narrowing is **unsafe** (then run baseline **once** with a clear WARN). Full runs cost more CPU/RAM.
+- **Only `/refacil:test` runs the full suite** — and only the **affected component's** suite, once per cycle. Every other phase that runs tests (`/refacil:apply`, `/refacil:bug` fix, `/refacil:verify` smoke) does **scoped runs** only — it narrows the runner to **packages/paths/modules touched by the change** (paths after `--`, `-p`/`-pl`, `-Dtest=…`, `pytest` paths, `go test ./…`, workspace filters, etc.). These phases derive the command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty command on fallback** — so they **physically cannot** run the whole suite. On fallback they run only touched test files, else SKIP and defer to `/refacil:test`.
+- **`/refacil:verify` never runs the full suite** — `/refacil:test` is its mandatory prior step. If there is no test evidence, verify defers to `/refacil:test` instead of running it.
+- **Full suite** — Only `/refacil:test` (component-bounded, once) or **CI / pre-merge**. Non-test phases never run it, even on unreliable scope (they SKIP). Full runs cost more CPU/RAM.
 - **Tests to add or change** — Keep them **next to** the behavior under change (follow this repo’s layout). Do not run unrelated suites “to be safe”.
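
The revised policy amounts to a per-phase decision: only the test phase may run a (component-bounded) suite, and every other phase either runs a scoped command, runs only touched test files, or skips. A minimal sketch of that decision follows. The names `planPhaseTests`, `scoped.testCommand` as an object field, and `touchedTestFiles` are assumptions for illustration; the actual output format of `sdd test-scope` is not shown in this diff.

```javascript
// Illustrative sketch only: phase names, the `scoped` object shape, and the
// returned action labels are hypothetical.
const FULL_SUITE_PHASES = new Set(["test"]);

function planPhaseTests(phase, scoped = {}, touchedTestFiles = []) {
  // Only /refacil:test may run the affected component's suite.
  if (FULL_SUITE_PHASES.has(phase)) {
    return { action: "run-component-suite" };
  }

  // Other phases rely on a scoped command that is empty on fallback.
  const command = (scoped.testCommand || "").trim();
  if (command !== "") {
    return { action: "run-scoped", command };
  }

  // Fallback: run only touched test files, else SKIP and defer.
  if (touchedTestFiles.length > 0) {
    return { action: "run-touched-tests", files: touchedTestFiles };
  }
  return { action: "skip", deferTo: "/refacil:test" };
}
```

Because the non-test branches never construct a baseline command, an unreliable scope in this sketch can only produce a narrower run or a skip, which mirrors the "physically cannot run the whole suite" wording of the policy.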