refacil-sdd-ai 5.3.2 → 5.3.4
- package/agents/debugger.md +15 -10
- package/agents/implementer.md +2 -2
- package/agents/validator.md +11 -10
- package/lib/commands/sdd.js +8 -1
- package/lib/spec-sync.js +7 -1
- package/lib/test-scope.js +56 -6
- package/package.json +1 -1
- package/skills/bug/SKILL.md +2 -3
- package/skills/prereqs/METHODOLOGY-CONTRACT.md +9 -7
- package/skills/verify/SKILL.md +18 -23
- package/templates/testing-policy.md +3 -2
package/agents/debugger.md
CHANGED
|
@@ -29,7 +29,7 @@ If you prefer to continue here, provide:
|
|
|
29
29
|
- mode: investigation (only analyze and propose hypotheses) or fix (implement with already-confirmed hypothesis)
|
|
30
30
|
- description: <full bug description>
|
|
31
31
|
- hypothesis: <confirmed root cause> (only for mode=fix)
|
|
32
|
-
- testScope: scoped
|
|
32
|
+
- testScope: scoped (mode=fix is ALWAYS scoped — rule 0; full regression is /refacil:test's job)
|
|
33
33
|
```
|
|
34
34
|
|
|
35
35
|
**Do not proceed with reads or implementation until the scope is clear.**
|
|
@@ -163,7 +163,7 @@ Proposed fix for hypothesis #1:
|
|
|
163
163
|
|
|
164
164
|
## Fix mode
|
|
165
165
|
|
|
166
|
-
The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user)
|
|
166
|
+
The main agent passes you: `mode: fix` + `description` + `hypothesis` (root cause confirmed by the user). Fix mode is **always scoped** (rule 0) — there is no full-suite option in `/refacil:bug`.
|
|
167
167
|
|
|
168
168
|
### Step 1: Implement the fix
|
|
169
169
|
|
|
@@ -205,14 +205,19 @@ Create `<projectRoot>/refacil-sdd/changes/<fix-name>/summary.md`:
|
|
|
205
205
|
|
|
206
206
|
This file is mandatory for traceability and allows the `check-review` hook to detect the active change. The `.review-passed` will be created by `/refacil:review` upon approval.
|
|
207
207
|
|
|
208
|
-
### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1)
|
|
208
|
+
### Step 4: Verify tests (`METHODOLOGY-CONTRACT.md` §3.1 — always scoped, rule 0)
|
|
209
209
|
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
210
|
+
`/refacil:bug` does **not** pass through `/refacil:test`, so you validate the fix with your **own regression tests** (always in scope) — **never the full suite** (rule 0).
|
|
211
|
+
|
|
212
|
+
1. Collect **`verificationTargets`** — every production/test file **you edited or added** during fix mode (**including** regression tests created this session).
|
|
213
|
+
2. Derive the scoped command via the CLI (stack-agnostic, structurally bounded):
|
|
214
|
+
```
|
|
215
|
+
refacil-sdd-ai sdd test-scope --files "<verificationTargets-csv>" --baseline "<§3 baseline>" --no-baseline-fallback
|
|
216
|
+
```
|
|
217
|
+
`--no-baseline-fallback` guarantees the CLI **never** returns the full baseline: on fallback `testCommand` is **empty**.
|
|
218
|
+
3. Run the returned `testCommand`; everything it selects must pass.
|
|
219
|
+
4. **Fallback** (empty `testCommand` / `fallback: true`): run **only** the touched files that are themselves test files (your regression tests); if none exist, **SKIP** and prepend report line **WARN: no scopeable tests for touched files → verification deferred**. **Never** run the full repo/package baseline.
|
|
220
|
+
5. **`testsResult.command`** in JSON must quote the **literal** executed shell string (or `null`/empty when SKIPPED).
|
|
216
221
|
|
|
217
222
|
### Report + JSON block (fix)
|
|
218
223
|
|
|
@@ -254,5 +259,5 @@ This file is mandatory for traceability and allows the `check-review` hook to de
|
|
|
254
259
|
- In mode=investigation: follow diagnose loop discipline (reproduce, minimize, hypothesize, validate evidence) before proposing a fix.
|
|
255
260
|
- In mode=fix: the fix must be MINIMAL. Never over-refactor.
|
|
256
261
|
- Regression tests are MANDATORY in mode=fix.
|
|
257
|
-
- **Scoped verification**:
|
|
262
|
+
- **Scoped verification**: **always scoped** (rule 0) — narrowed command in Step 4 via `sdd test-scope --no-baseline-fallback`, never a wholesale “run entire repo suite”. Full regression belongs to `/refacil:test` or CI.
|
|
258
263
|
- Use **concise** output mode by default.
|
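The Step 4 fallback rule in the debugger changes above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the package's code: `planFixVerification` and `looksLikeTestFile` are hypothetical names, the test-file heuristic is invented for the example, and the result shape (`testCommand`, `fallback`) follows the `--json` fields named in the CLI help.

```javascript
// Hypothetical sketch of the Step 4 decision; not the package's implementation.
// `scopeResult` mirrors the JSON fields listed in the CLI help:
// { testCommand, files, fallback, fallbackReason }.

// Assumed heuristic for "is this a test file?". Real projects should follow
// their own naming conventions instead.
function looksLikeTestFile(filePath) {
  return /(\.|_)(test|spec)\.[a-z]+$/i.test(filePath) || /(^|\/)tests?\//.test(filePath);
}

function planFixVerification(scopeResult, verificationTargets, isTest = looksLikeTestFile) {
  // Normal path: the CLI returned a scoped command, so run exactly that.
  if (scopeResult.testCommand && !scopeResult.fallback) {
    return { action: 'run-command', command: scopeResult.testCommand, files: [] };
  }
  // Fallback path: only touched files that are themselves tests may run.
  const touchedTests = verificationTargets.filter(isTest);
  if (touchedTests.length > 0) {
    return { action: 'run-touched-tests', command: null, files: touchedTests };
  }
  // Nothing scopeable: skip and report the deferral, never the full baseline.
  return {
    action: 'skip',
    command: null,
    files: [],
    warning: 'WARN: no scopeable tests for touched files → verification deferred',
  };
}
```

Because the function never receives the baseline command, it has no way to return a full-suite run, which mirrors the structural guarantee the flag is meant to provide.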
package/agents/implementer.md
CHANGED
|
@@ -124,9 +124,9 @@ Follow **`METHODOLOGY-CONTRACT.md §3.1`**:
|
|
|
124
124
|
|
|
125
125
|
2. **Derive a minimal scoped smoke command** (stack-agnostic — no hardcoded runners):
|
|
126
126
|
```
|
|
127
|
-
refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"
|
|
127
|
+
refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback
|
|
128
128
|
```
|
|
129
|
-
Use the resulting `testCommand` from the output.
|
|
129
|
+
Use the resulting `testCommand` from the output. The `--no-baseline-fallback` flag is **mandatory in apply**: on fallback the CLI returns an **empty** `testCommand` (never the full baseline), so you physically cannot run the whole suite — apply NEVER runs full regression. If `testCommand` is empty / `fallback: true`, go to step 4 (run touched test files only, else SKIP).
|
|
130
130
|
|
|
131
131
|
3. **Run the resulting smoke command.**
|
|
132
132
|
|
package/agents/validator.md
CHANGED
|
@@ -36,9 +36,10 @@ If you prefer only the report (without applying fixes), respond with the explici
|
|
|
36
36
|
|
|
37
37
|
**BEFORE reading any file or running any command, read this rule.**
|
|
38
38
|
|
|
39
|
-
- **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `
|
|
40
|
-
- **
|
|
41
|
-
- **If `testExecution:
|
|
39
|
+
- **If the briefing includes `testExecution`**: follow §3.2 — default **`none`** when absent but `commandsRun` is present. Do **not** run Bash tests unless `testExecution` is `smoke`.
|
|
40
|
+
- **Verify NEVER runs the full suite** (`METHODOLOGY-CONTRACT.md §3.1` rule 0): `/refacil:test` is the mandatory prior step that owns full regression. There is no `full` mode here.
|
|
41
|
+
- **If `testExecution: defer`**: do **not** run any tests — there is no current test evidence (CR-01) or the user asked to re-run the suite. Report tests as **N/A (pending `/refacil:test`)** and tell the user to run `/refacil:test` first.
|
|
42
|
+
- **If `testExecution: smoke`**: run **only** `smokeTestCommand` (already scoped via `--no-baseline-fallback`; if empty, SKIP) — no coverage, never the baseline.
|
|
42
43
|
- **If the briefing includes `criteria`**: use it for verification — **do not re-read the specs** to extract the CA/CR again.
|
|
43
44
|
- **If the briefing includes `changedFiles`**: focus the 3D verification on those files — do not do a global discovery.
|
|
44
45
|
- Read ONLY the specific files needed to verify each CA/CR.
|
|
@@ -84,7 +85,7 @@ Produce a list of issues with severity `CRITICAL` / `WARNING` / `SUGGESTION`.
|
|
|
84
85
|
|
|
85
86
|
### Step 2: Verify tests (conditional — §3.2)
|
|
86
87
|
|
|
87
|
-
Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `full
|
|
88
|
+
Read `testExecution` from the briefing (default infer: `none` if `commandsRun` present, else `defer` — never `full`; verify does not run the suite, rule 0).
|
|
88
89
|
|
|
89
90
|
**`testExecution: none`**:
|
|
90
91
|
- **Do not** run `testCommand`, `smokeTestCommand`, or `coverageCommand`.
|
|
@@ -93,14 +94,14 @@ Read `testExecution` from the briefing (default infer: `none` if `commandsRun` p
|
|
|
93
94
|
- JSON `tests.executed: false`, `tests.delegated: true`, `tests.command` = last `commandsRun` or null.
|
|
94
95
|
|
|
95
96
|
**`testExecution: smoke`**:
|
|
96
|
-
- Run **only** `smokeTestCommand
|
|
97
|
+
- Run **only** `smokeTestCommand` (scoped via `--no-baseline-fallback`; if empty/`fallback`, run only the touched test files, else SKIP). Do not run `coverageCommand`. **Never** run the baseline.
|
|
97
98
|
- FAIL if smoke fails; PASS if smoke passes. Note in report that full suite/coverage requires `/refacil:test`.
|
|
98
99
|
|
|
99
|
-
**`testExecution:
|
|
100
|
-
-
|
|
101
|
-
-
|
|
100
|
+
**`testExecution: defer`** (CR-01 — no test evidence, or user asked to re-run the suite):
|
|
101
|
+
- **Do not run any tests.** Verify never runs the full suite (rule 0); `/refacil:test` is the mandatory prior step.
|
|
102
|
+
- Report tests as **N/A (pending `/refacil:test`)**, set `tests.executed: false`, `tests.delegated: true`, and tell the user to run `/refacil:test` before verify.
|
|
102
103
|
|
|
103
|
-
**If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; ask user to
|
|
104
|
+
**If there is NO briefing**: resolve by reading `METHODOLOGY-CONTRACT.md` §3.2 and §3.1; default to `none`/`defer` and ask the user to run `/refacil:test` rather than running the suite here.
|
|
104
105
|
|
|
105
106
|
### Step 3: Validate cross-repo ambiguities (optional)
|
|
106
107
|
|
|
@@ -146,7 +147,7 @@ Required corrections (only if REQUIRES_CORRECTIONS):
|
|
|
146
147
|
"tests": {
|
|
147
148
|
"executed": <bool>,
|
|
148
149
|
"delegated": <bool>,
|
|
149
|
-
"executionMode": "none" | "smoke" | "
|
|
150
|
+
"executionMode": "none" | "smoke" | "defer",
|
|
150
151
|
"command": "<command or last commandsRun when delegated>",
|
|
151
152
|
"passed": <bool or null when not executed>,
|
|
152
153
|
"total": <int or null>,
|
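The `testExecution` inference described in the validator changes above (default `none` when `commandsRun` is present, otherwise `defer`, never `full`) could look roughly like this. It is a sketch under assumptions: `resolveTestExecution` is a hypothetical helper, the briefing is assumed to be a plain object, and downgrading an explicit `full` request to `defer` is an interpretation of the "user asked to re-run the suite" rule rather than documented behavior.

```javascript
// Hypothetical helper; the briefing shape is assumed for illustration.
const ALLOWED_MODES = ['none', 'smoke', 'defer'];

function resolveTestExecution(briefing = {}) {
  const { testExecution, commandsRun } = briefing;

  // An explicit, supported mode wins.
  if (ALLOWED_MODES.includes(testExecution)) return testExecution;

  // Assumption: a legacy or user-supplied `full` request is downgraded to
  // `defer`, since verify does not run the full suite (rule 0).
  if (testExecution === 'full') return 'defer';

  // Inference: existing test evidence means `none`; no evidence means `defer`.
  const hasEvidence = Array.isArray(commandsRun) && commandsRun.length > 0;
  return hasEvidence ? 'none' : 'defer';
}
```

The function can only ever return `none`, `smoke`, or `defer`, which matches the three values left in the `executionMode` field of the report JSON.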
package/lib/commands/sdd.js
CHANGED
|
@@ -1021,6 +1021,10 @@ function cmdTestScope(argv, projectRoot) {
|
|
|
1021
1021
|
const filesRaw = args.files || '';
|
|
1022
1022
|
const stackHint = args.stack || undefined;
|
|
1023
1023
|
const baselineCmd = args.baseline || '';
|
|
1024
|
+
// When set, fallback returns an EMPTY testCommand instead of the full baseline.
|
|
1025
|
+
// Used by /refacil:apply so the implementer can never run the whole suite even if
|
|
1026
|
+
// it only looks at the CLI output and ignores the contract's SKIP rule.
|
|
1027
|
+
const noBaselineFallback = args['no-baseline-fallback'] === true;
|
|
1024
1028
|
// Use the already-resolved projectRoot from handleSdd (via findProjectRoot()) so
|
|
1025
1029
|
// the CLI works correctly when invoked from a subdirectory within the monorepo.
|
|
1026
1030
|
const root = projectRoot || process.cwd();
|
|
@@ -1031,7 +1035,7 @@ function cmdTestScope(argv, projectRoot) {
|
|
|
1031
1035
|
: [];
|
|
1032
1036
|
|
|
1033
1037
|
const { testScope } = require('../test-scope');
|
|
1034
|
-
const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root });
|
|
1038
|
+
const result = testScope({ files, stack: stackHint, baseline: baselineCmd, projectRoot: root, noBaselineFallback });
|
|
1035
1039
|
|
|
1036
1040
|
if (wantJson) {
|
|
1037
1041
|
process.stdout.write(JSON.stringify(result) + '\n');
|
|
@@ -1296,6 +1300,9 @@ function sddHelp() {
|
|
|
1296
1300
|
--files <csv> Comma-separated source file paths to scope tests for
|
|
1297
1301
|
[--stack <name>] Stack hint (node, python, go, rust, java, dotnet)
|
|
1298
1302
|
[--baseline <cmd>] Fallback test command when no test files are found
|
|
1303
|
+
[--no-baseline-fallback] On fallback, return an EMPTY testCommand instead of the
|
|
1304
|
+
baseline (used by /refacil:apply so it can never run the
|
|
1305
|
+
whole suite). /refacil:test omits this flag.
|
|
1299
1306
|
[--json] Output result as JSON (testCommand, files, fallback, fallbackReason)
|
|
1300
1307
|
Always exits 0.
|
|
1301
1308
|
|
package/lib/spec-sync.js
CHANGED
|
@@ -98,7 +98,13 @@ function parseCriteriaBlocks(markdown) {
|
|
|
98
98
|
};
|
|
99
99
|
|
|
100
100
|
for (const line of lines) {
|
|
101
|
-
|
|
101
|
+
// Tolerant heading match so archive's spec-sync never trips on cosmetic
|
|
102
|
+
// format variance the proposer/agents produce in the wild:
|
|
103
|
+
// - heading level: h2 (## CA-01) OR h3+ (### CA-A01) — level is ignored
|
|
104
|
+
// - criterion id: numeric (CA-01), feature-prefixed (CA-A01, CA-G01) or
|
|
105
|
+
// suffixed (CA-12b) — any [A-Za-z0-9] run after the CA-/CR- prefix
|
|
106
|
+
// The CA-/CR- token is the real signal; the heading level and id shape are not.
|
|
107
|
+
const m = line.match(/^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i);
|
|
102
108
|
if (m) {
|
|
103
109
|
pushCurrent();
|
|
104
110
|
current = { id: m[1].toUpperCase(), title: m[2].trim(), lines: [] };
|
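To see what the tolerant heading match in `parseCriteriaBlocks` accepts, the regular expression from the added line can be exercised on its own. The pattern is copied from the diff. The `parseHeading` wrapper and the sample headings are hypothetical and exist only for this illustration.

```javascript
// Pattern copied from the added line in spec-sync.js.
const CRITERION_HEADING = /^#{2,6}\s+((?:CA|CR)-[A-Za-z0-9]+):\s*(.+)$/i;

// Hypothetical wrapper that mirrors how the diff builds `current`:
// the id is upper-cased and the title is trimmed.
function parseHeading(line) {
  const m = line.match(CRITERION_HEADING);
  return m ? { id: m[1].toUpperCase(), title: m[2].trim() } : null;
}

// Sample headings (invented for the example).
const samples = [
  '## CA-01: User can log in',       // h2, numeric id
  '### ca-a01: Scoped smoke',        // h3, lower-case, feature-prefixed id
  '#### CR-12b: Reject empty input', // suffixed id
  '# CA-01: Top-level heading',      // h1 is not matched
  '## CA-01 missing colon',          // colon is required
];

for (const line of samples) {
  console.log(JSON.stringify(line), '=>', JSON.stringify(parseHeading(line)));
}
```

One consequence of upper-casing the id is that a suffixed id such as `CR-12b` is stored as `CR-12B`.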
package/lib/test-scope.js
CHANGED
|
@@ -27,6 +27,40 @@ const path = require('path');
|
|
|
27
27
|
|
|
28
28
|
const KNOWN_STACKS = ['node', 'python', 'go', 'rust', 'java', 'dotnet'];
|
|
29
29
|
|
|
30
|
+
// ---------------------------------------------------------------------------
|
|
31
|
+
// Source-code file extensions per stack.
|
|
32
|
+
//
|
|
33
|
+
// A changed file whose extension is NOT a code extension for its stack cannot
|
|
34
|
+
// have a unit-test mapping (e.g. a skill/agent `.md` doc in a Node repo). Such
|
|
35
|
+
// files must be skipped during scoping: otherwise the loose basename match in
|
|
36
|
+
// findTestFilesByImport produces false positives against test files that merely
|
|
37
|
+
// MENTION the file's name as a string. Test files are themselves code, so this
|
|
38
|
+
// guard never drops a real test.
|
|
39
|
+
// ---------------------------------------------------------------------------
|
|
40
|
+
|
|
41
|
+
const CODE_EXTENSIONS = {
|
|
42
|
+
node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
|
|
43
|
+
python: ['.py'],
|
|
44
|
+
go: ['.go'],
|
|
45
|
+
rust: ['.rs'],
|
|
46
|
+
java: ['.java', '.kt', '.kts'],
|
|
47
|
+
dotnet: ['.cs'],
|
|
48
|
+
};
|
|
49
|
+
|
|
50
|
+
/**
|
|
51
|
+
* Returns true if the file's extension is a recognized source-code extension
|
|
52
|
+
* for the given stack. Unknown stacks return false.
|
|
53
|
+
*
|
|
54
|
+
* @param {string} filePath
|
|
55
|
+
* @param {string} stack
|
|
56
|
+
* @returns {boolean}
|
|
57
|
+
*/
|
|
58
|
+
function isCodeFileForStack(filePath, stack) {
|
|
59
|
+
const exts = CODE_EXTENSIONS[stack];
|
|
60
|
+
if (!exts) return false;
|
|
61
|
+
return exts.includes(path.extname(filePath).toLowerCase());
|
|
62
|
+
}
|
|
63
|
+
|
|
30
64
|
// ---------------------------------------------------------------------------
|
|
31
65
|
// Planning-only file patterns — never justify a test run on their own.
|
|
32
66
|
// ---------------------------------------------------------------------------
|
|
@@ -489,16 +523,24 @@ function buildScopedCommand(absTestFiles, detectedStack, moduleRoot, projectRoot
|
|
|
489
523
|
* @param {string} opts.stack - stack hint (optional; auto-detected if omitted)
|
|
490
524
|
* @param {string} opts.baseline - fallback test command (optional)
|
|
491
525
|
* @param {string} opts.projectRoot - project root (optional; uses cwd if omitted)
|
|
526
|
+
* @param {boolean} opts.noBaselineFallback - when true, fallback returns an EMPTY
|
|
527
|
+
* testCommand instead of the full baseline. Used by phases that must NEVER run the
|
|
528
|
+
* whole suite (e.g. `/refacil:apply` smoke): the consumer physically never receives
|
|
529
|
+
* a full-suite command, so it cannot run it even if it ignores the contract rule.
|
|
530
|
+
* `/refacil:test` (which legitimately runs the full suite on fallback) omits this.
|
|
492
531
|
* @returns {{ testCommand: string, files: string[], fallback: boolean, fallbackReason: string|null }}
|
|
493
532
|
*/
|
|
494
|
-
function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
533
|
+
function testScope({ files = [], stack, baseline = '', projectRoot, noBaselineFallback = false } = {}) {
|
|
495
534
|
const root = projectRoot || process.cwd();
|
|
496
535
|
const base = baseline || '';
|
|
536
|
+
// What to emit as testCommand on every fallback path. In normal mode this is the
|
|
537
|
+
// baseline; in noBaselineFallback mode it is empty so apply can never run the suite.
|
|
538
|
+
const fallbackCommand = noBaselineFallback ? '' : base;
|
|
497
539
|
|
|
498
540
|
// Fallback: empty files input
|
|
499
541
|
if (!files || files.length === 0) {
|
|
500
542
|
return {
|
|
501
|
-
testCommand:
|
|
543
|
+
testCommand: fallbackCommand,
|
|
502
544
|
files: [],
|
|
503
545
|
fallback: true,
|
|
504
546
|
fallbackReason: 'No source files provided — falling back to baseline.',
|
|
@@ -509,7 +551,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
|
509
551
|
const sourceFiles = files.filter((f) => !isPlanningFile(f));
|
|
510
552
|
if (sourceFiles.length === 0) {
|
|
511
553
|
return {
|
|
512
|
-
testCommand:
|
|
554
|
+
testCommand: fallbackCommand,
|
|
513
555
|
files: [],
|
|
514
556
|
fallback: true,
|
|
515
557
|
fallbackReason: 'All provided files are planning-only (markdown/SDD artifacts) — falling back to baseline.',
|
|
@@ -523,7 +565,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
|
523
565
|
const stackHintUnknown = stack && !KNOWN_STACKS.includes(stack);
|
|
524
566
|
if (stackHintUnknown) {
|
|
525
567
|
return {
|
|
526
|
-
testCommand:
|
|
568
|
+
testCommand: fallbackCommand,
|
|
527
569
|
files: [],
|
|
528
570
|
fallback: true,
|
|
529
571
|
fallbackReason: 'Stack could not be determined — falling back to baseline.',
|
|
@@ -556,6 +598,13 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
|
556
598
|
continue;
|
|
557
599
|
}
|
|
558
600
|
|
|
601
|
+
// Skip non-code files (e.g. markdown skill/agent docs in a Node repo): they
|
|
602
|
+
// cannot map to a unit test, and a loose basename match would produce false
|
|
603
|
+
// positives against tests that merely mention the file's name as a string.
|
|
604
|
+
if (!isCodeFileForStack(absSource, fileStack)) {
|
|
605
|
+
continue;
|
|
606
|
+
}
|
|
607
|
+
|
|
559
608
|
if (!byModule.has(moduleRoot)) {
|
|
560
609
|
byModule.set(moduleRoot, { stack: fileStack, testFiles: new Set() });
|
|
561
610
|
}
|
|
@@ -598,7 +647,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
|
598
647
|
|
|
599
648
|
if (!anyKnownStack && !stackHintValid) {
|
|
600
649
|
return {
|
|
601
|
-
testCommand:
|
|
650
|
+
testCommand: fallbackCommand,
|
|
602
651
|
files: [],
|
|
603
652
|
fallback: true,
|
|
604
653
|
fallbackReason: 'Stack could not be determined — falling back to baseline.',
|
|
@@ -606,7 +655,7 @@ function testScope({ files = [], stack, baseline = '', projectRoot } = {}) {
|
|
|
606
655
|
}
|
|
607
656
|
|
|
608
657
|
return {
|
|
609
|
-
testCommand:
|
|
658
|
+
testCommand: fallbackCommand,
|
|
610
659
|
files: [],
|
|
611
660
|
fallback: true,
|
|
612
661
|
fallbackReason: 'No test files found for the given source files — falling back to baseline.',
|
|
@@ -710,4 +759,5 @@ module.exports = {
|
|
|
710
759
|
findModuleRoot,
|
|
711
760
|
isTestFile,
|
|
712
761
|
affectedComponents,
|
|
762
|
+
isCodeFileForStack,
|
|
713
763
|
};
|
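The two behaviors added to `test-scope.js` can be tried in isolation. The extension table and `isCodeFileForStack` below are reproduced from the diff. `fallbackCommandFor` is a hypothetical standalone version of the `fallbackCommand` expression inside `testScope`, and the sample paths are invented.

```javascript
const path = require('path');

// Reproduced from the diff: source-code extensions per stack.
const CODE_EXTENSIONS = {
  node: ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'],
  python: ['.py'],
  go: ['.go'],
  rust: ['.rs'],
  java: ['.java', '.kt', '.kts'],
  dotnet: ['.cs'],
};

// Reproduced from the diff: true only for known stacks and code extensions.
function isCodeFileForStack(filePath, stack) {
  const exts = CODE_EXTENSIONS[stack];
  if (!exts) return false;
  return exts.includes(path.extname(filePath).toLowerCase());
}

// Hypothetical standalone form of the `fallbackCommand` expression:
// with the flag set, every fallback path yields an empty command.
function fallbackCommandFor(baseline, noBaselineFallback) {
  return noBaselineFallback ? '' : (baseline || '');
}

console.log(isCodeFileForStack('skills/bug/SKILL.md', 'node')); // false: docs are skipped
console.log(isCodeFileForStack('lib/test-scope.JS', 'node'));   // true: extension is lower-cased
console.log(JSON.stringify(fallbackCommandFor('npm test', true)));  // ""
console.log(JSON.stringify(fallbackCommandFor('npm test', false))); // "npm test"
```

Note that the `fallbackReason` strings in the diff still say "falling back to baseline" even when the flag is set, so a consumer should rely on `testCommand` and `fallback` rather than on that message.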
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "refacil-sdd-ai",
|
|
3
|
-
"version": "5.3.
|
|
3
|
+
"version": "5.3.4",
|
|
4
4
|
"description": "SDD-AI: Specification-Driven Development with AI — development methodology using AI with Claude Code, Cursor, OpenCode and Codex",
|
|
5
5
|
"bin": {
|
|
6
6
|
"refacil-sdd-ai": "./bin/cli.js"
|
package/skills/bug/SKILL.md
CHANGED
|
@@ -134,19 +134,18 @@ Apply-specific annotation: use `fix/<ID>` as the branch prefix (e.g. `fix/SEGINF
|
|
|
134
134
|
|
|
135
135
|
### Step 5: Delegate implementation to the refacil-debugger sub-agent (mode: fix)
|
|
136
136
|
|
|
137
|
-
|
|
137
|
+
**Test execution for fix mode is ALWAYS scoped** (`METHODOLOGY-CONTRACT.md §3.1` rule 0). `/refacil:bug` does not pass through `/refacil:test`, so the debugger validates the fix with its own regression tests + touched files — **never the full suite**. If the user explicitly asks for whole-repo regression (`full`, `suite completa`, …), do **not** widen this phase: tell them to run `/refacil:test … full` or rely on CI after the fix.
|
|
138
138
|
|
|
139
139
|
Invoke the `refacil-debugger` sub-agent passing it:
|
|
140
140
|
- `mode: fix`
|
|
141
141
|
- `description`: complete bug description.
|
|
142
142
|
- `hypothesis`: root cause confirmed by the user in Step 3.
|
|
143
|
-
- `testScope`: `scoped` \| `full` — from the rule above (default **`scoped`**).
|
|
144
143
|
|
|
145
144
|
The sub-agent:
|
|
146
145
|
- Implements the minimal and focused fix.
|
|
147
146
|
- Generates regression tests (reproduces the bug + verifies the fix + normal-path assertions where warranted).
|
|
148
147
|
- Creates `refacil-sdd/changes/fix-<name>/summary.md` with traceability. This `fix-*` folder is the approved operational exception to the regular proposal artifacts: it does not need `proposal.md`, `design.md`, `tasks.md`, or `specs.md` to be archived later, but it must include a substantive `summary.md`, regression test evidence from the debugger run, and an approved review before archive.
|
|
149
|
-
- Runs
|
|
148
|
+
- Runs a **scoped `testCommand`** derived via `sdd test-scope --no-baseline-fallback` (`METHODOLOGY-CONTRACT.md §3.1` rule 0). On fallback it runs only touched test files, else SKIPs — **never** the full baseline.
|
|
150
149
|
- Returns the report fenced as ` ```refacil-debug-fix `.
|
|
151
150
|
|
|
152
151
|
### Step 6: Present result and next step
|
package/skills/prereqs/METHODOLOGY-CONTRACT.md
CHANGED
|
@@ -49,13 +49,15 @@ Coverage (if applicable): detect the project command at the component root (`tes
|
|
|
49
49
|
|
|
50
50
|
**Rules**
|
|
51
51
|
|
|
52
|
+
0. **Only `/refacil:test` may run the full/component baseline** (once per cycle, component-bounded). **Every other phase that executes tests** — `/refacil:apply` (implementer smoke), `/refacil:bug` (debugger fix), `/refacil:verify` (smoke after corrections) — derives its command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty `testCommand` on fallback** (never the baseline). On fallback those phases run **only the touched files that are themselves test files**; if none, they **SKIP** and defer to `/refacil:test`. The "unreliable scope → run baseline once" escape (Scoped command patterns, below) applies **exclusively to `/refacil:test`**. This is a **structural** guarantee, not just a convention: the CLI never hands a full-suite command to a non-test phase, so an agent that only reads the CLI output cannot run the whole suite.
|
|
53
|
+
|
|
52
54
|
1. **`testScope: scoped`** (default): sub-agents run tests **only** for artifacts tied to the current change — never invoke the §3 baseline in **full-repo / full-suite** form without narrowing (paths, packages, filters, patterns), except the explicit fallbacks below.
|
|
53
55
|
2. **`testScope: full`**: **on-demand only** — user explicitly requests whole-suite regression in **`/refacil:test`** (or `/refacil:verify`) arguments (e.g. `full`, `all tests`, `whole suite`, `suite completa`). Resolve the §3 baseline command language-agnostically at each **affected component root** and run it from that component dir (`cd <component> && <baseline>`). Never run all monorepo packages — only the component(s) whose files changed. If multiple components are affected, run each in sequence. Coverage = component-wide (not repo-wide).
|
|
54
56
|
3. **`runCoverage: true`** (default): after scoped tests pass, run coverage **narrowed to the change** — instrument/collect only for **`filesToTest`**, **`changedFiles`**, and companion test/spec paths tied to those modules (examples: `--cov=pkg/sub`, Jest `--collectCoverageFrom` globs limited to touched trees, Gradle/JaCoCo scoped modules). If the toolchain cannot narrow, report **N/A** plus a WARNING; do **not** silently widen to repo-wide coverage while `testScope` remains `scoped`.
|
|
55
57
|
4. **`runCoverage: false`**: skip coverage entirely — only when the user **explicitly** opts out (`no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, etc.) or the project defines **no** coverage command under §3.
|
|
56
58
|
5. **`runCoverage: true` + `testScope: full`**: run the project coverage command **after** the full suite passes, using the repo’s usual global/module coverage behavior (heavy — intended only when the user requested `full`).
|
|
57
|
-
6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>"` and executes the returned smoke command. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
|
|
58
|
-
7. **`/refacil:bug` / debugger `mode=fix`**: debugger
|
|
59
|
+
6. **`/refacil:apply` / implementer**: the apply wrapper supplies `testScope` (default `scoped`) and **`testBaselineCommand`**. After editing, the implementer runs `refacil-sdd-ai sdd test-scope --files <touched-files-csv> --baseline "<testBaselineCommand>" --no-baseline-fallback` and executes the returned smoke command. The **`--no-baseline-fallback`** flag is the **structural guarantee** that apply never runs the whole suite: on fallback the CLI returns an **empty `testCommand`** (not the baseline), so even an agent that only reads the CLI output cannot run full regression. The implementer **NEVER runs the full repo/package baseline** as the apply verification step — the "unreliable scope → run baseline once" escape hatch in §3.1 Scoped command patterns does NOT apply here. Fallback behaviour: if `test-scope` returns `fallback: true`, fails, or there are no touched files, run only the touched files that are themselves test files; if none exist, **SKIP verification** and add a LOW `issues` entry deferring to `/refacil:test`. Also applies: the wrapper must not precompute a stale narrowed command. Coverage is optional in that step unless the briefing adds an explicit coverage command (unusual; defer to `/refacil:test`).
|
|
60
|
+
7. **`/refacil:bug` / debugger `mode=fix`**: debugger is **always `scoped`** — it derives its command via `refacil-sdd-ai sdd test-scope --files "<filesModified ∪ new/updated regression test files>" --baseline "<§3 baseline>" --no-baseline-fallback` and runs the returned command. Because `/refacil:bug` does **not** pass through `/refacil:test`, it validates the fix with its **own regression tests** (which are touched test files → always in scope), never the full suite. On fallback (empty `testCommand`): run only the touched test files; if none, **SKIP** and add a LOW issue deferring to `/refacil:test`. The debugger **never** runs the full repo/package baseline.
|
|
59
61
|
8. **Re-run / fix-loop (pass-2)**: when iterating on failing tests, run **only the previously-failing test files** — not the entire component suite. This keeps fix loops fast and bounded.
|
|
60
62
|
|
|
61
63
|
**Scoped command patterns** (language-agnostic — sub-agent reads `AGENTS.md`, build config, and tool docs; run from the correct module/root):
|
|
@@ -63,9 +65,9 @@ Coverage (if applicable): detect the project command at the component root (`tes
|
|
|
63
65
|
- Pass **explicit test paths**, **packages**, **classes**, or **filters** accepted by that stack (examples: Maven ` -Dtest=…`, Gradle `--tests …`, pytest file paths, `go test ./pkg/…`, `cargo test -p pkg`, .NET solution filter, Ruby `bundle exec rspec path`, JS package scripts with paths after `--`).
|
|
64
66
|
- Prefer files **produced or updated in this session**; until they exist, use the narrowest supported pattern (basename, substring, regex) derived from `filesToTest` / `changedFiles`, per runner docs.
|
|
65
67
|
- **Scoped coverage**: combine the same narrowing with coverage flags/includes that limit **report collection** to touched sources (runner-specific); exclude unrelated packages by default when `testScope: scoped`.
|
|
66
|
-
- **Unreliable scope
|
|
68
|
+
- **Unreliable scope** (**`/refacil:test` only** — see rule 0): if narrowing cannot be done safely, `/refacil:test` may run the baseline §3 command **once**, report a brief WARNING that the run may be heavy, and suggest CI or **`/refacil:test ... full`** for full regression. **Non-test phases never take this escape** — they use `--no-baseline-fallback` and SKIP instead.
|
|
67
69
|
|
|
68
|
-
**Verify (
|
|
70
|
+
**Verify (test evidence)**: verify reads `commandsRun` from `get-memory` as the source of truth for the test run (produced by `/refacil:test`). Verify itself only ever runs a **smoke** (scoped via `--no-baseline-fallback`) after corrections, deriving targets from `correctionTouchedFiles` using **project test naming and layout** (`AGENTS.md`, test config): co-located `*Spec.*` / `*Test.*`, `tests/`, language-specific suffixes. If there is no test evidence, verify **defers to `/refacil:test`** (rule 0) — it does not run the suite itself.
|
|
69
71
|
|
|
70
72
|
### §3.2 — Phase ownership (test execution)
|
|
71
73
|
|
|
@@ -84,12 +86,12 @@ Coverage (if applicable): detect the project command at the component root (`tes
|
|
|
84
86
|
  | Value | When | Validator behavior |
  |-------|------|-------------------|
  | `none` | Default if `memory.lastStep` is `test` (or later) and `commandsRun` is non-empty; user did not force re-run | **Do not** run `testCommand` or `coverageCommand`. Tests section = **delegated to test phase**; cite last `commandsRun`. |
- | `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles
- | `
+ | `smoke` | After surgical corrections in verify Step 5 (or rare review fix) | Run **only** companion test files for `correctionTouchedFiles`, derived via `sdd test-scope … --no-baseline-fallback` (empty → SKIP). **No** `coverageCommand`. **Never** the full suite. |
+ | `defer` | CR-01 (no test memory) **or** user asks to re-run the suite in verify | **Verify never runs the full suite** (rule 0; `/refacil:test` is the mandatory prior step). STOP and tell the user: *"No current test evidence — run `/refacil:test` before verify."* Report tests as **N/A (pending `/refacil:test`)**. |

  **Smoke definition**: the smallest test invocation that exercises files touched by a **correction** (not the whole change). Derive companion paths from project layout (`*Spec*`, `*Test*`, `tests/`, etc.). Smoke **does not** satisfy coverage gates or replace `/refacil:test`.

- **After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files. **Never**
+ **After corrections** (verify Step 5 or review Step 3.5): prefer `testExecution: none` + tell the user to run **`/refacil:test`** before the next full verify; or `smoke` once on correction files (scoped via `--no-baseline-fallback`). **Never** run the full suite in autofix re-verify — if full regression is wanted, it is `/refacil:test`'s job (rule 0).

  **Review checklist “tests pass”**: PASS when test files exist for the diff, `memory.criteriaRun` covers relevant CA/CR, and static review finds no obvious breakage — **without** running the §3 baseline via Bash unless the user explicitly asked.

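Reviewer note: the new `testExecution` table above can be read as a small dispatch on three values. The sketch below is only an illustration of that reading, not code from the package. The function name `planValidatorTests`, the shape of the returned object, and the idea of passing the derived smoke command as a `smokeTestCommand` string are all assumptions made for the example.

```javascript
// Hypothetical helper: maps a testExecution value to what the validator may run.
// Names and return shape are illustrative assumptions, not the package's API.
function planValidatorTests(briefing) {
  const { testExecution, smokeTestCommand, commandsRun = [] } = briefing;
  switch (testExecution) {
    case "none":
      // Tests were already run by the test phase: run nothing, cite the recorded commands.
      return { commands: [], testsReport: "delegated to test phase", evidence: commandsRun };
    case "smoke": {
      // Run only the scoped command; an empty command means SKIP, never the full suite.
      const cmd = String(smokeTestCommand ?? "").trim();
      return cmd
        ? { commands: [cmd], testsReport: "smoke", evidence: [] }
        : { commands: [], testsReport: "SKIP", evidence: [] };
    }
    case "defer":
      // No current evidence or a re-run was requested: stop and point at /refacil:test.
      return { commands: [], testsReport: "N/A (pending /refacil:test)", evidence: [] };
    default:
      // "full" is intentionally unsupported after this change.
      throw new Error(`unsupported testExecution: ${testExecution}`);
  }
}
```

In every branch the coverage command is absent, which matches the table: none of the three values allows a coverage run or a full-suite run.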
package/skills/verify/SKILL.md
CHANGED
@@ -25,12 +25,9 @@ Determine the scope before invoking the sub-agent. Prioritize in this order:

  - **Default**: `testExecution: none` when `get-memory` has `commandsRun` and `lastStep` is `test` (or later) — verify validates CA/CR **without** re-running the test pipeline.

- - **`testExecution:
- - **`testScope: full`** for whole-suite tokens above.
- - **`runCoverage: false`** for `no coverage`, `nocoverage`, `skip coverage`, `sin cobertura`, `quick`, `solo tests`.
- - **`full` + `no coverage`**: `testScope: full`, `runCoverage: false`.
+ - **`testExecution: defer`** if the user explicitly asked to re-run the suite (`full`, `all tests`, `re-run`, `run tests`, `ejecutar tests`, `whole suite`, `suite completa`, `todas`): verify does **not** run the suite (rule 0 — that is `/refacil:test`'s job). Tell the user to run `/refacil:test … full` and continue verify with the resulting evidence.

- - **No test memory** (`commandsRun` empty): emit WARNING
+ - **No test memory** (`commandsRun` empty): emit WARNING and set `testExecution: defer` (CR-01) — `/refacil:test` is the mandatory prior step; verify reports tests as N/A pending `/refacil:test` instead of running them itself.

  Do not invoke the sub-agent with ambiguous scope.

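Reviewer note: the scope rules in this hunk amount to a token check followed by a memory check. A minimal sketch of that logic follows, under stated assumptions: the function names are hypothetical, whole-word matching of the re-run tokens is an assumption about how the tokens are meant to be detected, "test (or later)" is taken to mean `test`, `verify` or `review`, and falling back to `defer` for any other `lastStep` is a conservative guess rather than something the diff states.

```javascript
// Re-run tokens copied from the skill text above.
const RERUN_TOKENS = [
  "full", "all tests", "re-run", "run tests",
  "ejecutar tests", "whole suite", "suite completa", "todas",
];

// Escape regex metacharacters so tokens are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when the user's arguments contain a re-run token as a whole word.
function userAskedForRerun(userArgs) {
  const text = String(userArgs || "").toLowerCase();
  return RERUN_TOKENS.some((token) => new RegExp(`\\b${escapeRegExp(token)}\\b`).test(text));
}

// Hypothetical helper: resolves testExecution before the sub-agent is invoked.
function resolveTestExecution({ commandsRun = [], lastStep = null, userArgs = "" }) {
  if (userAskedForRerun(userArgs)) return "defer"; // verify never runs the suite itself
  if (commandsRun.length === 0) return "defer";    // CR-01: no test memory
  // Assumption: these are the steps that count as "test (or later)".
  if (["test", "verify", "review"].includes(lastStep)) return "none";
  return "defer"; // assumption: be conservative when the last step is unknown
}

console.log(resolveTestExecution({ commandsRun: ["npm test"], lastStep: "test", userArgs: "" }));
console.log(resolveTestExecution({ commandsRun: ["npm test"], lastStep: "test", userArgs: "full" }));
```

The first call resolves to `none` and the second to `defer`. Note that no input can produce `full`, which is the point of the change.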
@@ -53,17 +50,16 @@ Before invoking the sub-agent, extract the context that the validator would othe

  2. **Cross-skill memory** — when `changeName` is known, run `refacil-sdd-ai sdd get-memory <changeName> --json`. Parse `commandsRun`, `criteriaRun`, and `lastStep`. If the output is `{}` or the command fails, omit those fields — do not block verification (CR-04).

- 3. **Resolve `testExecution`** (§3.2
- -
- -
- -
+ 3. **Resolve `testExecution`** (§3.2 — verify **never** runs the full suite, rule 0):
+ - `commandsRun` non-empty and `lastStep` is `test` (or `verify`/`review` after test) and user did **not** request a re-run → `testExecution: none`.
+ - Re-verify after Step 5 corrections → `testExecution: smoke`.
+ - No test evidence (CR-01), or the user asked to re-run the suite → `testExecution: defer` (verify stops and tells the user to run `/refacil:test` first — it does **not** run the suite itself).

- 4. **Test commands** — only when `testExecution
- - **`
- - **`
- - **`none`**: omit `testCommand` and `coverageCommand`; set `testsDelegatedFrom: test` and include `commandsRun` for the report.
+ 4. **Test commands** — only when `testExecution: smoke`:
+ - **`smoke`**: build `smokeTestCommand` for companion tests of `correctionTouchedFiles` **by calling** `refacil-sdd-ai sdd test-scope --files "<correctionTouchedFiles-csv>" --baseline "<§3 baseline>" --no-baseline-fallback` (structurally bounded — empty `testCommand` on fallback). `runCoverage: false`, `coverageCommand: null`.
+ - **`none`** / **`defer`**: omit `testCommand`/`smokeTestCommand` and `coverageCommand`; for `none` set `testsDelegatedFrom: test` and include `commandsRun`; for `defer` instruct the validator to report N/A pending `/refacil:test`.

- 5. **Coverage command** —
+ 5. **Coverage command** — verify never runs coverage (that is `/refacil:test`'s job); always `coverageCommand: null`.

  6. **CA/CR criteria** — if there is an active change, read the specification in `refacil-sdd/changes/<changeName>/`:
  - `specs.md` if it exists, and/or files under `specs/` (recursively).
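Reviewer note: steps 4 and 5 above describe which test-related fields the wrapper fills in for each `testExecution` value. The sketch below shows one way to express that, as an illustration only. The flags `--files`, `--baseline` and `--no-baseline-fallback` come from the skill text; the function names, the argument-array form (suitable for something like `child_process.execFile`) and the object layout are assumptions, and the CLI's output format is deliberately not modelled.

```javascript
// Hypothetical helper: argument list for `refacil-sdd-ai sdd test-scope`, built from
// the flags named in step 4. The executable itself would be "refacil-sdd-ai".
function buildTestScopeArgs(correctionTouchedFiles, baseline) {
  return [
    "sdd", "test-scope",
    "--files", correctionTouchedFiles.join(","),
    "--baseline", baseline,
    "--no-baseline-fallback", // empty command on fallback instead of the whole suite
  ];
}

// Hypothetical helper: test-related briefing fields per testExecution value.
function briefingTestFields(testExecution, { smokeTestCommand = "", commandsRun = [] } = {}) {
  // Step 5: verify never runs coverage, whatever the value.
  const base = { testExecution, runCoverage: false, coverageCommand: null, testsDelegatedFrom: null };
  switch (testExecution) {
    case "smoke":
      return { ...base, smokeTestCommand };
    case "none":
      return { ...base, testsDelegatedFrom: "test", commandsRun };
    case "defer":
      return base; // validator reports N/A pending /refacil:test
    default:
      throw new Error(`unsupported testExecution: ${testExecution}`);
  }
}

console.log(buildTestScopeArgs(["src/a.js", "src/b.js"], "npm test").join(" "));
```

Passing the arguments as an array rather than a single shell string is a design choice of the sketch: it avoids quoting problems when file paths or the baseline command contain spaces.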
@@ -75,12 +71,11 @@ Build the BRIEFING block:
  ```
  BRIEFING:
  changeName: <name or null if scope=git-diff>
- testExecution: none | smoke | full
-
-
-
-
- coverageCommand: <project coverage entrypoint or null when full+runCoverage>
+ testExecution: none | smoke | defer # never full — verify does not run the suite (rule 0)
+ smokeTestCommand: <required when smoke (scoped via --no-baseline-fallback); omit otherwise>
+ testScope: scoped
+ runCoverage: false
+ coverageCommand: null # verify never runs coverage
  testsDelegatedFrom: test | null
  correctionTouchedFiles: [...] # only on re-verify after Step 5 corrections
  criteria:
@@ -102,7 +97,7 @@ Invoke `refacil-validator` passing it the BRIEFING from the previous step.

  The sub-agent:
  - Applies **`testExecution`** from the briefing (§3.2) — **does not** run tests when `none`.
- - When `
+ - When `smoke`, runs only `smokeTestCommand` (scoped via `--no-baseline-fallback`, no coverage); when `defer`, runs no tests and reports N/A pending `/refacil:test`. Verify never runs the full suite (rule 0).
  - Uses `criteria` from the briefing for verification (without re-reading specs from scratch).
  - Uses `changedFiles` to focus the 3D verification on those files.
  - Applies the **3D framework (Completeness/Correctness/Coherence)** per **`METHODOLOGY-CONTRACT.md §3C — 3C Criterion`** — including the severity table and graceful degradation rule.
@@ -192,7 +187,7 @@ If the command fails, continue silently — it must not block the flow.
  ```
  Corrections applied. Run /refacil:test before the next full verify to refresh the test suite.
  ```
- **Never**
+ **Never** run the full suite in autofix re-verify — only `smoke` (scoped) or `none`/`defer` (rule 0).
  5. Maximum **2 rounds** of automatic correction. If issues persist, list them for manual correction.

  **If the user does not accept:** list the issues for manual correction. Suggest `/refacil:test` then `/refacil:verify`.
@@ -200,7 +195,7 @@ If the command fails, continue silently — it must not block the flow.
  ## Rules

  - **Always build the briefing (Step 1) before delegating** — reduces the sub-agent tool calls.
- - **Defaults**: `testExecution: none` when test memory exists;
+ - **Defaults**: `testExecution: none` when test memory exists; `smoke` (scoped) after corrections; `defer` when there is no evidence or the user wants a re-run (verify never runs the full suite — rule 0). `/refacil:test` owns full regression.
  - **Always delegate to the sub-agent** for the analysis. Do not replicate spec reading or test execution logic here.
  - **Dotfiles in `refacil-sdd/changes/`**: never assert absence of `.review-passed` without `-a`; see §8.
  - **Corrections are ONLY applied by this wrapper** (Step 5), after explicit approval.
package/templates/testing-policy.md
CHANGED
@@ -2,6 +2,7 @@

  These rules align with **`METHODOLOGY-CONTRACT.md` §3–§3.1** shipped with SDD-AI (`refacil-prereqs` in your skills install). **Concrete baseline and narrowed commands for this repo** belong in markdown **below** this marked block (not between the `refacil-sdd-ai:testing-policy` markers) so `check-update` can refresh policy text without erasing your commands.

- - **
- -
+ - **Only `/refacil:test` runs the full suite** — and only the **affected component's** suite, once per cycle. Every other phase that runs tests (`/refacil:apply`, `/refacil:bug` fix, `/refacil:verify` smoke) does **scoped runs** only — it narrows the runner to **packages/paths/modules touched by the change** (paths after `--`, `-p`/`-pl`, `-Dtest=…`, `pytest` paths, `go test ./…`, workspace filters, etc.). These phases derive the command via `refacil-sdd-ai sdd test-scope … --no-baseline-fallback`, which returns an **empty command on fallback** — so they **physically cannot** run the whole suite. On fallback they run only touched test files, else SKIP and defer to `/refacil:test`.
+ - **`/refacil:verify` never runs the full suite** — `/refacil:test` is its mandatory prior step. If there is no test evidence, verify defers to `/refacil:test` instead of running it.
+ - **Full suite** — Only `/refacil:test` (component-bounded, once) or **CI / pre-merge**. Non-test phases never run it, even on unreliable scope (they SKIP). Full runs cost more CPU/RAM.
  - **Tests to add or change** — Keep them **next to** the behavior under change (follow this repo’s layout). Do not run unrelated suites “to be safe”.