specflow-cc 1.10.0 → 1.11.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +45 -0
- package/README.md +3 -2
- package/agents/sf-spec-executor-orchestrator.md +178 -2
- package/agents/spec-auditor.md +32 -0
- package/agents/spec-creator.md +68 -0
- package/agents/spec-executor-orchestrator.md +212 -2
- package/agents/spec-executor-worker.md +109 -9
- package/agents/spec-executor.md +68 -14
- package/agents/spec-splitter.md +51 -0
- package/commands/sf/run.md +7 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,51 @@ All notable changes to SpecFlow will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.11.1] - 2026-02-10
+
+### Fixed
+
+- **Orchestrator premature state advancement** — executor agents could advance STATE.md beyond "review", skipping the review step entirely
+  - After `/sf:run` completion, the orchestrator would sometimes perform `/sf:done` logic (moving spec to Completed, activating next spec)
+  - Root cause: vague instructions in STATE.md update step allowed LLM agents to over-interpret "update STATE.md"
+  - Added explicit boundary instructions ("DO NOT move to Completed, DO NOT activate next spec") to all three executor agents
+  - Added post-execution state verification guard in `/sf:run` command handler
+  - Affected files: `sf-spec-executor-orchestrator.md`, `spec-executor-orchestrator.md`, `spec-executor.md`, `run.md`
+
+---
+
+## [1.11.0] - 2026-02-06
+
+### Added
+
+- **Self-check verification** — all executor agents now verify their own claims before reporting completion
+  - `spec-executor`: checks created files exist on disk, commits in git, modified files contain expected changes
+  - `spec-executor-worker`: self-check step before returning JSON results; new `self_check` field in response protocol
+  - Both orchestrators: aggregated self-check verifying all worker claims against reality
+  - Agents refuse to report success if artifacts are missing
+
+- **Segmented execution** — large task groups automatically split into sequential segments with fresh context
+  - Orchestrators evaluate segmentation threshold (Est. Context >= 20%)
+  - Each segment runs in a fresh worker subagent to prevent quality degradation
+  - Handoff summaries pass key exports and interface signatures between segments
+  - Segment results aggregated into single group result for downstream processing
+  - Segment failure handling: abort remaining segments on failure, continue on partial
+
+- **Wave column in spec creation** — `spec-creator` and `spec-splitter` now always generate a `Wave` column in Implementation Tasks tables
+  - Pre-computed wave numbers during spec creation (not during execution)
+  - Orchestrators read wave numbers directly instead of computing dependency graphs
+  - Fallback for legacy specs without Wave column preserved
+
+### Changed
+
+- **Enhanced deviation rules** — all executor agents now include detailed examples, rule priority order, and edge case guidance
+  - Rule priority: Rule 4 (architectural) overrides all; Rules 1-3 auto-fix; unsure defaults to Rule 4
+  - Standardized tracking format: `[Rule N - Type] {description}`
+
+- **Auditor segment hints** — `spec-auditor` can now flag task groups that should be pre-segmented based on estimated context
+
+---
+
 ## [1.10.0] - 2026-02-06
 
 ### Added
package/README.md
CHANGED
@@ -17,7 +17,7 @@ npx specflow-cc --global
 
 <br>
 
-https://github.com/user-attachments/assets/
+https://github.com/user-attachments/assets/23415009-81f9-4755-9e35-32e8bc56b8e7
 
 <br>
 
@@ -413,11 +413,12 @@ Control cost vs quality:
 
 | Profile | Spec Creation | Execution | Review |
 |---------|---------------|-----------|--------|
+| `max` | Opus | Opus | Opus |
 | `quality` | Opus | Opus | Sonnet |
 | `balanced` | Opus | Sonnet | Sonnet |
 | `budget` | Sonnet | Sonnet | Haiku |
 
-Use `quality` for critical features, `budget` for routine tasks.
+Use `max` for maximum quality everywhere, `quality` for critical features, `budget` for routine tasks.
 
 ---
 
package/agents/sf-spec-executor-orchestrator.md
CHANGED

@@ -142,8 +142,49 @@ Waves:
 
 For each wave:
 
+### 3.05 Evaluate Segmentation
+
+For each task group in the current wave, check if segmentation is needed.
+
+**Segmentation threshold:** Est. Context >= 20%
+
+**Decision logic:**
+
+| Est. Context | Segment Count | Rationale |
+|--------------|---------------|-----------|
+| < 20% | 1 (no segmentation) | Fits comfortably in fresh context |
+| 20-35% | 2 segments | Split to keep each segment in PEAK range |
+| 35-50% | 3 segments | Three-way split for larger groups |
+| > 50% | 4 segments (default) + warning | Group should have been split by auditor; flag as warning but proceed with 4-way split |
+
+**How to determine segment boundaries:**
+
+Parse the task group's task list and divide at natural boundaries:
+1. File boundaries (each segment handles a subset of files)
+2. Logical unit boundaries (types first, then implementations, then wiring)
+3. If tasks are numbered (T1, T2, T3...), divide the task numbers evenly
+
+**Segment plan format:**
+
+For each segmented group, create a segment plan:
+
+| Segment | Tasks | Files | Est. Context |
+|---------|-------|-------|--------------|
+| G2-S1 | Create types, Create handler-a | types.ts, handler-a.ts | ~12% |
+| G2-S2 | Create handler-b, Create tests | handler-b.ts, tests.ts | ~13% |
+
+**Pre-computed segments from auditor:**
+
+If the Implementation Tasks table includes a `Segments` column, use those segment boundaries instead of computing them at runtime.
+
 ### 3.1 Spawn Workers
 
+For each task group in the current wave:
+
+**If group is NOT segmented (standard path):**
+
+Spawn worker as today (unchanged).
+
 **Parallel (preferred):**
 ```
 Task(prompt="<task_group>G2: Create handler-a</task_group>
@@ -161,6 +202,104 @@ Task(prompt="...G4...", subagent_type="sf-spec-executor-worker", model="{profile
 
 **Sequential fallback:** If parallel fails, execute one at a time.
 
+**If group IS segmented:**
+
+Execute segments sequentially within the group, each in a fresh worker:
+
+Segment 1:
+```
+Task(prompt="<task_group>G2-S1: Create types and handler-a</task_group>
+<segment_info>
+Segment 1 of 2 for group G2.
+This is the FIRST segment. No prior work exists.
+</segment_info>
+<requirements>{G2-S1 requirements}</requirements>
+<project_patterns>@.specflow/PROJECT.md</project_patterns>
+<context_budget>
+Estimated: ~12%
+Target max: 25%
+</context_budget>
+Implement this segment. Create atomic commits.
+Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 1/2")
+```
+
+Wait for Segment 1 result, then:
+
+Segment 2:
+```
+Task(prompt="<task_group>G2-S2: Create handler-b and tests</task_group>
+<segment_info>
+Segment 2 of 2 for group G2.
+Prior segment completed. Summary of prior work:
+</segment_info>
+<prior_segment_summary>
+## Completed Segments
+
+### Segment 1 of N
+**Status:** complete
+**Files created:**
+- `path/to/file1.ts` -- brief description (key exports: X, Y)
+- `path/to/file2.ts` -- brief description (key exports: Z)
+
+**Files modified:**
+- `path/to/existing.ts` -- what changed
+
+**Commits:** hash1, hash2
+
+**Key interfaces/types defined:**
+- InterfaceName: { field1: type, field2: type }
+- TypeName: description
+</prior_segment_summary>
+<requirements>{G2-S2 requirements}</requirements>
+<project_patterns>@.specflow/PROJECT.md</project_patterns>
+<context_budget>
+Estimated: ~13%
+Target max: 25%
+</context_budget>
+Implement this segment. Create atomic commits.
+You can reference files created by prior segments but do NOT re-read them unless you need specific details.
+Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 2/2")
+```
+
+**Important:** Segments within a group are ALWAYS sequential (never parallel) because later segments depend on earlier ones.
+
+**Parallel behavior:** Non-segmented groups in the same wave still run in parallel alongside segmented groups. The segmented group's sequential segments run independently of other groups.
+
+### 3.15 Aggregate Segment Results
+
+After all segments for a group complete, merge results into a single group result:
+
+```json
+{
+  "group": "G2",
+  "status": "{worst status among segments: failed > partial > complete}",
+  "files_created": ["{union of all segments' files_created}"],
+  "files_modified": ["{union of all segments' files_modified}"],
+  "commits": ["{concatenation of all segments' commits in order}"],
+  "criteria_met": ["{union of all segments' criteria_met}"],
+  "deviations": ["{concatenation of all segments' deviations}"],
+  "error": "{first non-null error, or null}",
+  "segmented": true,
+  "segment_count": 2,
+  "segment_results": [
+    {"segment": 1, "status": "complete", ...},
+    {"segment": 2, "status": "complete", ...}
+  ]
+}
+```
+
+This aggregated result feeds into the existing Step 3.2 (Collect Results) and Step 3.3 (Handle Failures) unchanged.
+
+**Segment failure handling:**
+
+| Scenario | Action |
+|----------|--------|
+| Segment N fails | Abort remaining segments for this group, mark group as failed |
+| Segment N partial | Continue to next segment with available results, mark group as partial |
+| All segments complete | Aggregate into single group result, mark group as complete |
+
 ### 3.2 Collect Results
 
 Parse each worker's JSON response.
@@ -184,7 +323,35 @@ Criteria met: [union of all criteria_met]
 Deviations: [collect all deviations]
 ```
 
-## Step 6:
+## Step 6: Aggregated Self-Check
+
+After aggregating results, verify that all worker claims are real.
+
+**1. Check all created files exist:**
+
+For each file in the aggregated `files_created` list:
+```bash
+[ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+```
+
+**2. Check all commits exist:**
+
+For each commit hash in the aggregated `commits` list:
+```bash
+git log --oneline -20 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+```
+
+**3. Check worker self_check fields:**
+
+If any worker returned `self_check: "partial"` or `self_check: "failed"`, flag those groups for investigation.
+
+**4. Handle failures:**
+
+- If missing files/commits found: report discrepancies in Execution Summary
+- Do NOT report success with missing artifacts
+- If critical files missing: mark affected groups as `partial`
+
+## Step 7: Create Final Summary
 
 Append Execution Summary to specification:
 
@@ -216,11 +383,19 @@ Append Execution Summary to specification:
 {aggregated deviations}
 ```
 
-## Step
+## Step 8: Update STATE.md
 
+Update ONLY the Current Position section:
 - Status → "review"
 - Next Step → "/sf:review"
 
+**CRITICAL — DO NOT go beyond this:**
+- Do NOT move the spec to Completed Specifications table
+- Do NOT remove the spec from Queue table
+- Do NOT activate the next specification in the queue
+- Do NOT archive the spec file
+- These actions belong to `/sf:done`, not to execution
+
 </process>
 
 <output>
@@ -276,6 +451,7 @@ Output directly as formatted text (not wrapped in a code block):
 - [ ] Each worker receives no more than 3 task groups
 - [ ] All worker results collected and parsed
 - [ ] Failures handled per failure handling rules
+- [ ] Aggregated self-check passed (all files and commits verified)
 - [ ] Results aggregated into final summary
 - [ ] Execution Summary appended to specification
 - [ ] STATE.md updated to "review"
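The segment-aggregation rules in section 3.15 (worst status wins, unions for files and criteria, concatenation for commits and deviations, first non-null error) map cleanly onto a small helper. This is an illustrative sketch of that prose, not code shipped in the package, and the `aggregate_segments` name is hypothetical:

```python
def aggregate_segments(group: str, segments: list[dict]) -> dict:
    """Merge per-segment worker results into one group result.

    Status is the worst among segments (failed > partial > complete);
    file/criteria lists are order-preserving unions, commits and
    deviations are concatenated, and the first non-null error wins.
    """
    rank = {"complete": 0, "partial": 1, "failed": 2}
    worst = max((s["status"] for s in segments), key=lambda st: rank[st])

    def union(key: str) -> list:
        seen, out = set(), []
        for s in segments:
            for item in s.get(key, []):
                if item not in seen:
                    seen.add(item)
                    out.append(item)
        return out

    def concat(key: str) -> list:
        out = []
        for s in segments:
            out.extend(s.get(key, []))
        return out

    return {
        "group": group,
        "status": worst,
        "files_created": union("files_created"),
        "files_modified": union("files_modified"),
        "commits": concat("commits"),
        "criteria_met": union("criteria_met"),
        "deviations": concat("deviations"),
        "error": next((s["error"] for s in segments if s.get("error")), None),
        "segmented": True,
        "segment_count": len(segments),
        "segment_results": segments,
    }
```

A partial second segment yields a `partial` group result, which then flows into the unchanged Collect Results / Handle Failures steps.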
package/agents/spec-auditor.md
CHANGED
@@ -467,6 +467,38 @@ If the spec has a Goal Analysis section, enhance Implementation Tasks:
 
 Note: "Enables Truths" column is only added when Goal Analysis is present.
 
+### 4.45 Segment Hints for Large Groups
+
+After generating task groups, check if any single group has Est. Context >= 20%.
+
+If so, add a `Segments` column to the Implementation Tasks table:
+
+| Group | Wave | Tasks | Dependencies | Est. Context | Segments |
+|-------|------|-------|--------------|--------------|----------|
+| G1 | 1 | Create types | - | ~8% | 1 |
+| G2 | 2 | Create handlers, tests, validation | G1 | ~28% | 2 |
+| G3 | 2 | Create UI components | G1 | ~10% | 1 |
+
+For groups with Segments > 1, add segment breakdown in the Execution Plan:
+
+**G2 Segments:**
+- S1: Create handlers (handler-a.ts, handler-b.ts) -- ~14%
+- S2: Create tests and validation (handler.test.ts, validation.ts) -- ~14%
+
+**Segment count guidance:**
+
+| Est. Context | Segment Count |
+|--------------|---------------|
+| < 20% | 1 (no segmentation) |
+| 20-35% | 2 segments |
+| 35-50% | 3 segments |
+| > 50% | 4 segments (with warning: consider splitting into separate task groups) |
+
+**Segment boundaries should follow natural divisions:**
+1. File boundaries (each segment handles subset of files)
+2. Logical unit boundaries (types first, then implementations, then wiring)
+3. Sequential task ordering (T1-T3 in S1, T4-T6 in S2)
+
 ## Step 4.5: Compute Execution Waves
 
 After generating task groups (or for any spec with Implementation Tasks):
package/agents/spec-creator.md
CHANGED
@@ -130,6 +130,74 @@ Write to `.specflow/specs/SPEC-XXX.md` using the template structure:
 8. **Assumptions:** What you assumed (clearly marked)
    - **If `<prior_discussion>` provided:** Decisions from discussion are facts, not assumptions
 
+## Step 5.5: Generate Implementation Tasks (for medium and large specs)
+
+**When to include:**
+- **Medium** and **large** complexity specs: Always include Implementation Tasks section
+- **Small** complexity specs: Optional (skip if only 1-2 files or simple change)
+
+**Task Groups:**
+
+1. Group related work logically:
+   - Types/interfaces first (foundational)
+   - Independent implementations (can run parallel)
+   - Integration/wiring last (depends on implementations)
+
+2. For each group, define:
+   - **Group ID**: G1, G2, G3, etc.
+   - **Tasks**: Brief description of what the group does
+   - **Dependencies**: Which groups must complete first (use `—` for none)
+   - **Est. Context**: Rough estimate (e.g., ~15%, ~20%)
+
+**Wave Assignment Algorithm:**
+
+Assign wave numbers to enable parallel execution:
+
+1. Initialize all groups with wave = 0 (unassigned)
+2. For each group with no dependencies: wave = 1
+3. Repeat until all groups have waves:
+   - For each unassigned group:
+     - If all dependencies have assigned waves:
+       - wave = max(dependency waves) + 1
+4. If groups remain unassigned after a full pass with no progress:
+   - Circular dependency exists
+   - Flag in spec as note: "Note: Circular dependency detected in groups [list]. Auditor will verify."
+
+**Implementation Tasks Table:**
+
+Generate the table with Wave column:
+
+```markdown
+### Task Groups
+
+| Group | Wave | Tasks | Dependencies | Est. Context |
+|-------|------|-------|--------------|--------------|
+| G1 | 1 | Create types | — | ~10% |
+| G2 | 2 | Create handler | G1 | ~20% |
+| G3 | 2 | Create tests | G1 | ~15% |
+| G4 | 3 | Wire integration | G2, G3 | ~10% |
+```
+
+**Execution Plan:**
+
+Generate the Execution Plan summary showing parallel opportunities:
+
+```markdown
+### Execution Plan
+
+| Wave | Groups | Parallel? | Workers |
+|------|--------|-----------|---------|
+| 1 | G1 | No | 1 |
+| 2 | G2, G3 | Yes | 2 |
+| 3 | G4 | No | 1 |
+
+**Total workers needed:** 2 (max in any wave)
+```
+
+- **Parallel?**: "Yes" if wave has >1 group, "No" otherwise
+- **Workers**: Count of groups in the wave
+- **Total workers needed**: Maximum Workers value across all waves
+
 ## Step 6: Estimate Complexity
 
 Based on:
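The Wave Assignment Algorithm above is a standard layering of the dependency graph: a group's wave is one more than the latest wave among its dependencies, and a pass with no progress signals a cycle. A sketch of the same steps (the `assign_waves` helper is hypothetical, not shipped in the package):

```python
def assign_waves(deps: dict[str, list[str]]) -> dict[str, int]:
    """Assign wave numbers per the spec-creator algorithm.

    Groups with no dependencies get wave 1; otherwise
    wave = max(dependency waves) + 1. Raises ValueError when a full
    pass makes no progress (circular dependency).
    """
    waves = {g: 0 for g in deps}  # 0 = unassigned
    for g, d in deps.items():
        if not d:
            waves[g] = 1
    while any(w == 0 for w in waves.values()):
        progress = False
        for g, d in deps.items():
            if waves[g] == 0 and all(waves[x] > 0 for x in d):
                waves[g] = max(waves[x] for x in d) + 1
                progress = True
        if not progress:
            unassigned = [g for g, w in waves.items() if w == 0]
            raise ValueError(f"Circular dependency detected in groups {unassigned}")
    return waves
```

Run against the example table (G1 with no deps; G2, G3 depending on G1; G4 depending on G2 and G3), this reproduces waves 1, 2, 2, 3.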
package/agents/spec-executor-orchestrator.md
CHANGED

@@ -85,6 +85,7 @@ Worker returns structured JSON:
   "commits": ["abc123", "def456"],
   "criteria_met": ["Criterion 1", "Criterion 2"],
   "deviations": [],
+  "self_check": "passed|partial|skipped",
   "error": null
 }
 ```
@@ -138,6 +139,37 @@ This helps workers make trade-off decisions:
   "error": null
   }
 }
+},
+{
+  "id": 2,
+  "status": "in_progress",
+  "results": {
+    "G2": {
+      "status": "in_progress",
+      "segmented": true,
+      "segment_count": 2,
+      "segments": [
+        {
+          "segment": 1,
+          "status": "complete",
+          "commits": ["abc123"],
+          "files_created": ["types.ts"],
+          "handoff_summary": "..."
+        },
+        {
+          "segment": 2,
+          "status": "running",
+          "commits": [],
+          "files_created": []
+        }
+      ]
+    },
+    "G3": {
+      "status": "complete",
+      "segmented": false,
+      "commits": ["def456"]
+    }
+  }
 }
 ],
 "commits": ["all", "commit", "hashes"],
@@ -320,8 +352,49 @@ Verify prerequisites:
 2. "Continue anyway" -> proceed despite issues
 3. "Abort" -> stop execution, preserve state
 
+### 3.05 Evaluate Segmentation
+
+For each task group in the current wave, check if segmentation is needed.
+
+**Segmentation threshold:** Est. Context >= 20%
+
+**Decision logic:**
+
+| Est. Context | Segment Count | Rationale |
+|--------------|---------------|-----------|
+| < 20% | 1 (no segmentation) | Fits comfortably in fresh context |
+| 20-35% | 2 segments | Split to keep each segment in PEAK range |
+| 35-50% | 3 segments | Three-way split for larger groups |
+| > 50% | 4 segments (default) + warning | Group should have been split by auditor; flag as warning but proceed with 4-way split |
+
+**How to determine segment boundaries:**
+
+Parse the task group's task list and divide at natural boundaries:
+1. File boundaries (each segment handles a subset of files)
+2. Logical unit boundaries (types first, then implementations, then wiring)
+3. If tasks are numbered (T1, T2, T3...), divide the task numbers evenly
+
+**Segment plan format:**
+
+For each segmented group, create a segment plan:
+
+| Segment | Tasks | Files | Est. Context |
+|---------|-------|-------|--------------|
+| G2-S1 | Create types, Create handler-a | types.ts, handler-a.ts | ~12% |
+| G2-S2 | Create handler-b, Create tests | handler-b.ts, tests.ts | ~13% |
+
+**Pre-computed segments from auditor:**
+
+If the Implementation Tasks table includes a `Segments` column, use those segment boundaries instead of computing them at runtime.
+
 ### 3.1 Spawn Workers
 
+For each task group in the current wave:
+
+**If group is NOT segmented (standard path):**
+
+Spawn worker as today (unchanged).
+
 **Parallel (preferred):**
 ```
 Task(prompt="<task_group>G2: Create handler-a</task_group>
@@ -344,6 +417,104 @@ Task(prompt="...G4 (with context_budget)...", subagent_type="sf-spec-executor-wo
 
 **Sequential fallback:** If parallel fails, execute one at a time.
 
+**If group IS segmented:**
+
+Execute segments sequentially within the group, each in a fresh worker:
+
+Segment 1:
+```
+Task(prompt="<task_group>G2-S1: Create types and handler-a</task_group>
+<segment_info>
+Segment 1 of 2 for group G2.
+This is the FIRST segment. No prior work exists.
+</segment_info>
+<requirements>{G2-S1 requirements}</requirements>
+<project_patterns>@.specflow/PROJECT.md</project_patterns>
+<context_budget>
+Estimated: ~12%
+Target max: 25%
+</context_budget>
+Implement this segment. Create atomic commits.
+Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 1/2")
+```
+
+Wait for Segment 1 result, then:
+
+Segment 2:
+```
+Task(prompt="<task_group>G2-S2: Create handler-b and tests</task_group>
+<segment_info>
+Segment 2 of 2 for group G2.
+Prior segment completed. Summary of prior work:
+</segment_info>
+<prior_segment_summary>
+## Completed Segments
+
+### Segment 1 of N
+**Status:** complete
+**Files created:**
+- `path/to/file1.ts` -- brief description (key exports: X, Y)
+- `path/to/file2.ts` -- brief description (key exports: Z)
+
+**Files modified:**
+- `path/to/existing.ts` -- what changed
+
+**Commits:** hash1, hash2
+
+**Key interfaces/types defined:**
+- InterfaceName: { field1: type, field2: type }
+- TypeName: description
+</prior_segment_summary>
+<requirements>{G2-S2 requirements}</requirements>
+<project_patterns>@.specflow/PROJECT.md</project_patterns>
+<context_budget>
+Estimated: ~13%
+Target max: 25%
+</context_budget>
+Implement this segment. Create atomic commits.
+You can reference files created by prior segments but do NOT re-read them unless you need specific details.
+Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 2/2")
+```
+
+**Important:** Segments within a group are ALWAYS sequential (never parallel) because later segments depend on earlier ones.
+
+**Parallel behavior:** Non-segmented groups in the same wave still run in parallel alongside segmented groups. The segmented group's sequential segments run independently of other groups.
+
+### 3.15 Aggregate Segment Results
+
+After all segments for a group complete, merge results into a single group result:
+
+```json
+{
+  "group": "G2",
+  "status": "{worst status among segments: failed > partial > complete}",
+  "files_created": ["{union of all segments' files_created}"],
+  "files_modified": ["{union of all segments' files_modified}"],
+  "commits": ["{concatenation of all segments' commits in order}"],
+  "criteria_met": ["{union of all segments' criteria_met}"],
+  "deviations": ["{concatenation of all segments' deviations}"],
+  "error": "{first non-null error, or null}",
+  "segmented": true,
+  "segment_count": 2,
+  "segment_results": [
+    {"segment": 1, "status": "complete", ...},
+    {"segment": 2, "status": "complete", ...}
+  ]
+}
+```
+
+This aggregated result feeds into the existing Step 3.2 (Collect Results) and Step 3.3 (Update State Per Worker) unchanged.
+
+**Segment failure handling:**
+
+| Scenario | Action |
+|----------|--------|
+| Segment N fails | Abort remaining segments for this group, mark group as failed |
+| Segment N partial | Continue to next segment with available results, mark group as partial |
+| All segments complete | Aggregate into single group result, mark group as complete |
+
 ### 3.2 Collect Results
 
 Parse each worker's JSON response.
@@ -438,6 +609,36 @@ Criteria met: [union of all criteria_met]
 Deviations: [collect all deviations]
 ```
 
+## Step 4.5: Final Aggregated Self-Check
+
+After aggregating all results, verify claims against reality.
+
+**1. Check all created files exist:**
+
+For each file in the aggregated `files_created` list:
+```bash
+[ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+```
+
+**2. Check all commits exist:**
+
+For each commit hash in the aggregated `commits` list:
+```bash
+git log --oneline -30 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+```
+
+**3. Check worker self_check fields:**
+
+If any worker returned `self_check: "partial"` or `self_check: "skipped"`, flag those groups.
+
+**4. Handle discrepancies:**
+
+- Missing files/commits → report in Execution Summary under "Self-Check Issues"
+- Do NOT report full success with missing artifacts
+- If critical files missing → mark affected groups as `partial` in state
+
+**Do NOT skip this step.**
+
 ## Step 5: Create Final Summary
 
 Append Execution Summary to specification:
@@ -486,10 +687,18 @@ rm .specflow/execution/SPEC-XXX-state.json
 
 ## Step 7: Update STATE.md
 
-
--
+Update ONLY the Current Position section:
+- Status → "review"
+- Next Step → "/sf:review"
 - Remove or update Execution Status row
 
+**CRITICAL — DO NOT go beyond this:**
+- Do NOT move the spec to Completed Specifications table
+- Do NOT remove the spec from Queue table
+- Do NOT activate the next specification in the queue
+- Do NOT archive the spec file
+- These actions belong to `/sf:done`, not to execution
+
 </process>
 
 <output>
@@ -551,6 +760,7 @@ Output directly as formatted text (not wrapped in a code block):
 - [ ] State updated after each wave completes
 - [ ] Post-wave verification performed after each wave
 - [ ] Failures handled per failure handling rules
+- [ ] Final aggregated self-check passed (all files and commits verified)
 - [ ] Results aggregated into final summary
 - [ ] State file deleted on successful completion
 - [ ] Execution Summary appended to specification
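The Final Aggregated Self-Check in Step 4.5 boils down to two checks: claimed files exist on disk, and claimed commit hashes appear in recent `git log --oneline` output. The same checks can be sketched as one helper (illustrative only; the agents themselves run the bash one-liners shown in the diff, and the `self_check` function name here is hypothetical):

```python
import subprocess
from pathlib import Path

def self_check(files_created: list[str], commits: list[str]) -> dict:
    """Verify orchestrator-level claims: every claimed file exists on
    disk and every claimed commit hash appears in recent git history."""
    missing_files = [f for f in files_created if not Path(f).is_file()]
    try:
        # Same window as the diff's one-liner: the last 30 one-line entries.
        log = subprocess.run(
            ["git", "log", "--oneline", "-30"],
            capture_output=True, text=True,
        ).stdout
    except OSError:
        log = ""  # git unavailable: claimed commits remain unverified
    missing_commits = [h for h in commits if h not in log]
    ok = not missing_files and not missing_commits
    return {
        "status": "passed" if ok else "partial",
        "missing_files": missing_files,
        "missing_commits": missing_commits,
    }
```

Any non-empty `missing_files` or `missing_commits` list corresponds to the "do NOT report full success" rule: affected groups get downgraded to `partial`.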
package/agents/spec-executor-worker.md
CHANGED

@@ -27,20 +27,28 @@ You receive ONLY your task group's requirements from the orchestrator.

 ## Deviation Rules (inherited from spec-executor)

+Apply these rules automatically. Track all deviations for the result JSON.
+
 **Rule 1: Auto-fix bugs** (no permission needed)
 - Code doesn't work as intended → fix inline, continue
+- Examples: wrong logic, type errors, null pointers, broken validation, security vulnerabilities
+- Track as: `[Rule 1 - Bug] {description}`

 **Rule 2: Auto-add missing critical functionality** (no permission needed)
-- Missing essentials for correctness/security
-
+- Missing essentials for correctness/security → add inline, continue
+- Examples: missing error handling, no input validation, missing null checks, no auth on protected routes
+- Track as: `[Rule 2 - Missing Critical] {description}`

 **Rule 3: Auto-fix blocking issues** (no permission needed)
-- Prevents task completion
-
+- Prevents task completion → fix and continue
+- Examples: missing dependency, broken import paths, wrong types, build config error
+- Track as: `[Rule 3 - Blocking] {description}`

 **Rule 4: Ask about architectural changes** (requires user decision)
-- Significant structural modifications needed
-
+- Significant structural modifications needed → STOP and ask user
+- Examples: new database table, schema changes, framework switching, changing API contracts
+
+**Rule Priority:** Rule 4 overrides all → Rules 1-3 auto-fix → unsure = Rule 4

 ## Atomic Commits

@@ -71,11 +79,19 @@ When writing or modifying code:

 From orchestrator prompt, extract:
 - Task group ID (e.g., "G2")
+- **Segment info (if present):** segment number, total segments
+- **Prior segment summary (if present):** files created, key exports, commits
 - Task description
-- Requirements for this group
+- Requirements for this group/segment
 - Interfaces/types to use (from previous groups)
 - Project patterns reference

+**If segment info is present:**
+- This is a segmented execution
+- Focus ONLY on tasks assigned to this segment
+- Use prior segment summary to understand what already exists
+- Do NOT re-read files from prior segments unless you need specific implementation details
+
 ## Step 2: Load Required Context

 Read ONLY what's needed:
@@ -129,10 +145,33 @@ Deviations:
 - [Rule 2 - Missing] Added {functionality} for {reason}
 ```

-## Step 5:
+## Step 5: Self-Check (Verify Your Own Claims)
+
+Before returning results, verify that your work actually exists.
+
+**1. Check created files exist:**
+```bash
+[ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+```
+
+**2. Check commits exist:**
+```bash
+git log --oneline -10 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+```
+
+**3. If ANY check fails:**
+- Fix the issue before returning results
+- Do NOT return `status: "complete"` with missing artifacts
+- If unfixable, return `status: "partial"` with error explaining what's missing
+
+**Do NOT skip this step.**
+
+## Step 6: Return Results

 Output structured JSON for orchestrator:

+**For non-segmented execution:**
+
 ```json
 {
   "group": "G2",
@@ -149,10 +188,45 @@ Output structured JSON for orchestrator:
     "handleQuerySub processes QUERY_SUB messages"
   ],
   "deviations": [],
+  "self_check": "passed",
   "error": null
 }
 ```

+**For segmented execution, add segment fields:**
+
+```json
+{
+  "group": "G2",
+  "segment": 1,
+  "segment_total": 2,
+  "status": "complete",
+  "files_created": ["path/to/types.ts", "path/to/handler-a.ts"],
+  "files_modified": [],
+  "commits": ["abc123", "def456"],
+  "criteria_met": ["Types defined", "HandlerA implemented"],
+  "deviations": [],
+  "self_check": "passed",
+  "error": null,
+  "handoff_summary": {
+    "key_exports": ["UserType", "ConfigType", "HandlerA"],
+    "interfaces": "UserType: { id: string, name: string }",
+    "notes": "HandlerA expects ConfigType in constructor"
+  }
+}
+```
+
+**The `handoff_summary` field** (for segmented execution only) contains:
+- Key exports from created files
+- Interface/type signatures that later segments will need
+- Brief notes about design decisions or conventions established
+
+**Rules for handoff summary:**
+- Include file paths and key exports (not full file contents)
+- Include interface/type signatures if they are needed by later segments
+- Maximum ~500 words per segment summary
+- Do NOT include implementation details, only the public API surface
+
 **Status values:**
 - `complete`: All tasks done successfully
 - `partial`: Some tasks done, others blocked
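How an orchestrator might fold a prior segment's `handoff_summary` into the next segment's prompt can be sketched as follows. The field names mirror the segmented result JSON above; the function name and the rendered wording are assumptions for illustration, not a shipped SpecFlow API.

```python
def segment_handoff_prompt(prior):
    """Render one prior-segment result as a compact context block.

    Only the public API surface (files, exports, interfaces, notes)
    is carried forward, per the handoff summary rules.
    """
    h = prior.get("handoff_summary", {})
    lines = [
        f"Prior segment {prior['segment']}/{prior['segment_total']} of {prior['group']}:",
        f"- files created: {', '.join(prior.get('files_created', []))}",
        f"- key exports: {', '.join(h.get('key_exports', []))}",
        f"- interfaces: {h.get('interfaces', 'n/a')}",
        f"- notes: {h.get('notes', 'n/a')}",
    ]
    return "\n".join(lines)
```

Keeping this block to exports and signatures, rather than full file contents, is what holds the next segment's context cost down.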
@@ -164,7 +238,7 @@ Output structured JSON for orchestrator:

 Return ONLY the structured JSON result. The orchestrator will parse this.

-**On success:**
+**On success (non-segmented):**
 ```json
 {
   "group": "G2",

@@ -174,10 +248,33 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
   "commits": ["abc1234", "def5678"],
   "criteria_met": ["Criterion 1", "Criterion 2"],
   "deviations": [],
+  "self_check": "passed",
   "error": null
 }
 ```

+**On success (segmented):**
+```json
+{
+  "group": "G2",
+  "segment": 1,
+  "segment_total": 2,
+  "status": "complete",
+  "files_created": ["path/to/types.ts"],
+  "files_modified": [],
+  "commits": ["abc1234"],
+  "criteria_met": ["Types defined"],
+  "deviations": [],
+  "self_check": "passed",
+  "error": null,
+  "handoff_summary": {
+    "key_exports": ["UserType", "ConfigType"],
+    "interfaces": "UserType: { id: string, name: string }",
+    "notes": "Types exported from types.ts module"
+  }
+}
+```
+
 **On partial completion:**
 ```json
 {
@@ -188,6 +285,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
   "commits": ["abc1234"],
   "criteria_met": ["Criterion 1"],
   "deviations": [],
+  "self_check": "partial",
   "error": "Could not complete task X: missing dependency Y"
 }
 ```
@@ -202,6 +300,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
   "commits": [],
   "criteria_met": [],
   "deviations": [],
+  "self_check": "skipped",
   "error": "Failed to implement: {reason}"
 }
 ```
@@ -214,6 +313,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
 - [ ] All tasks in group implemented
 - [ ] Atomic commits created for each logical unit
 - [ ] Deviations documented (if any)
+- [ ] Self-check passed (all files and commits verified)
 - [ ] Structured JSON result returned
 - [ ] Status reflects actual completion state
 </success_criteria>
package/agents/spec-executor.md
CHANGED

@@ -28,26 +28,39 @@ The specification is your contract. Follow it exactly:

 ## Deviation Rules

-When reality doesn't match the plan
+When reality doesn't match the plan, apply these rules automatically. Track all deviations for the Execution Summary.

 **Rule 1: Auto-fix bugs** (no permission needed)
-- Code doesn't work as intended
-
+- Code doesn't work as intended → fix inline, continue
+- Examples: wrong logic, type errors, null pointer exceptions, broken validation, security vulnerabilities (SQL injection, XSS), race conditions, memory leaks
+- Track as: `[Rule 1 - Bug] {description}`

 **Rule 2: Auto-add missing critical functionality** (no permission needed)
-- Missing essentials for correctness/security
-
-
+- Missing essentials for correctness/security → add inline, continue
+- Examples: missing error handling (no try/catch), no input validation, missing null checks, no auth on protected routes, missing required indexes
+- Critical = required for correct/secure operation. These are NOT "features" — they're correctness requirements
+- Track as: `[Rule 2 - Missing Critical] {description}`

 **Rule 3: Auto-fix blocking issues** (no permission needed)
-- Something prevents task completion
-
-
+- Something prevents task completion → fix and continue
+- Examples: missing dependency, broken import paths, wrong types blocking compilation, missing env variable, build config error, circular dependency
+- Track as: `[Rule 3 - Blocking] {description}`

 **Rule 4: Ask about architectural changes** (requires user decision)
-- Significant structural modification required
-
-
+- Significant structural modification required → STOP and ask user
+- Examples: adding new database table, major schema changes, switching libraries/frameworks, changing API contracts, adding new infrastructure layer
+- Present: what you found, proposed change, why needed, impact, alternatives
+
+**Rule Priority** (when multiple could apply):
+1. If Rule 4 applies → STOP (architectural decision needed)
+2. If Rules 1-3 apply → fix automatically, track for summary
+3. If genuinely unsure → apply Rule 4 (safer to ask)
+
+**Edge case guidance:**
+- "This validation is missing" → Rule 2 (critical for security)
+- "This crashes on null" → Rule 1 (bug)
+- "Need to add a database table" → Rule 4 (architectural)
+- "Need to add a column" → Rule 1 or 2 (depends on context)

 ## Atomic Commits

@@ -153,7 +166,39 @@ If any deviations occurred (Rules 1-3), document them:
 2. [Rule 2 - Missing] Added {functionality} for {reason}
 ```

-## Step 7:
+## Step 7: Self-Check (Verify Your Own Claims)
+
+After implementation, verify that your work actually exists before reporting completion.
+
+**1. Check created files exist:**
+
+For each file you claim to have created:
+```bash
+[ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+```
+
+**2. Check commits exist:**
+
+For each commit hash you recorded:
+```bash
+git log --oneline -10 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+```
+
+**3. Check modified files have expected changes:**
+
+For key modifications, verify the change is present:
+```bash
+grep -q "expected_pattern" path/to/modified/file && echo "VERIFIED" || echo "NOT FOUND"
+```
+
+**4. Report self-check result:**
+
+- If ALL checks pass: continue to Execution Summary
+- If ANY check fails: **fix the issue** before proceeding, do NOT report success with missing artifacts
+
+**Do NOT skip this step. Do NOT report completion if self-check fails.**
+
+## Step 8: Create Execution Summary

 Append to specification:

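The three self-check probes in Step 7 above can be batched into one sketch. Treating the claimed artifacts as plain Python arguments is an illustration only; the real agent runs the bash one-liners, and the pattern check here uses a substring match rather than `grep`'s regex.

```python
import subprocess
from pathlib import Path

def self_check(files_created, commits, modified_patterns=()):
    """Return a list of problems; an empty list means the self-check passed.

    files_created: paths the agent claims to have written
    commits: short hashes the agent claims to have recorded
    modified_patterns: (path, expected_substring) pairs for key edits
    """
    problems = [f"MISSING file: {p}" for p in files_created if not Path(p).is_file()]
    if commits:
        # Mirrors `git log --oneline -10 | grep -q "{hash}"`
        log = subprocess.run(
            ["git", "log", "--oneline", "-10"],
            capture_output=True, text=True,
        ).stdout
        problems += [f"MISSING commit: {h}" for h in commits if h not in log]
    for path, pattern in modified_patterns:
        try:
            if pattern not in Path(path).read_text():
                problems.append(f"NOT FOUND: {pattern!r} in {path}")
        except OSError:
            problems.append(f"MISSING file: {path}")
    return problems
```

A non-empty return value means the agent must fix the issue (or downgrade to `partial`) before writing the Execution Summary.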
@@ -186,11 +231,19 @@ Append to specification:
 {Any important implementation notes for reviewer}
 ```

-## Step
+## Step 9: Update STATE.md

+Update ONLY the Current Position section:
 - Status → "review"
 - Next Step → "/sf:review"

+**CRITICAL — DO NOT go beyond this:**
+- Do NOT move the spec to Completed Specifications table
+- Do NOT remove the spec from Queue table
+- Do NOT activate the next specification in the queue
+- Do NOT archive the spec file
+- These actions belong to `/sf:done`, not to execution
+
 </process>

 <output>
@@ -237,6 +290,7 @@ Output directly as formatted text (not wrapped in a code block):
 - [ ] All acceptance criteria addressed
 - [ ] Atomic commits created
 - [ ] Deviations documented
+- [ ] Self-check passed (all files and commits verified)
 - [ ] Execution Summary added to spec
 - [ ] STATE.md updated
 </success_criteria>
package/agents/spec-splitter.md
CHANGED

@@ -112,6 +112,57 @@ After user approval, create each child spec:
 7. Copy applicable Constraints
 8. Note inherited Assumptions

+**Implementation Tasks for Child Specs:**
+
+When a child spec contains **3+ task groups**, include an Implementation Tasks section with Wave column:
+
+### Wave Assignment Algorithm
+
+1. Initialize all groups with wave = 0 (unassigned)
+2. For each group with no dependencies: wave = 1
+3. Repeat until all groups have waves:
+   - For each unassigned group:
+     - If all dependencies have assigned waves:
+       - wave = max(dependency waves) + 1
+4. If groups remain unassigned after a full pass with no progress:
+   - Circular dependency exists
+   - Flag in spec as note for auditor
+
+### Implementation Tasks Table Format
+
+```markdown
+### Task Groups
+
+| Group | Wave | Tasks | Dependencies | Est. Context |
+|-------|------|-------|--------------|--------------|
+| G1 | 1 | Create types | — | ~10% |
+| G2 | 2 | Create handler | G1 | ~20% |
+| G3 | 2 | Create tests | G1 | ~15% |
+| G4 | 3 | Wire integration | G2, G3 | ~10% |
+```
+
+### Execution Plan Format
+
+```markdown
+### Execution Plan
+
+| Wave | Groups | Parallel? | Workers |
+|------|--------|-----------|---------|
+| 1 | G1 | No | 1 |
+| 2 | G2, G3 | Yes | 2 |
+| 3 | G4 | No | 1 |
+
+**Total workers needed:** 2 (max in any wave)
+```
+
+- **Parallel?**: "Yes" if wave has >1 group, "No" otherwise
+- **Workers**: Count of groups in the wave
+- **Total workers needed**: Maximum Workers value across all waves
+
+**Threshold Note:**
+- Child specs with <3 task groups: Implementation Tasks section is optional
+- Child specs with 3+ task groups: Include Implementation Tasks with Wave column
+
 ## Step 7: Archive Parent

 Move parent spec:
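The spec-splitter's wave assignment algorithm and the derived execution plan can be sketched as follows. The `deps` mapping (group to prerequisite groups) and the tuple-based plan are assumed shapes for illustration; SpecFlow's agents apply the algorithm in prose, not through this code.

```python
def assign_waves(deps):
    """Return {group: wave}; raises ValueError on circular dependencies."""
    waves = {g: 0 for g in deps}  # 0 = unassigned
    progress = True
    while progress and 0 in waves.values():
        progress = False
        for group, prereqs in deps.items():
            if waves[group] == 0 and all(waves[p] > 0 for p in prereqs):
                # No dependencies -> wave 1; otherwise one past the latest prerequisite
                waves[group] = max((waves[p] for p in prereqs), default=0) + 1
                progress = True
    if 0 in waves.values():
        # A full pass made no progress: circular dependency, flag for auditor
        raise ValueError("Circular dependency among task groups")
    return waves

def execution_plan(waves):
    """Derive [(wave, groups, parallel?, workers), ...] from wave assignments."""
    by_wave = {}
    for group, wave in waves.items():
        by_wave.setdefault(wave, []).append(group)
    return [(w, sorted(gs), len(gs) > 1, len(gs)) for w, gs in sorted(by_wave.items())]

deps = {"G1": [], "G2": ["G1"], "G3": ["G1"], "G4": ["G2", "G3"]}
plan = execution_plan(assign_waves(deps))
print(plan)  # [(1, ['G1'], False, 1), (2, ['G2', 'G3'], True, 2), (3, ['G4'], False, 1)]
print(max(workers for *_, workers in plan))  # total workers needed: 2
```

This reproduces the example tables above: G2 and G3 share wave 2 (parallel, 2 workers), and total workers needed is the maximum workers in any wave.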
package/commands/sf/run.md
CHANGED

@@ -239,7 +239,13 @@ The agent will:
 2. Create atomic commits
 3. Handle deviations
 4. Add Execution Summary to spec
-5. Update STATE.md to "review"
+5. Update STATE.md status to "review"
+
+**After agent returns, verify STATE.md is correct:**
+- Active Specification must still be the SAME spec (not advanced to next)
+- Status must be "review" (not "done" or "draft")
+- Spec must still be in Queue (not moved to Completed)
+- If agent over-advanced the state, revert to: active=SPEC-XXX, status=review, next=/sf:review

 ## Step 9.5: Check STATE.md Size and Rotate if Needed

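The post-execution guard added to `/sf:run` can be sketched as three invariant checks. The dict-based state and the `SPEC-042` value are hypothetical stand-ins: the real STATE.md is a markdown file whose exact layout this diff does not show, so field names here are assumptions.

```python
def verify_post_run_state(state, spec_id):
    """Return the invariant violations to revert; empty means state is clean."""
    problems = []
    if state.get("active") != spec_id:
        problems.append("active spec was advanced")        # revert: active=spec_id
    if state.get("status") != "review":
        problems.append(f"status is {state.get('status')!r}, expected 'review'")
    if spec_id not in state.get("queue", []):
        problems.append("spec was moved out of Queue")
    return problems

good = {"active": "SPEC-042", "status": "review", "queue": ["SPEC-042"]}
overrun = {"active": "SPEC-043", "status": "done", "queue": []}
print(verify_post_run_state(good, "SPEC-042"))        # []
print(len(verify_post_run_state(overrun, "SPEC-042")))  # 3
```

Any non-empty result means the executor agent over-advanced the state and the command handler should revert to status "review" with next step `/sf:review`.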