specflow-cc 1.10.0 → 1.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,51 @@ All notable changes to SpecFlow will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [1.11.1] - 2026-02-10
+
+ ### Fixed
+
+ - **Orchestrator premature state advancement** — executor agents could advance STATE.md beyond "review", skipping the review step entirely
+ - After `/sf:run` completion, the orchestrator would sometimes perform `/sf:done` logic (moving spec to Completed, activating next spec)
+ - Root cause: vague instructions in STATE.md update step allowed LLM agents to over-interpret "update STATE.md"
+ - Added explicit boundary instructions ("DO NOT move to Completed, DO NOT activate next spec") to all three executor agents
+ - Added post-execution state verification guard in `/sf:run` command handler
+ - Affected files: `sf-spec-executor-orchestrator.md`, `spec-executor-orchestrator.md`, `spec-executor.md`, `run.md`
+
+ ---
+
+ ## [1.11.0] - 2026-02-06
+
+ ### Added
+
+ - **Self-check verification** — all executor agents now verify their own claims before reporting completion
+ - `spec-executor`: checks created files exist on disk, commits in git, modified files contain expected changes
+ - `spec-executor-worker`: self-check step before returning JSON results; new `self_check` field in response protocol
+ - Both orchestrators: aggregated self-check verifying all worker claims against reality
+ - Agents refuse to report success if artifacts are missing
+
+ - **Segmented execution** — large task groups automatically split into sequential segments with fresh context
+ - Orchestrators evaluate segmentation threshold (Est. Context >= 20%)
+ - Each segment runs in a fresh worker subagent to prevent quality degradation
+ - Handoff summaries pass key exports and interface signatures between segments
+ - Segment results aggregated into single group result for downstream processing
+ - Segment failure handling: abort remaining segments on failure, continue on partial
+
+ - **Wave column in spec creation** — `spec-creator` and `spec-splitter` now always generate a `Wave` column in Implementation Tasks tables
+ - Pre-computed wave numbers during spec creation (not during execution)
+ - Orchestrators read wave numbers directly instead of computing dependency graphs
+ - Fallback for legacy specs without Wave column preserved
+
+ ### Changed
+
+ - **Enhanced deviation rules** — all executor agents now include detailed examples, rule priority order, and edge case guidance
+ - Rule priority: Rule 4 (architectural) overrides all; Rules 1-3 auto-fix; unsure defaults to Rule 4
+ - Standardized tracking format: `[Rule N - Type] {description}`
+
+ - **Auditor segment hints** — `spec-auditor` can now flag task groups that should be pre-segmented based on estimated context
+
+ ---
+
  ## [1.10.0] - 2026-02-06
 
  ### Added
package/README.md CHANGED
@@ -17,7 +17,7 @@ npx specflow-cc --global
 
  <br>
 
- https://github.com/user-attachments/assets/3f516907-8657-4ea4-bc0c-6319998a09db
+ https://github.com/user-attachments/assets/23415009-81f9-4755-9e35-32e8bc56b8e7
 
  <br>
 
@@ -413,11 +413,12 @@ Control cost vs quality:
 
  | Profile | Spec Creation | Execution | Review |
  |---------|---------------|-----------|--------|
+ | `max` | Opus | Opus | Opus |
  | `quality` | Opus | Opus | Sonnet |
  | `balanced` | Opus | Sonnet | Sonnet |
  | `budget` | Sonnet | Sonnet | Haiku |
 
- Use `quality` for critical features, `budget` for routine tasks.
+ Use `max` for maximum quality everywhere, `quality` for critical features, `budget` for routine tasks.
 
  ---
 
@@ -142,8 +142,49 @@ Waves:
 
  For each wave:
 
+ ### 3.05 Evaluate Segmentation
+
+ For each task group in the current wave, check if segmentation is needed.
+
+ **Segmentation threshold:** Est. Context >= 20%
+
+ **Decision logic:**
+
+ | Est. Context | Segment Count | Rationale |
+ |--------------|---------------|-----------|
+ | < 20% | 1 (no segmentation) | Fits comfortably in fresh context |
+ | 20-35% | 2 segments | Split to keep each segment in PEAK range |
+ | 35-50% | 3 segments | Three-way split for larger groups |
+ | > 50% | 4 segments (default) + warning | Group should have been split by auditor; flag as warning but proceed with 4-way split |
+
+ **How to determine segment boundaries:**
+
+ Parse the task group's task list and divide at natural boundaries:
+ 1. File boundaries (each segment handles a subset of files)
+ 2. Logical unit boundaries (types first, then implementations, then wiring)
+ 3. If tasks are numbered (T1, T2, T3...), divide the task numbers evenly
+
+ **Segment plan format:**
+
+ For each segmented group, create a segment plan:
+
+ | Segment | Tasks | Files | Est. Context |
+ |---------|-------|-------|--------------|
+ | G2-S1 | Create types, Create handler-a | types.ts, handler-a.ts | ~12% |
+ | G2-S2 | Create handler-b, Create tests | handler-b.ts, tests.ts | ~13% |
+
+ **Pre-computed segments from auditor:**
+
+ If the Implementation Tasks table includes a `Segments` column, use those segment boundaries instead of computing them at runtime.
+
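Taken together, the threshold and decision table above amount to a small pure function. A minimal sketch in TypeScript (illustrative only — `planSegments` and its return shape are hypothetical names, not SpecFlow source):

```typescript
// Maps a task group's estimated context share (in percent) to a segment
// count per the decision table: <20% -> 1, 20-35% -> 2, 35-50% -> 3,
// >50% -> 4 plus a warning that the auditor should have split the group.
function planSegments(estContext: number): { count: number; warn: boolean } {
  if (estContext < 20) return { count: 1, warn: false };
  if (estContext <= 35) return { count: 2, warn: false };
  if (estContext <= 50) return { count: 3, warn: false };
  return { count: 4, warn: true };
}
```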
  ### 3.1 Spawn Workers
 
+ For each task group in the current wave:
+
+ **If group is NOT segmented (standard path):**
+
+ Spawn worker as today (unchanged).
+
  **Parallel (preferred):**
  ```
  Task(prompt="<task_group>G2: Create handler-a</task_group>
@@ -161,6 +202,104 @@ Task(prompt="...G4...", subagent_type="sf-spec-executor-worker", model="{profile
 
  **Sequential fallback:** If parallel fails, execute one at a time.
 
+ **If group IS segmented:**
+
+ Execute segments sequentially within the group, each in a fresh worker:
+
+ Segment 1:
+ ```
+ Task(prompt="<task_group>G2-S1: Create types and handler-a</task_group>
+ <segment_info>
+ Segment 1 of 2 for group G2.
+ This is the FIRST segment. No prior work exists.
+ </segment_info>
+ <requirements>{G2-S1 requirements}</requirements>
+ <project_patterns>@.specflow/PROJECT.md</project_patterns>
+ <context_budget>
+ Estimated: ~12%
+ Target max: 25%
+ </context_budget>
+ Implement this segment. Create atomic commits.
+ Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+ ", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 1/2")
+ ```
+
+ Wait for Segment 1 result, then:
+
+ Segment 2:
+ ```
+ Task(prompt="<task_group>G2-S2: Create handler-b and tests</task_group>
+ <segment_info>
+ Segment 2 of 2 for group G2.
+ Prior segment completed. Summary of prior work:
+ </segment_info>
+ <prior_segment_summary>
+ ## Completed Segments
+
+ ### Segment 1 of N
+ **Status:** complete
+ **Files created:**
+ - `path/to/file1.ts` -- brief description (key exports: X, Y)
+ - `path/to/file2.ts` -- brief description (key exports: Z)
+
+ **Files modified:**
+ - `path/to/existing.ts` -- what changed
+
+ **Commits:** hash1, hash2
+
+ **Key interfaces/types defined:**
+ - InterfaceName: { field1: type, field2: type }
+ - TypeName: description
+ </prior_segment_summary>
+ <requirements>{G2-S2 requirements}</requirements>
+ <project_patterns>@.specflow/PROJECT.md</project_patterns>
+ <context_budget>
+ Estimated: ~13%
+ Target max: 25%
+ </context_budget>
+ Implement this segment. Create atomic commits.
+ You can reference files created by prior segments but do NOT re-read them unless you need specific details.
+ Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+ ", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 2/2")
+ ```
+
+ **Important:** Segments within a group are ALWAYS sequential (never parallel) because later segments depend on earlier ones.
+
+ **Parallel behavior:** Non-segmented groups in the same wave still run in parallel alongside segmented groups. The segmented group's sequential segments run independently of other groups.
+
+ ### 3.15 Aggregate Segment Results
+
+ After all segments for a group complete, merge results into a single group result:
+
+ ```json
+ {
+ "group": "G2",
+ "status": "{worst status among segments: failed > partial > complete}",
+ "files_created": ["{union of all segments' files_created}"],
+ "files_modified": ["{union of all segments' files_modified}"],
+ "commits": ["{concatenation of all segments' commits in order}"],
+ "criteria_met": ["{union of all segments' criteria_met}"],
+ "deviations": ["{concatenation of all segments' deviations}"],
+ "error": "{first non-null error, or null}",
+ "segmented": true,
+ "segment_count": 2,
+ "segment_results": [
+ {"segment": 1, "status": "complete", ...},
+ {"segment": 2, "status": "complete", ...}
+ ]
+ }
+ ```
+
+ This aggregated result feeds into the existing Step 3.2 (Collect Results) and Step 3.3 (Handle Failures) unchanged.
+
+ **Segment failure handling:**
+
+ | Scenario | Action |
+ |----------|--------|
+ | Segment N fails | Abort remaining segments for this group, mark group as failed |
+ | Segment N partial | Continue to next segment with available results, mark group as partial |
+ | All segments complete | Aggregate into single group result, mark group as complete |
+
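The merge rules above (worst status wins, de-duplicated unions for files and criteria, ordered concatenation for commits and deviations, first non-null error surfaced) can be sketched as follows. The field names mirror the JSON example; the function itself is illustrative, not SpecFlow source:

```typescript
type SegStatus = "complete" | "partial" | "failed";
interface SegmentResult {
  segment: number;
  status: SegStatus;
  files_created: string[];
  files_modified: string[];
  commits: string[];
  criteria_met: string[];
  deviations: string[];
  error: string | null;
}

// Merge per-segment results into one group result: worst status wins
// (failed > partial > complete), file/criteria lists are de-duplicated
// unions, commits and deviations concatenate in segment order.
function aggregateSegments(group: string, segs: SegmentResult[]) {
  const rank: Record<SegStatus, number> = { complete: 0, partial: 1, failed: 2 };
  const worst = segs.reduce<SegStatus>(
    (acc, s) => (rank[s.status] > rank[acc] ? s.status : acc),
    "complete",
  );
  const union = (pick: (s: SegmentResult) => string[]) => [...new Set(segs.flatMap(pick))];
  return {
    group,
    status: worst,
    files_created: union((s) => s.files_created),
    files_modified: union((s) => s.files_modified),
    commits: segs.flatMap((s) => s.commits),
    criteria_met: union((s) => s.criteria_met),
    deviations: segs.flatMap((s) => s.deviations),
    error: segs.find((s) => s.error !== null)?.error ?? null,
    segmented: true,
    segment_count: segs.length,
    segment_results: segs,
  };
}
```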
  ### 3.2 Collect Results
 
  Parse each worker's JSON response.
@@ -184,7 +323,35 @@ Criteria met: [union of all criteria_met]
  Deviations: [collect all deviations]
  ```
 
- ## Step 6: Create Final Summary
+ ## Step 6: Aggregated Self-Check
+
+ After aggregating results, verify that all worker claims are real.
+
+ **1. Check all created files exist:**
+
+ For each file in the aggregated `files_created` list:
+ ```bash
+ [ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+ ```
+
+ **2. Check all commits exist:**
+
+ For each commit hash in the aggregated `commits` list:
+ ```bash
+ git log --oneline -20 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+ ```
+
+ **3. Check worker self_check fields:**
+
+ If any worker returned `self_check: "partial"` or `self_check: "failed"`, flag those groups for investigation.
+
+ **4. Handle failures:**
+
+ - If missing files/commits found: report discrepancies in Execution Summary
+ - Do NOT report success with missing artifacts
+ - If critical files missing: mark affected groups as `partial`
+
+ ## Step 7: Create Final Summary
 
  Append Execution Summary to specification:
 
@@ -216,11 +383,19 @@ Append Execution Summary to specification:
  {aggregated deviations}
  ```
 
- ## Step 7: Update STATE.md
+ ## Step 8: Update STATE.md
 
+ Update ONLY the Current Position section:
  - Status → "review"
  - Next Step → "/sf:review"
 
+ **CRITICAL — DO NOT go beyond this:**
+ - Do NOT move the spec to Completed Specifications table
+ - Do NOT remove the spec from Queue table
+ - Do NOT activate the next specification in the queue
+ - Do NOT archive the spec file
+ - These actions belong to `/sf:done`, not to execution
+
  </process>
 
  <output>
@@ -276,6 +451,7 @@ Output directly as formatted text (not wrapped in a code block):
  - [ ] Each worker receives no more than 3 task groups
  - [ ] All worker results collected and parsed
  - [ ] Failures handled per failure handling rules
+ - [ ] Aggregated self-check passed (all files and commits verified)
  - [ ] Results aggregated into final summary
  - [ ] Execution Summary appended to specification
  - [ ] STATE.md updated to "review"
@@ -467,6 +467,38 @@ If the spec has a Goal Analysis section, enhance Implementation Tasks:
 
  Note: "Enables Truths" column is only added when Goal Analysis is present.
 
+ ### 4.45 Segment Hints for Large Groups
+
+ After generating task groups, check if any single group has Est. Context >= 20%.
+
+ If so, add a `Segments` column to the Implementation Tasks table:
+
+ | Group | Wave | Tasks | Dependencies | Est. Context | Segments |
+ |-------|------|-------|--------------|--------------|----------|
+ | G1 | 1 | Create types | - | ~8% | 1 |
+ | G2 | 2 | Create handlers, tests, validation | G1 | ~28% | 2 |
+ | G3 | 2 | Create UI components | G1 | ~10% | 1 |
+
+ For groups with Segments > 1, add segment breakdown in the Execution Plan:
+
+ **G2 Segments:**
+ - S1: Create handlers (handler-a.ts, handler-b.ts) -- ~14%
+ - S2: Create tests and validation (handler.test.ts, validation.ts) -- ~14%
+
+ **Segment count guidance:**
+
+ | Est. Context | Segment Count |
+ |--------------|---------------|
+ | < 20% | 1 (no segmentation) |
+ | 20-35% | 2 segments |
+ | 35-50% | 3 segments |
+ | > 50% | 4 segments (with warning: consider splitting into separate task groups) |
+
+ **Segment boundaries should follow natural divisions:**
+ 1. File boundaries (each segment handles subset of files)
+ 2. Logical unit boundaries (types first, then implementations, then wiring)
+ 3. Sequential task ordering (T1-T3 in S1, T4-T6 in S2)
+
  ## Step 4.5: Compute Execution Waves
 
  After generating task groups (or for any spec with Implementation Tasks):
@@ -130,6 +130,74 @@ Write to `.specflow/specs/SPEC-XXX.md` using the template structure:
  8. **Assumptions:** What you assumed (clearly marked)
  - **If `<prior_discussion>` provided:** Decisions from discussion are facts, not assumptions
 
+ ## Step 5.5: Generate Implementation Tasks (for medium and large specs)
+
+ **When to include:**
+ - **Medium** and **large** complexity specs: Always include Implementation Tasks section
+ - **Small** complexity specs: Optional (skip if only 1-2 files or simple change)
+
+ **Task Groups:**
+
+ 1. Group related work logically:
+ - Types/interfaces first (foundational)
+ - Independent implementations (can run parallel)
+ - Integration/wiring last (depends on implementations)
+
+ 2. For each group, define:
+ - **Group ID**: G1, G2, G3, etc.
+ - **Tasks**: Brief description of what the group does
+ - **Dependencies**: Which groups must complete first (use `—` for none)
+ - **Est. Context**: Rough estimate (e.g., ~15%, ~20%)
+
+ **Wave Assignment Algorithm:**
+
+ Assign wave numbers to enable parallel execution:
+
+ 1. Initialize all groups with wave = 0 (unassigned)
+ 2. For each group with no dependencies: wave = 1
+ 3. Repeat until all groups have waves:
+ - For each unassigned group:
+ - If all dependencies have assigned waves:
+ - wave = max(dependency waves) + 1
+ 4. If groups remain unassigned after a full pass with no progress:
+ - Circular dependency exists
+ - Flag in spec as note: "Note: Circular dependency detected in groups [list]. Auditor will verify."
+
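The loop described above is a standard longest-path layering over the dependency graph. A runnable sketch in TypeScript (illustrative only — `assignWaves` and `TaskGroup` are hypothetical names, not SpecFlow source):

```typescript
interface TaskGroup { id: string; deps: string[] }

// Assign wave numbers: groups with no dependencies get wave 1; otherwise
// wave = max(dependency waves) + 1. If a full pass makes no progress,
// the still-unassigned groups form a dependency cycle.
function assignWaves(groups: TaskGroup[]): Map<string, number> {
  const wave = new Map<string, number>(); // absent = unassigned (wave 0)
  let progress = true;
  while (wave.size < groups.length && progress) {
    progress = false;
    for (const g of groups) {
      if (wave.has(g.id)) continue;
      if (g.deps.every((d) => wave.has(d))) {
        // Math.max(0, ...) makes dependency-free groups land in wave 1
        wave.set(g.id, Math.max(0, ...g.deps.map((d) => wave.get(d)!)) + 1);
        progress = true;
      }
    }
  }
  if (wave.size < groups.length) {
    const cyclic = groups.filter((g) => !wave.has(g.id)).map((g) => g.id);
    throw new Error(`Circular dependency detected in groups [${cyclic.join(", ")}]`);
  }
  return wave;
}
```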
+ **Implementation Tasks Table:**
+
+ Generate the table with Wave column:
+
+ ```markdown
+ ### Task Groups
+
+ | Group | Wave | Tasks | Dependencies | Est. Context |
+ |-------|------|-------|--------------|--------------|
+ | G1 | 1 | Create types | — | ~10% |
+ | G2 | 2 | Create handler | G1 | ~20% |
+ | G3 | 2 | Create tests | G1 | ~15% |
+ | G4 | 3 | Wire integration | G2, G3 | ~10% |
+ ```
+
+ **Execution Plan:**
+
+ Generate the Execution Plan summary showing parallel opportunities:
+
+ ```markdown
+ ### Execution Plan
+
+ | Wave | Groups | Parallel? | Workers |
+ |------|--------|-----------|---------|
+ | 1 | G1 | No | 1 |
+ | 2 | G2, G3 | Yes | 2 |
+ | 3 | G4 | No | 1 |
+
+ **Total workers needed:** 2 (max in any wave)
+ ```
+
+ - **Parallel?**: "Yes" if wave has >1 group, "No" otherwise
+ - **Workers**: Count of groups in the wave
+ - **Total workers needed**: Maximum Workers value across all waves
+
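Deriving those three bullets from the wave assignments is mechanical. A sketch in TypeScript (illustrative only; `executionPlan` is a hypothetical helper):

```typescript
// Build Execution Plan rows from a groupId -> wave mapping:
// Parallel? is true when a wave holds more than one group, Workers is the
// group count per wave, and total workers is the per-wave maximum.
function executionPlan(waveOf: Map<string, number>) {
  const byWave = new Map<number, string[]>();
  for (const [id, w] of waveOf) byWave.set(w, [...(byWave.get(w) ?? []), id]);
  const rows = [...byWave.entries()]
    .sort(([a], [b]) => a - b)
    .map(([wave, groups]) => ({ wave, groups, parallel: groups.length > 1, workers: groups.length }));
  return { rows, totalWorkers: Math.max(...rows.map((r) => r.workers)) };
}
```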
  ## Step 6: Estimate Complexity
 
  Based on:
@@ -85,6 +85,7 @@ Worker returns structured JSON:
  "commits": ["abc123", "def456"],
  "criteria_met": ["Criterion 1", "Criterion 2"],
  "deviations": [],
+ "self_check": "passed|partial|skipped",
  "error": null
  }
  ```
@@ -138,6 +139,37 @@ This helps workers make trade-off decisions:
  "error": null
  }
  }
+ },
+ {
+ "id": 2,
+ "status": "in_progress",
+ "results": {
+ "G2": {
+ "status": "in_progress",
+ "segmented": true,
+ "segment_count": 2,
+ "segments": [
+ {
+ "segment": 1,
+ "status": "complete",
+ "commits": ["abc123"],
+ "files_created": ["types.ts"],
+ "handoff_summary": "..."
+ },
+ {
+ "segment": 2,
+ "status": "running",
+ "commits": [],
+ "files_created": []
+ }
+ ]
+ },
+ "G3": {
+ "status": "complete",
+ "segmented": false,
+ "commits": ["def456"]
+ }
+ }
  }
  ],
  "commits": ["all", "commit", "hashes"],
@@ -320,8 +352,49 @@ Verify prerequisites:
  2. "Continue anyway" -> proceed despite issues
  3. "Abort" -> stop execution, preserve state
 
+ ### 3.05 Evaluate Segmentation
+
+ For each task group in the current wave, check if segmentation is needed.
+
+ **Segmentation threshold:** Est. Context >= 20%
+
+ **Decision logic:**
+
+ | Est. Context | Segment Count | Rationale |
+ |--------------|---------------|-----------|
+ | < 20% | 1 (no segmentation) | Fits comfortably in fresh context |
+ | 20-35% | 2 segments | Split to keep each segment in PEAK range |
+ | 35-50% | 3 segments | Three-way split for larger groups |
+ | > 50% | 4 segments (default) + warning | Group should have been split by auditor; flag as warning but proceed with 4-way split |
+
+ **How to determine segment boundaries:**
+
+ Parse the task group's task list and divide at natural boundaries:
+ 1. File boundaries (each segment handles a subset of files)
+ 2. Logical unit boundaries (types first, then implementations, then wiring)
+ 3. If tasks are numbered (T1, T2, T3...), divide the task numbers evenly
+
+ **Segment plan format:**
+
+ For each segmented group, create a segment plan:
+
+ | Segment | Tasks | Files | Est. Context |
+ |---------|-------|-------|--------------|
+ | G2-S1 | Create types, Create handler-a | types.ts, handler-a.ts | ~12% |
+ | G2-S2 | Create handler-b, Create tests | handler-b.ts, tests.ts | ~13% |
+
+ **Pre-computed segments from auditor:**
+
+ If the Implementation Tasks table includes a `Segments` column, use those segment boundaries instead of computing them at runtime.
+
  ### 3.1 Spawn Workers
 
+ For each task group in the current wave:
+
+ **If group is NOT segmented (standard path):**
+
+ Spawn worker as today (unchanged).
+
  **Parallel (preferred):**
  ```
  Task(prompt="<task_group>G2: Create handler-a</task_group>
@@ -344,6 +417,104 @@ Task(prompt="...G4 (with context_budget)...", subagent_type="sf-spec-executor-wo
 
  **Sequential fallback:** If parallel fails, execute one at a time.
 
+ **If group IS segmented:**
+
+ Execute segments sequentially within the group, each in a fresh worker:
+
+ Segment 1:
+ ```
+ Task(prompt="<task_group>G2-S1: Create types and handler-a</task_group>
+ <segment_info>
+ Segment 1 of 2 for group G2.
+ This is the FIRST segment. No prior work exists.
+ </segment_info>
+ <requirements>{G2-S1 requirements}</requirements>
+ <project_patterns>@.specflow/PROJECT.md</project_patterns>
+ <context_budget>
+ Estimated: ~12%
+ Target max: 25%
+ </context_budget>
+ Implement this segment. Create atomic commits.
+ Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+ ", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 1/2")
+ ```
+
+ Wait for Segment 1 result, then:
+
+ Segment 2:
+ ```
+ Task(prompt="<task_group>G2-S2: Create handler-b and tests</task_group>
+ <segment_info>
+ Segment 2 of 2 for group G2.
+ Prior segment completed. Summary of prior work:
+ </segment_info>
+ <prior_segment_summary>
+ ## Completed Segments
+
+ ### Segment 1 of N
+ **Status:** complete
+ **Files created:**
+ - `path/to/file1.ts` -- brief description (key exports: X, Y)
+ - `path/to/file2.ts` -- brief description (key exports: Z)
+
+ **Files modified:**
+ - `path/to/existing.ts` -- what changed
+
+ **Commits:** hash1, hash2
+
+ **Key interfaces/types defined:**
+ - InterfaceName: { field1: type, field2: type }
+ - TypeName: description
+ </prior_segment_summary>
+ <requirements>{G2-S2 requirements}</requirements>
+ <project_patterns>@.specflow/PROJECT.md</project_patterns>
+ <context_budget>
+ Estimated: ~13%
+ Target max: 25%
+ </context_budget>
+ Implement this segment. Create atomic commits.
+ You can reference files created by prior segments but do NOT re-read them unless you need specific details.
+ Return JSON: {group, segment, status, files_created, files_modified, commits, criteria_met, deviations, error}
+ ", subagent_type="sf-spec-executor-worker", model="{profile_model}", description="Execute G2 segment 2/2")
+ ```
+
+ **Important:** Segments within a group are ALWAYS sequential (never parallel) because later segments depend on earlier ones.
+
+ **Parallel behavior:** Non-segmented groups in the same wave still run in parallel alongside segmented groups. The segmented group's sequential segments run independently of other groups.
+
+ ### 3.15 Aggregate Segment Results
+
+ After all segments for a group complete, merge results into a single group result:
+
+ ```json
+ {
+ "group": "G2",
+ "status": "{worst status among segments: failed > partial > complete}",
+ "files_created": ["{union of all segments' files_created}"],
+ "files_modified": ["{union of all segments' files_modified}"],
+ "commits": ["{concatenation of all segments' commits in order}"],
+ "criteria_met": ["{union of all segments' criteria_met}"],
+ "deviations": ["{concatenation of all segments' deviations}"],
+ "error": "{first non-null error, or null}",
+ "segmented": true,
+ "segment_count": 2,
+ "segment_results": [
+ {"segment": 1, "status": "complete", ...},
+ {"segment": 2, "status": "complete", ...}
+ ]
+ }
+ ```
+
+ This aggregated result feeds into the existing Step 3.2 (Collect Results) and Step 3.3 (Update State Per Worker) unchanged.
+
+ **Segment failure handling:**
+
+ | Scenario | Action |
+ |----------|--------|
+ | Segment N fails | Abort remaining segments for this group, mark group as failed |
+ | Segment N partial | Continue to next segment with available results, mark group as partial |
+ | All segments complete | Aggregate into single group result, mark group as complete |
+
  ### 3.2 Collect Results
 
  Parse each worker's JSON response.
@@ -438,6 +609,36 @@ Criteria met: [union of all criteria_met]
  Deviations: [collect all deviations]
  ```
 
+ ## Step 4.5: Final Aggregated Self-Check
+
+ After aggregating all results, verify claims against reality.
+
+ **1. Check all created files exist:**
+
+ For each file in the aggregated `files_created` list:
+ ```bash
+ [ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+ ```
+
+ **2. Check all commits exist:**
+
+ For each commit hash in the aggregated `commits` list:
+ ```bash
+ git log --oneline -30 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+ ```
+
+ **3. Check worker self_check fields:**
+
+ If any worker returned `self_check: "partial"` or `self_check: "skipped"`, flag those groups.
+
+ **4. Handle discrepancies:**
+
+ - Missing files/commits → report in Execution Summary under "Self-Check Issues"
+ - Do NOT report full success with missing artifacts
+ - If critical files missing → mark affected groups as `partial` in state
+
+ **Do NOT skip this step.**
+
  ## Step 5: Create Final Summary
 
  Append Execution Summary to specification:
@@ -486,10 +687,18 @@ rm .specflow/execution/SPEC-XXX-state.json
 
  ## Step 7: Update STATE.md
 
- - Status -> "review"
- - Next Step -> "/sf:review"
+ Update ONLY the Current Position section:
+ - Status → "review"
+ - Next Step → "/sf:review"
  - Remove or update Execution Status row
 
+ **CRITICAL — DO NOT go beyond this:**
+ - Do NOT move the spec to Completed Specifications table
+ - Do NOT remove the spec from Queue table
+ - Do NOT activate the next specification in the queue
+ - Do NOT archive the spec file
+ - These actions belong to `/sf:done`, not to execution
+
  </process>
 
  <output>
@@ -551,6 +760,7 @@ Output directly as formatted text (not wrapped in a code block):
  - [ ] State updated after each wave completes
  - [ ] Post-wave verification performed after each wave
  - [ ] Failures handled per failure handling rules
+ - [ ] Final aggregated self-check passed (all files and commits verified)
  - [ ] Results aggregated into final summary
  - [ ] State file deleted on successful completion
  - [ ] Execution Summary appended to specification
@@ -27,20 +27,28 @@ You receive ONLY your task group's requirements from the orchestrator.
 
  ## Deviation Rules (inherited from spec-executor)
 
+ Apply these rules automatically. Track all deviations for the result JSON.
+
  **Rule 1: Auto-fix bugs** (no permission needed)
  - Code doesn't work as intended → fix inline, continue
+ - Examples: wrong logic, type errors, null pointers, broken validation, security vulnerabilities
+ - Track as: `[Rule 1 - Bug] {description}`
 
  **Rule 2: Auto-add missing critical functionality** (no permission needed)
- - Missing essentials for correctness/security
- - No error handling, no input validation, no null checks → add inline, continue
+ - Missing essentials for correctness/security → add inline, continue
+ - Examples: missing error handling, no input validation, missing null checks, no auth on protected routes
+ - Track as: `[Rule 2 - Missing Critical] {description}`
 
  **Rule 3: Auto-fix blocking issues** (no permission needed)
- - Prevents task completion (missing dependency, broken import, wrong types)
- - Fix and continue
+ - Prevents task completion → fix and continue
+ - Examples: missing dependency, broken import paths, wrong types, build config error
+ - Track as: `[Rule 3 - Blocking] {description}`
 
  **Rule 4: Ask about architectural changes** (requires user decision)
- - Significant structural modifications needed
- - New database table, schema changes, framework switching → STOP and ask user
+ - Significant structural modifications needed → STOP and ask user
+ - Examples: new database table, schema changes, framework switching, changing API contracts
+
+ **Rule Priority:** Rule 4 overrides all → Rules 1-3 auto-fix → unsure = Rule 4
 
  ## Atomic Commits
 
@@ -71,11 +79,19 @@ When writing or modifying code:
 
  From orchestrator prompt, extract:
  - Task group ID (e.g., "G2")
+ - **Segment info (if present):** segment number, total segments
+ - **Prior segment summary (if present):** files created, key exports, commits
  - Task description
- - Requirements for this group
+ - Requirements for this group/segment
  - Interfaces/types to use (from previous groups)
  - Project patterns reference
 
+ **If segment info is present:**
+ - This is a segmented execution
+ - Focus ONLY on tasks assigned to this segment
+ - Use prior segment summary to understand what already exists
+ - Do NOT re-read files from prior segments unless you need specific implementation details
+
  ## Step 2: Load Required Context
 
  Read ONLY what's needed:
@@ -129,10 +145,33 @@ Deviations:
  - [Rule 2 - Missing] Added {functionality} for {reason}
  ```
 
- ## Step 5: Return Results
+ ## Step 5: Self-Check (Verify Your Own Claims)
+
+ Before returning results, verify that your work actually exists.
+
+ **1. Check created files exist:**
+ ```bash
+ [ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+ ```
+
+ **2. Check commits exist:**
+ ```bash
+ git log --oneline -10 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+ ```
+
+ **3. If ANY check fails:**
+ - Fix the issue before returning results
+ - Do NOT return `status: "complete"` with missing artifacts
+ - If unfixable, return `status: "partial"` with error explaining what's missing
+
+ **Do NOT skip this step.**
+
+ ## Step 6: Return Results
 
  Output structured JSON for orchestrator:
 
+ **For non-segmented execution:**
+
  ```json
  {
  "group": "G2",
@@ -149,10 +188,45 @@ Output structured JSON for orchestrator:
  "handleQuerySub processes QUERY_SUB messages"
  ],
  "deviations": [],
+ "self_check": "passed",
  "error": null
  }
  ```

+ **For segmented execution, add segment fields:**
+
+ ```json
+ {
+ "group": "G2",
+ "segment": 1,
+ "segment_total": 2,
+ "status": "complete",
+ "files_created": ["path/to/types.ts", "path/to/handler-a.ts"],
+ "files_modified": [],
+ "commits": ["abc123", "def456"],
+ "criteria_met": ["Types defined", "HandlerA implemented"],
+ "deviations": [],
+ "self_check": "passed",
+ "error": null,
+ "handoff_summary": {
+ "key_exports": ["UserType", "ConfigType", "HandlerA"],
+ "interfaces": "UserType: { id: string, name: string }",
+ "notes": "HandlerA expects ConfigType in constructor"
+ }
+ }
+ ```
+
+ **The `handoff_summary` field** (for segmented execution only) contains:
+ - Key exports from created files
+ - Interface/type signatures that later segments will need
+ - Brief notes about design decisions or conventions established
+
+ **Rules for handoff summary:**
+ - Include file paths and key exports (not full file contents)
+ - Include interface/type signatures if they are needed by later segments
+ - Maximum ~500 words per segment summary
+ - Do NOT include implementation details, only the public API surface
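
A response protocol like this is easy to validate mechanically on the orchestrator side. A minimal sketch, assuming the orchestrator wants to reject malformed worker results early (the required field names come from the examples above; treating missing segment bookkeeping as a violation is an assumption about how an orchestrator might enforce the protocol):

```python
import json

# Fields present in every worker result example above.
REQUIRED = {"group", "status", "files_created", "files_modified",
            "commits", "criteria_met", "deviations", "self_check", "error"}

def validate_result(raw: str) -> list[str]:
    """Return a list of protocol violations for a worker result; empty list = valid."""
    result = json.loads(raw)
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - result.keys())]
    if result.get("self_check") not in {"passed", "partial", "skipped"}:
        problems.append("unknown self_check value")
    # Segmented results must also carry their bookkeeping and a handoff summary.
    if "segment" in result:
        problems += [f"segmented result missing: {f}"
                     for f in ("segment_total", "handoff_summary") if f not in result]
    return problems
```

An orchestrator could then refuse to aggregate any result for which `validate_result` returns a non-empty list.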
+
  **Status values:**
  - `complete`: All tasks done successfully
  - `partial`: Some tasks done, others blocked
@@ -164,7 +238,7 @@ Output structured JSON for orchestrator:

  Return ONLY the structured JSON result. The orchestrator will parse this.

- **On success:**
+ **On success (non-segmented):**
  ```json
  {
  "group": "G2",
@@ -174,10 +248,33 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
  "commits": ["abc1234", "def5678"],
  "criteria_met": ["Criterion 1", "Criterion 2"],
  "deviations": [],
+ "self_check": "passed",
  "error": null
  }
  ```

+ **On success (segmented):**
+ ```json
+ {
+ "group": "G2",
+ "segment": 1,
+ "segment_total": 2,
+ "status": "complete",
+ "files_created": ["path/to/types.ts"],
+ "files_modified": [],
+ "commits": ["abc1234"],
+ "criteria_met": ["Types defined"],
+ "deviations": [],
+ "self_check": "passed",
+ "error": null,
+ "handoff_summary": {
+ "key_exports": ["UserType", "ConfigType"],
+ "interfaces": "UserType: { id: string, name: string }",
+ "notes": "Types exported from types.ts module"
+ }
+ }
+ ```
+
  **On partial completion:**
  ```json
  {
@@ -188,6 +285,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
  "commits": ["abc1234"],
  "criteria_met": ["Criterion 1"],
  "deviations": [],
+ "self_check": "partial",
  "error": "Could not complete task X: missing dependency Y"
  }
  ```
@@ -202,6 +300,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
  "commits": [],
  "criteria_met": [],
  "deviations": [],
+ "self_check": "skipped",
  "error": "Failed to implement: {reason}"
  }
  ```
@@ -214,6 +313,7 @@ Return ONLY the structured JSON result. The orchestrator will parse this.
  - [ ] All tasks in group implemented
  - [ ] Atomic commits created for each logical unit
  - [ ] Deviations documented (if any)
+ - [ ] Self-check passed (all files and commits verified)
  - [ ] Structured JSON result returned
  - [ ] Status reflects actual completion state
  </success_criteria>
@@ -28,26 +28,39 @@ The specification is your contract. Follow it exactly:

  ## Deviation Rules

- When reality doesn't match the plan:
+ When reality doesn't match the plan, apply these rules automatically. Track all deviations for the Execution Summary.

  **Rule 1: Auto-fix bugs** (no permission needed)
- - Code doesn't work as intended
- - Fix inline, continue
+ - Code doesn't work as intended → fix inline, continue
+ - Examples: wrong logic, type errors, null pointer exceptions, broken validation, security vulnerabilities (SQL injection, XSS), race conditions, memory leaks
+ - Track as: `[Rule 1 - Bug] {description}`

  **Rule 2: Auto-add missing critical functionality** (no permission needed)
- - Missing essentials for correctness/security
- - No error handling, no input validation, no null checks
- - Fix inline, continue
+ - Missing essentials for correctness/security → add inline, continue
+ - Examples: missing error handling (no try/catch), no input validation, missing null checks, no auth on protected routes, missing required indexes
+ - Critical = required for correct/secure operation. These are NOT "features" — they're correctness requirements
+ - Track as: `[Rule 2 - Missing Critical] {description}`

  **Rule 3: Auto-fix blocking issues** (no permission needed)
- - Something prevents task completion
- - Missing dependency, broken import, wrong types
- - Fix and continue
+ - Something prevents task completion → fix and continue
+ - Examples: missing dependency, broken import paths, wrong types blocking compilation, missing env variable, build config error, circular dependency
+ - Track as: `[Rule 3 - Blocking] {description}`

  **Rule 4: Ask about architectural changes** (requires user decision)
- - Significant structural modification required
- - New database table, schema changes, switching frameworks
- - STOP and ask user
+ - Significant structural modification required → STOP and ask user
+ - Examples: adding new database table, major schema changes, switching libraries/frameworks, changing API contracts, adding new infrastructure layer
+ - Present: what you found, proposed change, why needed, impact, alternatives
+
+ **Rule Priority** (when multiple could apply):
+ 1. If Rule 4 applies → STOP (architectural decision needed)
+ 2. If Rules 1-3 apply → fix automatically, track for summary
+ 3. If genuinely unsure → apply Rule 4 (safer to ask)
+
+ **Edge case guidance:**
+ - "This validation is missing" → Rule 2 (critical for security)
+ - "This crashes on null" → Rule 1 (bug)
+ - "Need to add a database table" → Rule 4 (architectural)
+ - "Need to add a column" → Rule 1 or 2 (depends on context)

  ## Atomic Commits

@@ -153,7 +166,39 @@ If any deviations occurred (Rules 1-3), document them:
  2. [Rule 2 - Missing] Added {functionality} for {reason}
  ```

- ## Step 7: Create Execution Summary
+ ## Step 7: Self-Check (Verify Your Own Claims)
+
+ After implementation, verify that your work actually exists before reporting completion.
+
+ **1. Check created files exist:**
+
+ For each file you claim to have created:
+ ```bash
+ [ -f "path/to/file" ] && echo "FOUND: path/to/file" || echo "MISSING: path/to/file"
+ ```
+
+ **2. Check commits exist:**
+
+ For each commit hash you recorded:
+ ```bash
+ git log --oneline -10 | grep -q "{hash}" && echo "FOUND: {hash}" || echo "MISSING: {hash}"
+ ```
+
+ **3. Check modified files have expected changes:**
+
+ For key modifications, verify the change is present:
+ ```bash
+ grep -q "expected_pattern" path/to/modified/file && echo "VERIFIED" || echo "NOT FOUND"
+ ```
+
+ **4. Report self-check result:**
+
+ - If ALL checks pass: continue to Execution Summary
+ - If ANY check fails: **fix the issue** before proceeding, do NOT report success with missing artifacts
+
+ **Do NOT skip this step. Do NOT report completion if self-check fails.**
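
Check 3 in particular benefits from being run uniformly over every key modification. A minimal sketch of a helper that takes file/pattern pairs; the paths and patterns in the usage line are hypothetical placeholders, not part of the agent protocol:

```bash
# verify_changes FILE PATTERN [FILE PATTERN ...]
# Prints one line per pair; returns non-zero if any expected change is absent.
verify_changes() {
  local failed=0
  while [ "$#" -ge 2 ]; do
    if grep -q "$2" "$1" 2>/dev/null; then
      echo "VERIFIED: $2 in $1"
    else
      echo "NOT FOUND: $2 in $1"
      failed=1
    fi
    shift 2
  done
  return "$failed"
}

# Hypothetical usage:
# verify_changes src/auth.ts validateToken src/routes.ts requireAuth
```

A non-zero return maps to the "fix the issue before proceeding" branch above.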
+
+ ## Step 8: Create Execution Summary

  Append to specification:

@@ -186,11 +231,19 @@ Append to specification:
  {Any important implementation notes for reviewer}
  ```

- ## Step 8: Update STATE.md
+ ## Step 9: Update STATE.md

+ Update ONLY the Current Position section:
  - Status → "review"
  - Next Step → "/sf:review"

+ **CRITICAL — DO NOT go beyond this:**
+ - Do NOT move the spec to Completed Specifications table
+ - Do NOT remove the spec from Queue table
+ - Do NOT activate the next specification in the queue
+ - Do NOT archive the spec file
+ - These actions belong to `/sf:done`, not to execution
+

  </process>

@@ -237,6 +290,7 @@ Output directly as formatted text (not wrapped in a code block):
  - [ ] All acceptance criteria addressed
  - [ ] Atomic commits created
  - [ ] Deviations documented
+ - [ ] Self-check passed (all files and commits verified)
  - [ ] Execution Summary added to spec
  - [ ] STATE.md updated
  </success_criteria>
@@ -112,6 +112,57 @@ After user approval, create each child spec:
  7. Copy applicable Constraints
  8. Note inherited Assumptions

+ **Implementation Tasks for Child Specs:**
+
+ When a child spec contains **3+ task groups**, include an Implementation Tasks section with Wave column:
+
+ ### Wave Assignment Algorithm
+
+ 1. Initialize all groups with wave = 0 (unassigned)
+ 2. For each group with no dependencies: wave = 1
+ 3. Repeat until all groups have waves:
+ - For each unassigned group:
+ - If all dependencies have assigned waves:
+ - wave = max(dependency waves) + 1
+ 4. If groups remain unassigned after a full pass with no progress:
+ - Circular dependency exists
+ - Flag in spec as note for auditor
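
The fixed-point loop above can be sketched compactly; group names and the dependency map here are illustrative, and the circular-dependency case is surfaced as an error rather than a spec note:

```python
def assign_waves(deps: dict[str, set[str]]) -> dict[str, int]:
    """Assign each group the earliest wave after all of its dependencies.

    deps maps group -> set of groups it depends on. Wave 0 means unassigned;
    a full pass with no progress means a circular dependency exists.
    """
    waves = {g: 0 for g in deps}
    while any(w == 0 for w in waves.values()):
        progress = False
        for g, ds in deps.items():
            if waves[g] == 0 and all(waves[d] > 0 for d in ds):
                # Groups with no dependencies get max(empty, default=0) + 1 = 1.
                waves[g] = max((waves[d] for d in ds), default=0) + 1
                progress = True
        if not progress:
            raise ValueError("circular dependency among: "
                             + ", ".join(g for g, w in waves.items() if w == 0))
    return waves
```

For example, `assign_waves({"G1": set(), "G2": {"G1"}, "G3": {"G1"}, "G4": {"G2", "G3"}})` yields waves 1, 2, 2, 3 respectively.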
+
+ ### Implementation Tasks Table Format
+
+ ```markdown
+ ### Task Groups
+
+ | Group | Wave | Tasks | Dependencies | Est. Context |
+ |-------|------|-------|--------------|--------------|
+ | G1 | 1 | Create types | — | ~10% |
+ | G2 | 2 | Create handler | G1 | ~20% |
+ | G3 | 2 | Create tests | G1 | ~15% |
+ | G4 | 3 | Wire integration | G2, G3 | ~10% |
+ ```
+
+ ### Execution Plan Format
+
+ ```markdown
+ ### Execution Plan
+
+ | Wave | Groups | Parallel? | Workers |
+ |------|--------|-----------|---------|
+ | 1 | G1 | No | 1 |
+ | 2 | G2, G3 | Yes | 2 |
+ | 3 | G4 | No | 1 |
+
+ **Total workers needed:** 2 (max in any wave)
+ ```
+
+ - **Parallel?**: "Yes" if wave has >1 group, "No" otherwise
+ - **Workers**: Count of groups in the wave
+ - **Total workers needed**: Maximum Workers value across all waves
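
The three rules above are purely mechanical given a wave assignment, so the Execution Plan rows can be derived rather than written by hand. A minimal sketch (row keys mirror the table columns; the input dict is illustrative):

```python
from collections import defaultdict

def execution_plan(waves: dict[str, int]) -> tuple[list[dict], int]:
    """Rows for the Execution Plan table plus the total workers needed."""
    by_wave: dict[int, list[str]] = defaultdict(list)
    for group in sorted(waves):
        by_wave[waves[group]].append(group)
    rows = [{"wave": w, "groups": gs, "parallel": len(gs) > 1, "workers": len(gs)}
            for w, gs in sorted(by_wave.items())]
    total = max(r["workers"] for r in rows)   # max Workers value in any wave
    return rows, total
```

Running it on the example wave assignment `{"G1": 1, "G2": 2, "G3": 2, "G4": 3}` reproduces the table shown above, with a total of 2 workers.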
+
+ **Threshold Note:**
+ - Child specs with <3 task groups: Implementation Tasks section is optional
+ - Child specs with 3+ task groups: Include Implementation Tasks with Wave column
+
  ## Step 7: Archive Parent

  Move parent spec:
@@ -239,7 +239,13 @@ The agent will:
  2. Create atomic commits
  3. Handle deviations
  4. Add Execution Summary to spec
- 5. Update STATE.md to "review"
+ 5. Update STATE.md status to "review"
+
+ **After agent returns, verify STATE.md is correct:**
+ - Active Specification must still be the SAME spec (not advanced to next)
+ - Status must be "review" (not "done" or "draft")
+ - Spec must still be in Queue (not moved to Completed)
+ - If agent over-advanced the state, revert to: active=SPEC-XXX, status=review, next=/sf:review
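
The first two bullets of that guard could be scripted. A sketch under a strong assumption: that STATE.md's Current Position section contains literal labels like `Active Specification:` and `Status:` — the actual field names in a real STATE.md may differ:

```bash
# verify_state STATE_FILE SPEC_ID — non-zero exit means the executor
# over-advanced the state and it must be reverted.
# Assumes (hypothetically) STATE.md lines like:
#   Active Specification: SPEC-042
#   Status: review
verify_state() {
  local state_file="$1" spec_id="$2" failed=0
  grep -q "Active Specification: ${spec_id}" "$state_file" \
    || { echo "FAIL: active spec is no longer ${spec_id}"; failed=1; }
  grep -q "Status: review" "$state_file" \
    || { echo "FAIL: status is not review"; failed=1; }
  return "$failed"
}
```

The queue/Completed-table checks are table edits and are easier to verify by re-reading the file than by pattern matching.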

  ## Step 9.5: Check STATE.md Size and Rotate if Needed

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "specflow-cc",
- "version": "1.10.0",
+ "version": "1.11.1",
  "description": "Spec-driven development system for Claude Code — quality-first workflow with explicit audit cycles",
  "bin": {
  "specflow-cc": "bin/install.js"