specweave 1.0.74 → 1.0.76

package/LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2025 Anton Abyzov
+ Copyright (c) 2026 Anton Abyzov
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "specweave",
- "version": "1.0.74",
+ "version": "1.0.76",
  "description": "Spec-driven development framework for Claude Code. AI-native workflow with living documentation, intelligent agents, and multilingual support (9 languages). Enterprise-grade traceability with permanent specs and temporary increments.",
  "type": "module",
  "main": "dist/index.js",
@@ -33,8 +33,8 @@ description: Start autonomous execution session with stop hook integration. Work
 
  | Option | Description | Default |
  |--------|-------------|---------|
- | `--max-iterations N` | Maximum iterations before stopping | 500 |
- | `--max-hours N` | Maximum hours to run | 120 (5 days) |
+ | `--max-iterations N` | Maximum iterations (safety net, not primary stop) | **2500** (v2.3) |
+ | `--max-hours N` | Maximum hours to run | **600 hours** (25 days, v2.3) |
  | `--simple` | Pure Ralph mode (minimal context) | false |
  | `--dry-run` | Preview without starting | false |
  | `--all-backlog` | Process all backlog items | false |
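The two limit flags now default to 2500 iterations and 600 hours (600 / 24 = 25 days). The setup script that parses them is not part of this diff. The following is therefore only a hypothetical sketch of flag parsing with those defaults. `parse_limits` and `is_uint` are illustrative names, not SpecWeave functions.

```bash
# Hypothetical sketch: parse the two limit flags with the v2.3 defaults.
is_uint() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;  # empty or contains a non-digit
    *) return 0 ;;
  esac
}

parse_limits() {
  local max_iterations=2500 max_hours=600   # defaults from the options table
  while [ $# -gt 0 ]; do
    case "$1" in
      # Take the next word as the value; shift only as far as arguments exist,
      # so a trailing flag without a value cannot cause an endless loop.
      --max-iterations) max_iterations="${2:-}"; shift; [ $# -gt 0 ] && shift ;;
      --max-hours)      max_hours="${2:-}";      shift; [ $# -gt 0 ] && shift ;;
      *)                shift ;;               # other flags are ignored here
    esac
  done
  if ! is_uint "$max_iterations" || ! is_uint "$max_hours"; then
    echo "invalid limit: expected a non-negative integer" >&2
    return 1
  fi
  echo "$max_iterations $max_hours"
}
```

For example, `parse_limits --max-hours 48 --simple` prints `2500 48`. Each value is validated separately, so one valid number cannot mask an invalid one.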
@@ -42,6 +42,13 @@ description: Start autonomous execution session with stop hook integration. Work
  | `--no-increment`, `--no-inc` | Skip auto-creation (require existing increments) | false |
  | `--prompt "text"` | Analyze prompt and create increments (intelligent chunking) | None |
  | `--yes`, `-y` | Auto-approve increment plan (skip user approval) | false |
+ | `--tdd`, `--strict` | **NEW v2.2**: Enable TDD strict mode - ALL tests must pass | false |
+
+ :::warning v2.3 - Iteration limits are SAFETY NETS
+ The primary completion criteria is **tests passing + tasks complete**. Iteration limits (2500 iterations, 600 hours) are backup safety nets. Per the Ralph Wiggum pattern, completion should be detected through **external verification** (test results), not self-assessment.
+
+ **IMPORTANT: Stop hook runs PER AGENT** - Each spawned subagent gets its own hook invocation. Iteration count is shared via session file, reflecting main agent loops.
+ :::
 
  ## Intelligent Increment Creation (NEW!)
 
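The warning above fixes the order of the stop checks: verified completion first, limits only as a backstop. Below is a minimal sketch of that ordering. It assumes the hook already knows the task counts, the test counts, the iteration number and the elapsed whole hours. `stop_decision` is a hypothetical helper. Its output labels borrow the category names from the stop-reason table added later in this diff.

```bash
# Hypothetical helper: decide whether a stop-hook invocation lets the session end.
# Args: completed_tasks total_tasks tests_failed tests_passed
#       iteration max_iterations elapsed_hours max_hours (non-negative integers)
stop_decision() {
  local completed=$1 total=$2 failed=$3 passed=$4
  local iteration=$5 max_iter=$6 elapsed_h=$7 max_h=$8

  # Primary criterion: externally verified completion
  # (every task checked off, and a test run with passes and no failures).
  if [ "$total" -gt 0 ] && [ "$completed" -ge "$total" ] &&
     [ "$passed" -gt 0 ] && [ "$failed" -eq 0 ]; then
    echo "approve:all_tasks_complete"
    return
  fi

  # Safety nets: only reached while the work is not verifiably done.
  if [ "$iteration" -ge "$max_iter" ]; then
    echo "approve:max_iterations_reached"
    return
  fi
  if [ "$elapsed_h" -ge "$max_h" ]; then
    echo "approve:max_hours_exceeded"
    return
  fi

  echo "block:continue"
}
```

The completion check runs first on purpose. A session whose work is verified on its final allowed iteration is reported as a success, not as a limit hit.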
@@ -344,10 +351,59 @@ Pure Ralph Wiggum behavior:
 
  - **Human Gates**: Sensitive operations require approval
  - **Circuit Breakers**: External service failures handled gracefully
- - **Max Iterations**: Prevents runaway loops
- - **Max Hours**: Time boxing
+ - **Max Iterations**: Prevents runaway loops (2500 default)
+ - **Max Hours**: Time boxing (600 hours / 25 days default)
  - **stop_hook_active**: Prevents infinite continuation loops
 
+ ## 🔧 v2.3 Per-Agent Stop Hook Behavior (NEW!)
+
+ **CRITICAL: The stop hook runs PER AGENT, not globally!**
+
+ ### How It Works
+
+ ```
+ Main Agent (Claude Code)
+
+ ├── Stop hook invoked when main agent tries to exit
+
+ ├── Spawns Subagent A (Task tool)
+ │ └── Subagent A completes → returns to main agent
+ │ (NO stop hook for subagent exit by default)
+
+ ├── Spawns Subagent B (Task tool with stop_hooks enabled)
+ │ └── Stop hook CAN be invoked if configured
+
+ └── Main agent tries to exit → Stop hook invoked
+ ```
+
+ ### Key Implications
+
+ 1. **Iteration count = main agent loops**: When you see "Iteration 42/2500", that's 42 times the MAIN agent tried to exit, not subagent work.
+
+ 2. **Subagent work is "free"**: Spawning specialized agents (QA, Security, etc.) doesn't consume iterations from the main loop.
+
+ 3. **Shared session state**: All agents (main + sub) share the same `auto-session.json`, so task completion is tracked globally.
+
+ 4. **Test validation at main level**: The stop hook validates test results when the MAIN agent tries to complete, ensuring all subagent work is verified.
+
+ ### Configuration
+
+ To enable stop hooks for subagents (advanced):
+ ```typescript
+ // In Task tool call
+ {
+ "stop_hooks": true, // Enable stop hook for this subagent
+ "inherit_session": true // Share session state with parent
+ }
+ ```
+
+ ### Best Practices
+
+ - Let subagents do specialized work without worrying about iterations
+ - Main agent orchestrates and validates via stop hook
+ - Use `--max-iterations` as a safety net, not a target
+ - Primary completion = tests pass + tasks complete
+
  ## 🔧 v2.1 Reliability Improvements (NEW!)
 
  Auto mode v2.1 includes critical improvements for reliable long-running sessions:
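According to the per-agent notes, one iteration count is shared through the session file while each hook invocation runs separately. The hook script's header adds that subagents with stop hooks enabled also trigger the hook. If several invocations can update the count at once, the read-increment-write needs a lock. The sketch below rests on stated assumptions. A plain-text counter file stands in for the JSON `auto-session.json`. `mkdir` serves as the lock because directory creation is atomic. `bump_iteration` is hypothetical.

```bash
# Hypothetical sketch: atomically bump a shared iteration counter.
# A plain-text file replaces the JSON session file for simplicity.
bump_iteration() {
  local counter_file="$1" lock_dir="$1.lock" tries=0 n
  # mkdir either creates the lock directory or fails, never both at once.
  until mkdir "$lock_dir" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge 50 ]; then
      echo "could not acquire lock" >&2
      return 1
    fi
    sleep 0.1   # fractional sleep: supported by GNU and BSD sleep, not POSIX
  done
  n=$(cat "$counter_file" 2>/dev/null)
  case "$n" in '' | *[!0-9]*) n=0 ;; esac   # missing or corrupt file starts at 0
  n=$((n + 1))
  printf '%s\n' "$n" > "$counter_file"
  rmdir "$lock_dir"                         # release the lock
  echo "$n"
}
```

Without the lock, two hooks that read the same value would both write `n + 1`, and one iteration would be lost.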
@@ -491,6 +547,170 @@ All reliability events logged to `.specweave/logs/auto-iterations.log`:
  {"timestamp":"2026-01-02T08:03:00Z","event":"failure_classified","category":"transient"}
  ```
 
+ ## 🔧 v2.2 TDD Strict Mode & Stop Reason Tracking (NEW!)
+
+ ### TDD Strict Mode
+
+ **Enable TDD strict mode to enforce ALL tests passing before completion:**
+
+ ```bash
+ /sw:auto --tdd 0001-feature
+ # or
+ /sw:auto --strict 0001-feature
+ ```
+
+ **TDD Mode Requirements:**
+ - ALL unit tests must pass (0 failures)
+ - ALL E2E tests must pass
+ - Test execution must be detected in transcript
+ - At least 1 passing test required per completed task (suspicious if 0)
+
+ ### Per-Increment TDD Configuration (NEW v2.2)
+
+ **TDD mode can be configured at multiple levels with priority:**
+
+ 1. **Increment metadata.json** (highest priority)
+ 2. **Increment config.json**
+ 3. **spec.md frontmatter**
+ 4. **Session (`--tdd` flag)**
+ 5. **Global config.json** (lowest priority)
+
+ **Example: Enable TDD for a specific increment:**
+
+ ```json
+ // .specweave/increments/0001-feature/metadata.json
+ {
+ "tddMode": true,
+ "testMode": "tdd"
+ }
+ ```
+
+ **Or via spec.md frontmatter:**
+
+ ```markdown
+ ---
+ increment: 0001-feature
+ title: "Critical Payment Feature"
+ tdd: true
+ ---
+ ```
+
+ **Console output shows TDD source:**
+ ```
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 🔄 AUTO MODE CONTINUING
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ 📋 STOP CRITERIA: 🔴 TDD MODE: ALL tests MUST pass
+ TDD Source: increment metadata.json
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ ```
+
+ **Global Configuration (`.specweave/config.json`):**
+ ```json
+ {
+ "testing": {
+ "defaultTestMode": "tdd", // "tdd", "test-first", or "test-after"
+ "coverageTargets": {
+ "unit": 85,
+ "integration": 80,
+ "e2e": 90
+ }
+ }
+ }
+ ```
+
+ ### Automatic Test Command Discovery (NEW v2.2)
+
+ **Auto mode now discovers and displays available test commands for your project:**
+
+ The stop hook scans for test frameworks and shows you exactly what commands to run:
+
+ ```
+ AVAILABLE TEST COMMANDS FOR THIS PROJECT:
+
+ Unit/Integration Tests:
+ • npm test (npm)
+ • npx vitest run (vitest)
+
+ E2E Tests:
+ • npx playwright test (playwright)
+
+ PRIORITY: Run ALL tests BEFORE marking tasks complete!
+ ```
+
+ **Supported frameworks detected automatically:**
+
+ | Framework | Detection Method |
+ |-----------|-----------------|
+ | npm scripts | `package.json` scripts.test |
+ | Vitest | `vitest.config.ts/js` or dependency |
+ | Jest | `jest.config.ts/js` or dependency |
+ | Playwright | `playwright.config.ts/js` |
+ | Cypress | `cypress.config.ts/js` or `/cypress` dir |
+ | Detox | `.detoxrc.js/json` or dependency |
+ | Pytest | `pytest.ini` or `pyproject.toml` |
+ | Go test | `go.mod` |
+ | Cargo test | `Cargo.toml` |
+ | Xcode | `*.xcodeproj` or `*.xcworkspace` |
+ | Swift test | `Package.swift` |
+ | Gradle | `build.gradle(.kts)` |
+ | Maestro | `maestro.yaml` or `.maestro/` |
+
+ ### Stop Reason Tracking
+
+ **v2.2 now logs EXACTLY why auto mode stops:**
+
+ All stop reasons logged to `.specweave/logs/auto-stop-reasons.log`:
+
+ ```json
+ {
+ "timestamp": "2026-01-02T08:00:00Z",
+ "sessionId": "auto-2026-01-02-abc123",
+ "reason": "All tasks completed, all tests passed (42 passed, 0 failed)",
+ "success": true,
+ "iteration": 15,
+ "increment": "0001-feature",
+ "testsRun": true,
+ "testsPassed": 42,
+ "testsFailed": 0
+ }
+ ```
+
+ **Stop reasons categorized:**
+ | Category | Success | Example |
+ |----------|---------|---------|
+ | `all_tasks_complete` | ✅ | All tests pass, all tasks done |
+ | `completion_promise` | ✅ | `<auto-complete>DONE</auto-complete>` detected |
+ | `max_iterations_reached` | ❌ | Safety limit hit (not ideal) |
+ | `max_hours_exceeded` | ❌ | Time limit hit |
+ | `test_failures_exhausted` | ❌ | 3 retry attempts failed |
+ | `external_failure` | ❌ | Environment/config issue |
+ | `human_gate_pending` | ⏸️ | Waiting for user approval |
+
+ ### Mobile App Testing Support
+
+ **For iOS/Android projects, auto mode detects:**
+
+ | Framework | Detection | Command |
+ |-----------|-----------|---------|
+ | Xcode (iOS) | `xcodebuild test` output | `xcodebuild -scheme X test` |
+ | Swift PM | `swift test` output | `swift test` |
+ | Detox (RN) | `detox test` output | `detox test -c ios.sim.debug` |
+ | Maestro | `maestro test` output | `maestro test flow.yaml` |
+ | Appium | Test framework output | Framework-specific |
+
+ **Best Practice for Mobile Apps:**
+ 1. Set up automated tests (XCTest, Detox, Maestro)
+ 2. Run tests as part of task completion
+ 3. Auto mode blocks until all mobile tests pass
+ 4. Use `--tdd` for strictest enforcement
+
+ **Example mobile test detection:**
+ ```
+ Executed 15 tests, with 0 failures (0 unexpected) in 12.345 seconds
+ ** TEST SUCCEEDED **
+ ```
+
  ## ♿ UI/UX Quality Gates (NEW!)
 
  Auto mode now includes comprehensive UI/UX quality gates that run automatically when E2E tests are detected.
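The five-level TDD priority list comes down to one rule: the first source that states a value wins. Below is a sketch of that rule. It assumes each source has already been read into `true`, `false`, or an empty string. `resolve_tdd_mode` is hypothetical. It returns the winning value plus a label, similar to the `TDD Source:` line in the console output.

```bash
# Hypothetical sketch: first explicit value wins, highest priority first.
# Args (in order): metadata.json, increment config.json, spec.md frontmatter,
# session (--tdd), global config.json; each "true", "false", or "" (unset).
resolve_tdd_mode() {
  local labels=("increment metadata.json" "increment config.json"
                "spec.md frontmatter" "session (--tdd)" "global config.json")
  local i=0 value
  for value in "$@"; do
    if [ "$value" = "true" ] || [ "$value" = "false" ]; then
      echo "$value ${labels[$i]}"
      return
    fi
    i=$((i + 1))
  done
  echo "false default"   # nothing set anywhere: TDD strict mode stays off
}
```

This sketch treats all five sources alike. The shipped `is_tdd_strict_mode` honours an explicit `false` only at the increment metadata/config level. The spec.md frontmatter, the session flag, and the global config can only switch TDD on.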
@@ -12,6 +12,13 @@
  # - Extracts specific failure details for fix prompts
  # - Blocks on ANY test failure (not just >3)
  #
+ # IMPORTANT (v2.3): Stop hook runs PER AGENT
+ # - Each spawned subagent (Task tool) gets its own stop hook invocation
+ # - Iteration count is SHARED across all agents via session file
+ # - Parent agent's exit triggers hook, subagent exits do NOT by default
+ # - This means iteration count reflects MAIN agent loops, not subagent work
+ # - Subagents with stop hooks enabled will ALSO trigger this hook
+ #
  # Claude Code Stop Hook receives:
  # - stdin: JSON with transcript_path, stop_hook_active, etc.
  # - Expected output: JSON with decision (approve/block) and optional reason/systemMessage
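The header above describes the hook's contract: JSON on stdin, a JSON decision on stdout. The `approve()` helper in this script builds that JSON by interpolating `$reason` directly, and `log_stop_reason` does the same for its fields. A reason containing a double quote or a backslash therefore yields invalid JSON. (`block()` escapes its system message with `jq -Rs`.) Below is a jq-free sketch of escaping a reason before emitting the decision. `json_escape` and `emit_decision` are hypothetical names. Only backslash, double quote, newline, carriage return, and tab are handled.

```bash
# Hypothetical sketch: escape a string for a JSON string literal in pure bash.
# Other control characters (U+0000..U+001F) are not handled here.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Emit a Stop-hook decision object on stdout (stdout is reserved for JSON).
emit_decision() {
  local decision="$1" reason="$2"
  printf '{"decision": "%s", "reason": "%s"}\n' "$decision" "$(json_escape "$reason")"
}
```

For example, `emit_decision approve 'Session complete'` prints `{"decision": "approve", "reason": "Session complete"}`. A reason such as `Tests "flaky"` comes out as `Tests \"flaky\"` inside the string instead of ending it early.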
@@ -195,6 +202,141 @@ DEFAULT_COMMAND_TIMEOUT=600 # 10 minutes default
  TEST_COMMAND_TIMEOUT=600 # 10 minutes for tests
  BUILD_COMMAND_TIMEOUT=1200 # 20 minutes for builds
 
+ # ============================================================================
+ # TDD MODE & STRICT TEST REQUIREMENTS (NEW - v2.2)
+ # When TDD mode is enabled, tests MUST be green before allowing completion
+ # ============================================================================
+
+ CONFIG_FILE="$PROJECT_ROOT/.specweave/config.json"
+
+ # Get TDD/testing mode from config
+ get_test_mode() {
+ if [ -f "$CONFIG_FILE" ]; then
+ local mode=$(jq -r '.testing.defaultTestMode // "test-after"' "$CONFIG_FILE" 2>/dev/null)
+ echo "$mode"
+ else
+ echo "test-after"
+ fi
+ }
+
+ # Get TDD mode from increment-specific config (NEW - v2.2)
+ # Priority: 1. Increment metadata.json 2. Increment config.json 3. Session 4. Global config
+ get_increment_tdd_mode() {
+ local increment_id="$1"
+ local increment_dir="$PROJECT_ROOT/.specweave/increments/$increment_id"
+
+ # 1. Check increment metadata.json first (highest priority)
+ local inc_metadata="$increment_dir/metadata.json"
+ if [ -f "$inc_metadata" ]; then
+ local inc_tdd=$(jq -r '.tddMode // null' "$inc_metadata" 2>/dev/null)
+ if [ "$inc_tdd" != "null" ] && [ -n "$inc_tdd" ]; then
+ echo "$inc_tdd"
+ return
+ fi
+
+ # Also check testMode field
+ local inc_test_mode=$(jq -r '.testMode // null' "$inc_metadata" 2>/dev/null)
+ if [ "$inc_test_mode" = "tdd" ] || [ "$inc_test_mode" = "test-first" ]; then
+ echo "true"
+ return
+ fi
+ fi
+
+ # 2. Check increment-specific config.json
+ local inc_config="$increment_dir/config.json"
+ if [ -f "$inc_config" ]; then
+ local inc_tdd=$(jq -r '.tddMode // null' "$inc_config" 2>/dev/null)
+ if [ "$inc_tdd" != "null" ] && [ -n "$inc_tdd" ]; then
+ echo "$inc_tdd"
+ return
+ fi
+
+ local inc_test_mode=$(jq -r '.testing.mode // null' "$inc_config" 2>/dev/null)
+ if [ "$inc_test_mode" = "tdd" ] || [ "$inc_test_mode" = "test-first" ]; then
+ echo "true"
+ return
+ fi
+ fi
+
+ # 3. Check spec.md frontmatter for tdd: true
+ local spec_file="$increment_dir/spec.md"
+ if [ -f "$spec_file" ]; then
+ # Extract YAML frontmatter and check for tdd flag
+ local spec_tdd=$(sed -n '/^---$/,/^---$/p' "$spec_file" 2>/dev/null | grep -E '^tdd:\s*true' | head -1)
+ if [ -n "$spec_tdd" ]; then
+ echo "true"
+ return
+ fi
+ fi
+
+ # Return empty - let caller check session/global
+ echo ""
+ }
+
+ # Get TDD strict mode from session or config
+ # Priority: Increment > Session > Global config
+ is_tdd_strict_mode() {
+ # 1. Check increment-specific config FIRST (NEW - v2.2)
+ if [ -n "$CURRENT_INCREMENT" ]; then
+ local inc_tdd=$(get_increment_tdd_mode "$CURRENT_INCREMENT")
+ if [ "$inc_tdd" = "true" ]; then
+ echo "true"
+ return
+ elif [ "$inc_tdd" = "false" ]; then
+ echo "false"
+ return
+ fi
+ fi
+
+ # 2. Check session (--tdd flag from setup-auto.sh)
+ local session_tdd=$(echo "$SESSION" 2>/dev/null | jq -r '.tddMode // false')
+ if [ "$session_tdd" = "true" ]; then
+ echo "true"
+ return
+ fi
+
+ # 3. Check global config
+ local config_mode=$(get_test_mode)
+ if [ "$config_mode" = "tdd" ] || [ "$config_mode" = "test-first" ]; then
+ echo "true"
+ return
+ fi
+
+ echo "false"
+ }
+
+ # Get coverage targets from config
+ get_coverage_targets() {
+ if [ -f "$CONFIG_FILE" ]; then
+ local unit=$(jq -r '.testing.coverageTargets.unit // 80' "$CONFIG_FILE" 2>/dev/null)
+ local integration=$(jq -r '.testing.coverageTargets.integration // 80' "$CONFIG_FILE" 2>/dev/null)
+ local e2e=$(jq -r '.testing.coverageTargets.e2e // 80' "$CONFIG_FILE" 2>/dev/null)
+ echo "{\"unit\":$unit,\"integration\":$integration,\"e2e\":$e2e}"
+ else
+ echo '{"unit":80,"integration":80,"e2e":80}'
+ fi
+ }
+
+ # ============================================================================
+ # STOP REASON TRACKING (NEW - v2.2)
+ # Clear logging of WHY auto mode stops
+ # ============================================================================
+
+ log_stop_reason() {
+ local reason="$1"
+ local details="$2"
+ local is_success="${3:-false}"
+
+ local timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+
+ # Log to iterations log
+ echo "{\"timestamp\":\"$timestamp\",\"event\":\"session_stop\",\"reason\":\"$reason\",\"details\":\"$details\",\"success\":$is_success,\"iteration\":${ITERATION:-0},\"increment\":\"${CURRENT_INCREMENT:-none}\"}" >> "$LOGS_DIR/auto-iterations.log"
+
+ # Also log to dedicated stop log
+ mkdir -p "$LOGS_DIR"
+ echo "{\"timestamp\":\"$timestamp\",\"sessionId\":\"${SESSION_ID:-unknown}\",\"reason\":\"$reason\",\"details\":\"$details\",\"success\":$is_success,\"iteration\":${ITERATION:-0},\"increment\":\"${CURRENT_INCREMENT:-none}\",\"testsRun\":${TESTS_RUN:-false},\"testsPassed\":${TESTS_PASSED:-0},\"testsFailed\":${TESTS_FAILED:-0}}" >> "$LOGS_DIR/auto-stop-reasons.log"
+ }
+
  # Get timeout for command type
  get_command_timeout() {
  local cmd="$1"
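`get_increment_tdd_mode` reads the spec.md flag with `sed -n '/^---$/,/^---$/p'`, and `get_tdd_source` reuses the same pattern. A sed range restarts at the next match. Any later pair of `---` lines in the body (Markdown horizontal rules, for example) is therefore printed too and can produce a false `tdd: true`. In addition, `\s` is not part of POSIX ERE. Below is a sketch that reads only the leading frontmatter block. It assumes the file starts with `---` on line 1. `frontmatter` and `spec_has_tdd` are hypothetical helpers.

```bash
# Hypothetical sketch: print only the leading YAML frontmatter of a Markdown
# file, i.e. the lines between '---' on line 1 and the next '---'.
frontmatter() {
  awk 'NR == 1 { if ($0 != "---") exit; next }   # no frontmatter: print nothing
       $0 == "---" { exit }                       # closing fence: stop reading
       { print }' "$1"
}

# Succeed if the frontmatter sets "tdd: true" (POSIX [[:space:]] instead of \s).
spec_has_tdd() {
  frontmatter "$1" | grep -qE '^tdd:[[:space:]]*true[[:space:]]*$'
}
```

Because awk exits at the closing fence, a `tdd: true` line that appears later in the body, between two horizontal rules, is never scanned.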
@@ -232,17 +374,67 @@ detect_command_timeout() {
  }
 
  # Helper: Output approve decision
+ # ALWAYS log why we're stopping for debugging
  approve() {
  local reason="${1:-Session complete}"
+ local is_success="${2:-false}"
+
+ # Log the stop reason
+ log_stop_reason "$reason" "approve_called" "$is_success"
+
+ # Display the stop reason prominently to STDERR (not stdout, which is for JSON)
+ {
+ echo ""
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+ echo "🛑 AUTO MODE STOPPING"
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+ echo "Reason: $reason"
+ echo "Iteration: ${ITERATION:-0}/${MAX_ITERATIONS:-100}"
+ [ -n "$CURRENT_INCREMENT" ] && echo "Increment: $CURRENT_INCREMENT"
+ [ "${TESTS_RUN:-false}" = "true" ] && echo "Tests: ${TESTS_PASSED:-0} passed, ${TESTS_FAILED:-0} failed"
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+ echo ""
+ } >&2
+
  echo "{\"decision\": \"approve\", \"reason\": \"$reason\"}"
  exit 0
  }
 
  # Helper: Output block decision with system message
  # Properly escapes JSON strings with newlines
+ # Also displays stop criteria prominently to stderr (NEW - v2.2, enhanced v2.3)
  block() {
  local reason="$1"
  local system_message="$2"
+
+ # Get current stop criteria for display
+ local tdd_mode=$(is_tdd_strict_mode)
+ local stop_criteria=""
+
+ # Build stop criteria message - MORE PROMINENT (v2.3)
+ if [ "$tdd_mode" = "true" ]; then
+ stop_criteria="🔴 TDD MODE: ALL tasks [x] + ALL tests GREEN (0 failures)"
+ else
+ stop_criteria="✅ ALL tasks [x] completed + tests passing"
+ fi
+
+ # Display stop criteria and continuation reason to STDERR (v2.3 enhanced)
+ {
+ echo ""
+ echo "╔══════════════════════════════════════════════════════════════╗"
+ echo "║ 🔄 AUTO MODE CONTINUING - Agent will keep working ║"
+ echo "╠══════════════════════════════════════════════════════════════╣"
+ echo "║ Reason: $(printf '%-50s' "$reason")║"
+ echo "║ Iteration: $(printf '%-47s' "${ITERATION:-0}/${MAX_ITERATIONS:-2500}")║"
+ [ -n "$CURRENT_INCREMENT" ] && echo "║ Increment: $(printf '%-47s' "$CURRENT_INCREMENT")║"
+ echo "╠══════════════════════════════════════════════════════════════╣"
+ echo "║ 🎯 STOP CONDITION: $stop_criteria"
+ [ "${TESTS_RUN:-false}" = "true" ] && echo "║ Tests: ${TESTS_PASSED:-0} passed, ${TESTS_FAILED:-0} failed"
+ [ "$tdd_mode" = "true" ] && echo "║ TDD Source: $(get_tdd_source)"
+ echo "╚══════════════════════════════════════════════════════════════╝"
+ echo ""
+ } >&2
+
  if [ -n "$system_message" ]; then
  # Escape special characters for JSON
  local escaped_message=$(echo "$system_message" | jq -Rs .)
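In `block()`, `printf '%-50s'` pads short values but never truncates them, so a long reason pushes the right border out of line. Adding a precision (`%-50.50s`) pads and truncates to the same width. Below is a hypothetical `box_row` helper showing the idea. Some `printf` implementations count widths in bytes, so alignment is only dependable for ASCII text.

```bash
# Hypothetical sketch: one row of a fixed-width box.
# "%-W.Ws" left-aligns, pads to W, and truncates anything longer than W.
box_row() {
  local label="$1" value="$2" width="${3:-50}"
  printf "║ %s%-${width}.${width}s║\n" "$label" "$value"
}
```

For example, `box_row 'Reason: ' "$reason"` keeps the closing `║` in the same column whether the reason is one word or a full sentence.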
@@ -253,6 +445,200 @@ block() {
  exit 0
  }
 
+ # Helper: Get source of TDD mode for debugging
+ get_tdd_source() {
+ if [ -n "$CURRENT_INCREMENT" ]; then
+ local inc_dir="$PROJECT_ROOT/.specweave/increments/$CURRENT_INCREMENT"
+
+ # Check increment metadata.json
+ if [ -f "$inc_dir/metadata.json" ]; then
+ local inc_tdd=$(jq -r '.tddMode // null' "$inc_dir/metadata.json" 2>/dev/null)
+ if [ "$inc_tdd" = "true" ] || [ "$inc_tdd" = "false" ]; then
+ echo "increment metadata.json"
+ return
+ fi
+ local inc_test_mode=$(jq -r '.testMode // null' "$inc_dir/metadata.json" 2>/dev/null)
+ if [ "$inc_test_mode" = "tdd" ] || [ "$inc_test_mode" = "test-first" ]; then
+ echo "increment metadata.json (testMode)"
+ return
+ fi
+ fi
+
+ # Check increment config.json
+ if [ -f "$inc_dir/config.json" ]; then
+ local inc_tdd=$(jq -r '.tddMode // null' "$inc_dir/config.json" 2>/dev/null)
+ if [ "$inc_tdd" = "true" ] || [ "$inc_tdd" = "false" ]; then
+ echo "increment config.json"
+ return
+ fi
+ fi
+
+ # Check spec.md frontmatter
+ if [ -f "$inc_dir/spec.md" ]; then
+ if sed -n '/^---$/,/^---$/p' "$inc_dir/spec.md" 2>/dev/null | grep -qE '^tdd:\s*true'; then
+ echo "spec.md frontmatter"
+ return
+ fi
+ fi
+ fi
+
+ # Check session
+ local session_tdd=$(echo "$SESSION" 2>/dev/null | jq -r '.tddMode // false')
+ if [ "$session_tdd" = "true" ]; then
+ echo "--tdd flag (session)"
+ return
+ fi
+
+ # Check global config
+ if [ -f "$CONFIG_FILE" ]; then
+ local config_mode=$(jq -r '.testing.defaultTestMode // "test-after"' "$CONFIG_FILE" 2>/dev/null)
+ if [ "$config_mode" = "tdd" ] || [ "$config_mode" = "test-first" ]; then
+ echo "global config.json"
+ return
+ fi
+ fi
+
+ echo "default (test-after)"
+ }
+
+ # ============================================================================
+ # TEST COMMAND DISCOVERY (NEW - v2.2)
+ # Auto-detect available test commands for each project/technology
+ # ============================================================================
+
+ # Discover test commands for the current project
+ # Returns JSON with available test commands for each technology
+ discover_test_commands() {
+ local test_commands="[]"
+
+ # Node.js/JavaScript/TypeScript projects
+ if [ -f "$PROJECT_ROOT/package.json" ]; then
+ local pkg_test=$(jq -r '.scripts.test // null' "$PROJECT_ROOT/package.json" 2>/dev/null)
+ local pkg_test_unit=$(jq -r '.scripts["test:unit"] // null' "$PROJECT_ROOT/package.json" 2>/dev/null)
+ local pkg_test_e2e=$(jq -r '.scripts["test:e2e"] // null' "$PROJECT_ROOT/package.json" 2>/dev/null)
+
+ if [ "$pkg_test" != "null" ] && [ -n "$pkg_test" ]; then
+ test_commands=$(echo "$test_commands" | jq --arg cmd "npm test" --arg type "unit" '. + [{"command": $cmd, "type": $type, "framework": "npm"}]')
+ fi
+ if [ "$pkg_test_unit" != "null" ] && [ -n "$pkg_test_unit" ]; then
+ test_commands=$(echo "$test_commands" | jq --arg cmd "npm run test:unit" --arg type "unit" '. + [{"command": $cmd, "type": $type, "framework": "npm"}]')
+ fi
+ if [ "$pkg_test_e2e" != "null" ] && [ -n "$pkg_test_e2e" ]; then
+ test_commands=$(echo "$test_commands" | jq --arg cmd "npm run test:e2e" --arg type "e2e" '. + [{"command": $cmd, "type": $type, "framework": "npm"}]')
+ fi
+
+ # Detect Vitest
+ if [ -f "$PROJECT_ROOT/vitest.config.ts" ] || [ -f "$PROJECT_ROOT/vitest.config.js" ] || grep -q '"vitest"' "$PROJECT_ROOT/package.json" 2>/dev/null; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "npx vitest run", "type": "unit", "framework": "vitest"}]')
+ fi
+
+ # Detect Jest
+ if [ -f "$PROJECT_ROOT/jest.config.js" ] || [ -f "$PROJECT_ROOT/jest.config.ts" ] || grep -q '"jest"' "$PROJECT_ROOT/package.json" 2>/dev/null; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "npx jest", "type": "unit", "framework": "jest"}]')
+ fi
+
+ # Detect Playwright
+ if [ -f "$PROJECT_ROOT/playwright.config.ts" ] || [ -f "$PROJECT_ROOT/playwright.config.js" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "npx playwright test", "type": "e2e", "framework": "playwright"}]')
+ fi
+
+ # Detect Cypress
+ if [ -f "$PROJECT_ROOT/cypress.config.ts" ] || [ -f "$PROJECT_ROOT/cypress.config.js" ] || [ -d "$PROJECT_ROOT/cypress" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "npx cypress run", "type": "e2e", "framework": "cypress"}]')
+ fi
+
+ # Detect Detox (React Native)
+ if [ -f "$PROJECT_ROOT/.detoxrc.js" ] || [ -f "$PROJECT_ROOT/.detoxrc.json" ] || grep -q '"detox"' "$PROJECT_ROOT/package.json" 2>/dev/null; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "npx detox test", "type": "e2e", "framework": "detox"}]')
+ fi
+ fi
+
+ # Python projects
+ if [ -f "$PROJECT_ROOT/pyproject.toml" ] || [ -f "$PROJECT_ROOT/setup.py" ] || [ -f "$PROJECT_ROOT/requirements.txt" ]; then
+ # Pytest
+ if [ -f "$PROJECT_ROOT/pytest.ini" ] || [ -f "$PROJECT_ROOT/pyproject.toml" ] || [ -d "$PROJECT_ROOT/tests" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "pytest", "type": "unit", "framework": "pytest"}]')
+ fi
+ fi
+
+ # Go projects
+ if [ -f "$PROJECT_ROOT/go.mod" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "go test ./...", "type": "unit", "framework": "go"}]')
+ fi
+
+ # Rust projects
+ if [ -f "$PROJECT_ROOT/Cargo.toml" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "cargo test", "type": "unit", "framework": "cargo"}]')
+ fi
+
+ # iOS/macOS projects (Xcode)
+ if find "$PROJECT_ROOT" -maxdepth 2 -name "*.xcodeproj" -o -name "*.xcworkspace" 2>/dev/null | head -1 | grep -q .; then
+ # Find scheme name
+ local xc_project=$(find "$PROJECT_ROOT" -maxdepth 2 -name "*.xcodeproj" 2>/dev/null | head -1)
+ local xc_workspace=$(find "$PROJECT_ROOT" -maxdepth 2 -name "*.xcworkspace" 2>/dev/null | head -1)
+
+ if [ -n "$xc_workspace" ]; then
+ test_commands=$(echo "$test_commands" | jq --arg ws "$(basename "$xc_workspace")" '. + [{"command": "xcodebuild test -workspace " + $ws + " -scheme <SCHEME> -destination \"platform=iOS Simulator,name=iPhone 15\"", "type": "unit", "framework": "xcode"}]')
+ elif [ -n "$xc_project" ]; then
+ test_commands=$(echo "$test_commands" | jq --arg proj "$(basename "$xc_project")" '. + [{"command": "xcodebuild test -project " + $proj + " -scheme <SCHEME> -destination \"platform=iOS Simulator,name=iPhone 15\"", "type": "unit", "framework": "xcode"}]')
+ fi
+ fi
+
+ # Swift Package Manager
+ if [ -f "$PROJECT_ROOT/Package.swift" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "swift test", "type": "unit", "framework": "swift"}]')
+ fi
+
+ # Android projects (Gradle)
+ if [ -f "$PROJECT_ROOT/build.gradle" ] || [ -f "$PROJECT_ROOT/build.gradle.kts" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "./gradlew test", "type": "unit", "framework": "gradle"}]')
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "./gradlew connectedAndroidTest", "type": "e2e", "framework": "gradle"}]')
+ fi
+
+ # Maestro (cross-platform mobile E2E)
+ if [ -f "$PROJECT_ROOT/maestro.yaml" ] || [ -d "$PROJECT_ROOT/.maestro" ]; then
+ test_commands=$(echo "$test_commands" | jq '. + [{"command": "maestro test", "type": "e2e", "framework": "maestro"}]')
+ fi
+
+ echo "$test_commands"
+ }
+
+ # Format test commands for LLM instruction
+ format_test_instructions() {
+ local test_commands=$(discover_test_commands)
+ local cmd_count=$(echo "$test_commands" | jq 'length')
+
+ if [ "$cmd_count" -eq 0 ]; then
+ echo "No test framework detected. Look for test configuration files and package.json scripts."
+ return
+ fi
+
+ local instructions="AVAILABLE TEST COMMANDS FOR THIS PROJECT:
+ "
+ # Unit tests
+ local unit_cmds=$(echo "$test_commands" | jq -r '[.[] | select(.type == "unit")] | .[] | " • " + .command + " (" + .framework + ")"')
+ if [ -n "$unit_cmds" ]; then
+ instructions="${instructions}
+ Unit/Integration Tests:
+ ${unit_cmds}"
+ fi
+
+ # E2E tests
+ local e2e_cmds=$(echo "$test_commands" | jq -r '[.[] | select(.type == "e2e")] | .[] | " • " + .command + " (" + .framework + ")"')
+ if [ -n "$e2e_cmds" ]; then
+ instructions="${instructions}
+
+ E2E Tests:
+ ${e2e_cmds}"
+ fi
+
+ instructions="${instructions}
+
+ PRIORITY: Run ALL tests BEFORE marking tasks complete!"
+
+ echo "$instructions"
+ }
+
  # ============================================================================
  # TEST RESULT PARSING (NEW - v2.0)
  # Parses ACTUAL test results, not just command execution
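`discover_test_commands` maps marker files to commands and calls `jq` once per detected framework. Below is a jq-free sketch of the same idea for a handful of the listed frameworks. It emits tab-separated `type`/`command` pairs and then groups them the way `format_test_instructions` does. `discover_tests` and `format_tests` are hypothetical and deliberately incomplete.

```bash
# Hypothetical sketch: marker-file detection for a few frameworks,
# printing "<type>\t<command>" lines instead of a JSON array.
discover_tests() {
  local root="$1"
  [ -f "$root/go.mod" ]        && printf 'unit\tgo test ./...\n'
  [ -f "$root/Cargo.toml" ]    && printf 'unit\tcargo test\n'
  [ -f "$root/Package.swift" ] && printf 'unit\tswift test\n'
  if [ -f "$root/playwright.config.ts" ] || [ -f "$root/playwright.config.js" ]; then
    printf 'e2e\tnpx playwright test\n'
  fi
  if [ -f "$root/maestro.yaml" ] || [ -d "$root/.maestro" ]; then
    printf 'e2e\tmaestro test\n'
  fi
  return 0   # a failed final [ -f ] test must not become the return status
}

# Group the discovered commands by type for the agent-facing message.
format_tests() {
  local cmds unit e2e
  cmds=$(discover_tests "$1")
  if [ -z "$cmds" ]; then
    echo "No test framework detected."
    return
  fi
  unit=$(printf '%s\n' "$cmds" | awk -F'\t' '$1 == "unit" { print " • " $2 }')
  e2e=$(printf '%s\n' "$cmds" | awk -F'\t' '$1 == "e2e" { print " • " $2 }')
  echo "AVAILABLE TEST COMMANDS FOR THIS PROJECT:"
  [ -n "$unit" ] && printf '\nUnit/Integration Tests:\n%s\n' "$unit"
  [ -n "$e2e" ] && printf '\nE2E Tests:\n%s\n' "$e2e"
  return 0
}
```

Tab-separated lines keep each command intact even when it contains spaces, and awk groups them by type without a JSON dependency.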
@@ -374,6 +760,48 @@ parse_test_results() {
  fi
  fi
 
+ # ========================================================================
+ # DETOX (React Native) TEST SUPPORT (NEW - v2.2)
+ # Format: "detox[XXXX] ✓ test name" or "X passing (Xs)"
+ # ========================================================================
+ if grep -qE '(detox\[|detox test|react-native.*test)' "$transcript" 2>/dev/null && [ "$framework" = "unknown" ]; then
+ framework="detox"
+
+ # Count passing tests (✓ or ✔ checkmarks)
+ passed=$(grep -cE '(detox\[[0-9]+\]\s*[✓✔]|✓.*test|passed)' "$transcript" 2>/dev/null || echo "0")
+ # Count failing tests (✗ or ✘)
+ failed=$(grep -cE '(detox\[[0-9]+\]\s*[✗✘]|✗.*test|failed)' "$transcript" 2>/dev/null || echo "0")
+
+ # Alternative: "X passing (Xs)" format
+ local detox_summary=$(grep -oE '[0-9]+\s+passing' "$transcript" 2>/dev/null | tail -1)
+ if [ -n "$detox_summary" ]; then
+ passed=$(echo "$detox_summary" | grep -oE '^[0-9]+')
+ fi
+ local detox_failed=$(grep -oE '[0-9]+\s+failing' "$transcript" 2>/dev/null | tail -1)
+ if [ -n "$detox_failed" ]; then
+ failed=$(echo "$detox_failed" | grep -oE '^[0-9]+')
+ fi
+ fi
+
+ # ========================================================================
+ # MAESTRO (Mobile UI Testing) SUPPORT (NEW - v2.2)
+ # Format: "Flow: flow.yaml - PASSED" or "Passed: X, Failed: X"
+ # ========================================================================
+ if grep -qE '(maestro test|maestro\.yaml|Flow:.*PASSED|Flow:.*FAILED)' "$transcript" 2>/dev/null && [ "$framework" = "unknown" ]; then
+ framework="maestro"
+
+ # Count passed/failed flows
+ passed=$(grep -cE 'Flow:.*PASSED' "$transcript" 2>/dev/null || echo "0")
+ failed=$(grep -cE 'Flow:.*FAILED' "$transcript" 2>/dev/null || echo "0")
+
+ # Alternative summary format
+ local maestro_summary=$(grep -oE 'Passed:\s*[0-9]+,\s*Failed:\s*[0-9]+' "$transcript" 2>/dev/null | tail -1)
+ if [ -n "$maestro_summary" ]; then
+ passed=$(echo "$maestro_summary" | grep -oE 'Passed:\s*[0-9]+' | grep -oE '[0-9]+')
+ failed=$(echo "$maestro_summary" | grep -oE 'Failed:\s*[0-9]+' | grep -oE '[0-9]+')
+ fi
+ fi
+
  # ========================================================================
  # GENERIC EXIT CODE DETECTION (NEW - v2.1)
  # Fallback for unknown frameworks - detect failure from exit codes and patterns
@@ -959,7 +1387,7 @@ if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
959
1387
  echo "$SESSION" | jq --arg now "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
960
1388
  '.status = "completed" | .endTime = $now | .endReason = "completion_promise"' \
961
1389
  > "$SESSION_FILE"
962
- approve "Completion promise detected"
1390
+ approve "Completion promise detected" "true"
963
1391
  fi
964
1392
 
965
1393
  # Check self-assessment score
@@ -1083,6 +1511,16 @@ if [ -n "$CURRENT_INCREMENT" ]; then
1083
1511
  if [ "$TOTAL_TASKS" -gt 0 ] && [ "$COMPLETED_TASKS" -ge "$TOTAL_TASKS" ]; then
1084
1512
  # All tasks marked complete - but verify tests actually passed
1085
1513
 
1514
+ # ================================================================
1515
+ # TDD STRICT MODE CHECK (NEW - v2.2)
1516
+ # In TDD mode, tests MUST pass - no exceptions
1517
+ # ================================================================
1518
+ TDD_MODE=$(is_tdd_strict_mode)
1519
+ TEST_MODE=$(get_test_mode)
1520
+
1521
+ # Log TDD mode status
1522
+ echo "{\"timestamp\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",\"event\":\"completion_check\",\"tddMode\":$TDD_MODE,\"testMode\":\"$TEST_MODE\",\"tasksComplete\":$COMPLETED_TASKS,\"totalTasks\":$TOTAL_TASKS}" >> "$LOGS_DIR/auto-iterations.log"
1523
+
1086
1524
  # Check for test files in project
1087
1525
  HAS_UNIT_TESTS=false
1088
1526
  HAS_E2E_TESTS=false
@@ -1107,7 +1545,29 @@ if [ -n "$CURRENT_INCREMENT" ]; then
1107
1545
  # Verify tests were run AND passed
1108
1546
  if [ "$HAS_UNIT_TESTS" = true ]; then
1109
1547
  if [ "$TESTS_RUN" != "true" ]; then
1110
- block "Tasks complete but TESTS NOT RUN" "🧪 MANDATORY: All tasks marked complete but NO TEST EXECUTION detected.
1548
+ # TDD mode is even stricter
1549
+ if [ "$TDD_MODE" = "true" ]; then
1550
+ block "TDD MODE: Tests MANDATORY before completion" "🔴 TDD STRICT MODE ACTIVE - TESTS REQUIRED
1551
+
1552
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1553
+ ⚠️ ALL TESTS MUST BE GREEN BEFORE AUTO MODE CAN COMPLETE
1554
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1555
+
1556
+ Tasks are marked complete but NO TESTS WERE RUN!
1557
+
1558
+ In TDD mode, this is a BLOCKING requirement:
1559
+ 1. Write/verify tests exist for all implemented features
1560
+ 2. Run ALL tests: npm test && npx vitest run
1561
+ 3. Verify 100% of tests pass (0 failures)
1562
+ 4. Only then can auto mode complete
1563
+
1564
+ Current status:
1565
+ Tasks: $COMPLETED_TASKS/$TOTAL_TASKS complete
1566
+ Tests: NOT EXECUTED ❌
1567
+
1568
+ Run your tests NOW with: npm test"
1569
+ else
1570
+ block "Tasks complete but TESTS NOT RUN" "🧪 MANDATORY: All tasks marked complete but NO TEST EXECUTION detected.
1111
1571
 
1112
1572
  You MUST run tests before completion:
1113
1573
  npm test (unit/integration)
@@ -1115,16 +1575,63 @@ You MUST run tests before completion:
1115
1575
  npx playwright test (E2E if applicable)
1116
1576
 
1117
1577
  Continue with /sw:do and run ALL tests. Verify 0 failures before proceeding."
1578
+ fi
1118
1579
  fi
1119
1580
 
1120
1581
  if [ "$TESTS_FAILED" -gt 0 ]; then
1121
- # This shouldn't happen as we handle failures above, but safety check
1122
- block "Tasks complete but TESTS FAILING" "🔴 CRITICAL: All tasks marked complete but $TESTS_FAILED tests are FAILING!
1582
+ # TDD mode - even stricter message
1583
+ if [ "$TDD_MODE" = "true" ]; then
1584
+ block "TDD MODE: $TESTS_FAILED tests FAILING - CANNOT COMPLETE" "🔴 TDD STRICT MODE - TESTS MUST BE GREEN
1585
+
1586
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1587
+ ❌ $TESTS_FAILED TEST(S) FAILING - AUTO MODE BLOCKED
1588
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1589
+
1590
+ TDD Mode requires ALL tests to pass. No exceptions.
1591
+
1592
+ Current status:
1593
+ Tests passed: $TESTS_PASSED ✅
1594
+ Tests failed: $TESTS_FAILED ❌
1595
+ Framework: $TEST_FRAMEWORK
1596
+
1597
+ YOU CANNOT SKIP THIS. Auto mode will NOT complete until:
1598
+ 1. ALL $TESTS_FAILED failing tests are fixed
1599
+ 2. Re-run tests: npm test
1600
+ 3. Verify 0 failures
1601
+
1602
+ This is by design. TDD discipline = quality."
1603
+ else
1604
+ # Standard mode - still blocks but less strict message
1605
+ block "Tasks complete but TESTS FAILING" "🔴 CRITICAL: All tasks marked complete but $TESTS_FAILED tests are FAILING!
1123
1606
 
1124
1607
  You claimed completion but tests are not passing. This is not acceptable.
1125
1608
 
1126
1609
  FIX the failing tests before marking tasks complete.
1127
1610
  Re-run: npm test or npx vitest run"
1611
+ fi
1612
+ fi
1613
+
1614
+ # TDD MODE: Additional check - require MINIMUM test count
1615
+ if [ "$TDD_MODE" = "true" ] && [ "$TESTS_PASSED" -eq 0 ]; then
1616
+ block "TDD MODE: No tests passed - suspicious" "🔴 TDD STRICT MODE - SUSPICIOUS STATE
1617
+
1618
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1619
+ ⚠️ 0 TESTS PASSED - SOMETHING IS WRONG
1620
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1621
+
1622
+ You have $COMPLETED_TASKS tasks marked complete but 0 tests passed.
1623
+
1624
+ This suggests:
1625
+ 1. Tests exist but weren't actually run
1626
+ 2. Test detection isn't working
1627
+ 3. Test output wasn't captured
1628
+
1629
+ In TDD mode, you MUST have passing tests for completed work.
1630
+
1631
+ Actions:
1632
+ 1. Run tests explicitly: npm test
1633
+ 2. Verify test output shows passed count
1634
+ 3. Check test files exist for implemented features"
1128
1635
  fi
1129
1636
  fi
1130
1637
 
@@ -1280,7 +1787,7 @@ Continue with /sw:do and add missing E2E tests."
1280
1787
  --argjson passed "${TESTS_PASSED:-0}" --argjson failed "${TESTS_FAILED:-0}" \
1281
1788
  '.status = "completed" | .endTime = $now | .endReason = "all_tasks_complete" | .finalTestResults = {"passed": $passed, "failed": $failed}' \
1282
1789
  > "$SESSION_FILE"
1283
- approve "All tasks completed, all tests passed ($TESTS_PASSED passed, 0 failed)"
1790
+ approve "All tasks completed, all tests passed ($TESTS_PASSED passed, 0 failed)" "true"
1284
1791
  else
1285
1792
  # More increments in queue - transition to next
1286
1793
  NEXT_INCREMENT=$(echo "$SESSION" | jq -r '.incrementQueue[1] // null')
@@ -1402,10 +1909,13 @@ echo "$SESSION" | jq --argjson iter "$NEXT_ITERATION" --arg now "$(date -u +%Y-%
1402
1909
  '.iteration = $iter | .lastActivity = $now' \
1403
1910
  > "$SESSION_FILE"
1404
1911
 
1405
- # Build context message
1912
+ # Build context message with test instructions (v2.2)
1406
1913
  if [ "$SIMPLE_MODE" = "true" ]; then
1407
1914
  CONTEXT="Continue working. Iteration $NEXT_ITERATION/$MAX_ITERATIONS."
1408
1915
  else
1916
+ # Get TDD mode status
1917
+ TDD_MODE=$(is_tdd_strict_mode)
1918
+
1409
1919
  PROGRESS=""
1410
1920
  if [ -n "$CURRENT_INCREMENT" ] && [ -f "$TASKS_FILE" ]; then
1411
1921
  PROGRESS="Tasks: $COMPLETED_TASKS/$TOTAL_TASKS completed."
@@ -1418,9 +1928,57 @@ else
1418
1928
  else
1419
1929
  TEST_STATUS="Tests: ⚠️ $TESTS_PASSED passed, $TESTS_FAILED FAILED."
1420
1930
  fi
1931
+ else
1932
+ # Tests not run - add warning
1933
+ TEST_STATUS="Tests: ⚠️ NOT RUN YET!"
1934
+ fi
1935
+
1936
+ # Get test instructions for this project (NEW - v2.2)
1937
+ TEST_INSTRUCTIONS=$(format_test_instructions)
1938
+
1939
+ # Build stop criteria message - MUST be clear for each agent
1940
+ STOP_CRITERIA=""
1941
+ if [ "$TDD_MODE" = "true" ]; then
1942
+ STOP_CRITERIA="
1943
+ ╔══════════════════════════════════════════════════════════════╗
1944
+ ║ 🎯 AGENT STOP CONDITION (TDD STRICT MODE) ║
1945
+ ╠══════════════════════════════════════════════════════════════╣
1946
+ ║ ✅ ALL tasks in tasks.md marked [x] completed ║
1947
+ ║ ✅ ALL tests pass (0 failures required) ║
1948
+ ║ ✅ Test execution detected in output ║
1949
+ ╠══════════════════════════════════════════════════════════════╣
1950
+ ║ Source: $(printf '%-45s' "$(get_tdd_source)")║
1951
+ ╚══════════════════════════════════════════════════════════════╝"
1952
+ else
1953
+ STOP_CRITERIA="
1954
+ ╔══════════════════════════════════════════════════════════════╗
1955
+ ║ 🎯 AGENT STOP CONDITION ║
1956
+ ╠══════════════════════════════════════════════════════════════╣
1957
+ ║ ✅ ALL tasks in tasks.md marked [x] completed ║
1958
+ ║ ✅ Tests executed and passing ║
1959
+ ╚══════════════════════════════════════════════════════════════╝"
1421
1960
  fi
1422
1961
 
1423
- CONTEXT="AUTO ACTIVE: Iteration $NEXT_ITERATION/$MAX_ITERATIONS. $PROGRESS $TEST_STATUS Continue with /sw:do to complete remaining tasks."
1962
+ # Build full context message
1963
+ CONTEXT="🤖 AUTO MODE ACTIVE - Iteration $NEXT_ITERATION/$MAX_ITERATIONS
1964
+ $STOP_CRITERIA
1965
+
1966
+ 📊 CURRENT PROGRESS:
1967
+ $PROGRESS
1968
+ $TEST_STATUS
1969
+
1970
+ $TEST_INSTRUCTIONS
1971
+
1972
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1973
+ 📋 REQUIRED ACTIONS (do these in order):
1974
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1975
+ 1. Review remaining tasks in tasks.md
1976
+ 2. Implement next incomplete task
1977
+ 3. RUN TESTS via CLI (mandatory - see commands above)
1978
+ 4. Verify tests pass before marking task complete
1979
+ 5. Continue with /sw:do
1980
+
1981
+ ⚠️ This agent will NOT stop until the STOP CONDITION above is met!"
1424
1982
  fi
1425
1983
 
1426
1984
  # Log iteration
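The Detox and Maestro detection in the stop hook tallies results by grepping the transcript: `grep -c` for per-flow lines and `grep -oE … | tail -1` for "N passing" or "Passed: X, Failed: Y" summaries. One detail matters when this pattern is reused. `grep -c` already prints `0` when nothing matches, but it exits with status 1. A substitution like `$(grep -c … || echo "0")` can therefore capture two lines (`0` and `0`), and a later `-gt` comparison will then fail.

Below is a minimal sketch of count helpers that always yield a single value, assuming bash and a grep that supports `-E`, `-c` and `-o`. The names `count_matches` and `last_summary_count` are hypothetical and are not part of specweave.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not specweave's implementation.

# Print exactly one integer: the number of lines in FILE matching the ERE PATTERN.
count_matches() {
  local pattern="$1" file="$2" n
  # grep -c prints the count even when it is 0, but exits 1 on no match
  # and 2 on error (e.g. a missing file). Keep its output, ignore its status.
  n=$(grep -cE -- "$pattern" "$file" 2>/dev/null) || true
  # On a read error nothing is printed to stdout, so default to 0.
  echo "${n:-0}"
}

# Print the number from the last "<N> <label>" summary line (e.g. "12 passing"),
# or nothing if no such line exists.
last_summary_count() {
  local label="$1" file="$2"
  grep -oE "[0-9]+[[:space:]]+$label" "$file" 2>/dev/null \
    | tail -n 1 \
    | grep -oE '^[0-9]+' || true
}

# Usage: tally Maestro-style flow results from a sample transcript.
transcript=$(mktemp)
printf '%s\n' 'Flow: login.yaml - PASSED' 'Flow: signup.yaml - FAILED' 'Flow: home.yaml - PASSED' > "$transcript"
passed=$(count_matches 'Flow:.*PASSED' "$transcript")
failed=$(count_matches 'Flow:.*FAILED' "$transcript")
skipped=$(count_matches 'Flow:.*SKIPPED' "$transcript")
echo "passed=$passed failed=$failed skipped=$skipped"   # passed=2 failed=1 skipped=0
rm -f "$transcript"
```

Because each helper emits a single token (or nothing), its result can go straight into `[ "$failed" -gt 0 ]` after a `${var:-0}` default. No second fallback `echo` is needed.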
@@ -15,13 +15,18 @@
15
15
  # --no-inc Alias for --no-increment (short form)
16
16
  # --prompt "text" Analyze prompt and create increments (intelligent chunking)
17
17
  # --yes Auto-approve increment plan (skip user approval)
18
+ # --tdd Enable TDD strict mode (ALL tests must pass)
19
+ # --strict Alias for --tdd
18
20
  # -h, --help Show this help
19
21
 
20
22
  set -e
21
23
 
22
- # Defaults
23
- MAX_ITERATIONS=100
24
- MAX_HOURS=""
24
+ # Defaults - v2.3 enhanced for ultra-long sessions
25
+ # IMPORTANT: Iteration limits are safety nets, NOT primary completion criteria
26
+ # Primary completion = tests passing + tasks complete (Ralph Wiggum pattern)
27
+ # NOTE: Stop hook runs PER AGENT - each spawned subagent gets its own hook invocation
28
+ MAX_ITERATIONS=2500 # 5x increase from 500 for ultra-long sessions
29
+ MAX_HOURS="600" # 25 days default (5x increase from 120 hours)
25
30
  SIMPLE_MODE=false
26
31
  DRY_RUN=false
27
32
  INCREMENT_IDS=()
@@ -30,6 +35,7 @@ SKIP_GATES=""
30
35
  NO_INCREMENT=false
31
36
  PROMPT=""
32
37
  AUTO_APPROVE=false
38
+ TDD_MODE=false
33
39
 
34
40
  # Parse arguments
35
41
  while [[ $# -gt 0 ]]; do
@@ -74,6 +80,10 @@ while [[ $# -gt 0 ]]; do
74
80
  AUTO_APPROVE=true
75
81
  shift
76
82
  ;;
83
+ --tdd|--strict)
84
+ TDD_MODE=true
85
+ shift
86
+ ;;
77
87
  -h|--help)
78
88
  grep '^#' "$0" | grep -v '!/bin/bash' | sed 's/^# //'
79
89
  exit 0
@@ -289,7 +299,8 @@ SESSION_JSON=$(cat <<EOF
289
299
  "ado": { "state": "closed", "failures": 0 }
290
300
  },
291
301
  "lastActivity": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
292
- "simple": $SIMPLE_MODE
302
+ "simple": $SIMPLE_MODE,
303
+ "tddMode": $TDD_MODE
293
304
  }
294
305
  EOF
295
306
  )
@@ -317,6 +328,12 @@ echo "Session ID: $SESSION_ID"
317
328
  echo "Max Iterations: $MAX_ITERATIONS"
318
329
  [ -n "$MAX_HOURS" ] && echo "Max Hours: $MAX_HOURS"
319
330
  echo "Simple Mode: $SIMPLE_MODE"
331
+ if [ "$TDD_MODE" = "true" ]; then
332
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
333
+ echo "🔴 TDD STRICT MODE: ENABLED"
334
+ echo " ALL tests MUST pass before auto mode can complete"
335
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
336
+ fi
320
337
  echo ""
321
338
  echo "Increment Queue (${#INCREMENT_IDS[@]}):"
322
339
  for INC_ID in "${INCREMENT_IDS[@]}"; do
@@ -326,7 +343,8 @@ echo ""
326
343
  echo "Current: ${INCREMENT_IDS[0]}"
327
344
  echo ""
328
345
  echo "The session will continue until:"
329
- echo " • All tasks complete"
346
+ echo " • All tasks complete AND tests pass"
347
+ [ "$TDD_MODE" = "true" ] && echo " • (TDD MODE: 100% tests GREEN required)"
330
348
  echo " • Max iterations ($MAX_ITERATIONS) reached"
331
349
  [ -n "$MAX_HOURS" ] && echo " • Max hours ($MAX_HOURS) exceeded"
332
350
  echo " • You run /sw:cancel-auto"
@@ -85,7 +85,7 @@ Higher RICE = Higher Priority
85
85
 
86
86
  **Example**:
87
87
  ```markdown
88
- ## Feature Prioritization (Q1 2025 MVP)
88
+ ## Feature Prioritization (Q1 2026 MVP)
89
89
 
90
90
  ### Must Have (P1)
91
91
  | Feature | Reason |
@@ -107,9 +107,9 @@ Higher RICE = Higher Priority
107
107
  | Custom Themes | Requested by enterprise customers, can wait for v2 |
108
108
 
109
109
  ### Won't Have (This Release)
110
- - Mobile apps (Q2 2025 roadmap)
111
- - Advanced analytics dashboard (Q3 2025)
112
- - API for third-party integrations (Q4 2025)
110
+ - Mobile apps (Q2 2026 roadmap)
111
+ - Advanced analytics dashboard (Q3 2026)
112
+ - API for third-party integrations (Q4 2026)
113
113
  - Offline mode (technical complexity too high for MVP)
114
114
  ```
115
115
 
@@ -183,9 +183,9 @@ Higher RICE = Higher Priority
183
183
 
184
184
  **Example**:
185
185
  ```markdown
186
- # Product Roadmap 2025
186
+ # Product Roadmap 2026
187
187
 
188
- ## Q1 2025: Foundation (MVP)
188
+ ## Q1 2026: Foundation (MVP)
189
189
  **Theme**: Core Task Management
190
190
  **Goal**: Launch with 100 beta users
191
191
  **Team Focus**: Backend + Frontend (1:1 split)
@@ -230,7 +230,7 @@ Higher RICE = Higher Priority
230
230
 
231
231
  ---
232
232
 
233
- ## Q2 2025: Collaboration
233
+ ## Q2 2026: Collaboration
234
234
  **Theme**: Team Features
235
235
  **Goal**: 1K paying customers, $50K MRR
236
236
  **Team Focus**: Backend + Frontend + Mobile (2:2:1 split)
@@ -250,7 +250,7 @@ Higher RICE = Higher Priority
250
250
 
251
251
  ---
252
252
 
253
- ## Q3 2025: Integrations
253
+ ## Q3 2026: Integrations
254
254
  **Theme**: Workflow Automation
255
255
  **Goal**: 5K customers, $200K MRR
256
256
 
@@ -269,7 +269,7 @@ Higher RICE = Higher Priority
269
269
 
270
270
  ---
271
271
 
272
- ## Q4 2025: Enterprise
272
+ ## Q4 2026: Enterprise
273
273
  **Theme**: Scale & Compliance
274
274
  **Goal**: 10K customers, $500K MRR
275
275
 
@@ -303,28 +303,28 @@ key_results:
303
303
  target: "70% of registered users"
304
304
  measurement: "Track unique logins per day (Mixpanel)"
305
305
  current: "52%"
306
- target_date: "2025-Q2"
306
+ target_date: "2026-Q2"
307
307
 
308
308
  KR2:
309
309
  metric: "Feature Adoption - Real-time Collaboration"
310
310
  target: "50% of teams use real-time editing within first week"
311
311
  measurement: "Track WebSocket connections per team"
312
312
  current: "0% (feature not launched)"
313
- target_date: "2025-Q1"
313
+ target_date: "2026-Q1"
314
314
 
315
315
  KR3:
316
316
  metric: "Customer Satisfaction (NPS)"
317
317
  target: "NPS > 40"
318
318
  measurement: "In-app survey after 1 week of use"
319
319
  current: "28"
320
- target_date: "2025-Q3"
320
+ target_date: "2026-Q3"
321
321
 
322
322
  KR4:
323
323
  metric: "Revenue Growth"
324
324
  target: "$200K MRR by end of Q3"
325
325
  measurement: "Stripe dashboard (MRR)"
326
326
  current: "$15K MRR"
327
- target_date: "2025-Q3"
327
+ target_date: "2026-Q3"
328
328
  ```
329
329
 
330
330
  ### Metric Categories
@@ -435,7 +435,7 @@ We're proposing a shift from our current monolithic architecture to microservice
435
435
  - **Payback Period**: 18 months for 3x ROI
436
436
 
437
437
  **Recommendation**: Approve for Q3 implementation
438
- **Timeline**: 8 weeks (Q3 2025)
438
+ **Timeline**: 8 weeks (Q3 2026)
439
439
  **Team**: 3 backend engineers, 1 DevOps engineer
440
440
  **Risk Level**: Medium (well-established pattern, many success stories)
441
441
  ```