npm - dev-playbooks - Versions diffs - 1.3.0 → 1.5.0 - Mend

dev-playbooks 1.3.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/package.json +1 -1
package/skills/devbooks-coder/SKILL.md +91 -31
package/skills/devbooks-convergence-audit/SKILL.md +394 -0
package/skills/devbooks-test-owner/SKILL.md +97 -35

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "dev-playbooks",
-  "version": "1.3.0",
+  "version": "1.5.0",
   "description": "AI-powered spec-driven development workflow",
   "keywords": [
     "devbooks",

package/skills/devbooks-coder/SKILL.md CHANGED Viewed

@@ -14,16 +14,27 @@ allowed-tools:
 ## Workflow Position Awareness
-> **Core Principle**: Coder executes after Test Owner Phase 1 and hands off to Test Owner Phase 2 for verification upon completion.
+> **Core Principle**: Coder executes after Test Owner Phase 1, achieving mental clarity through **mode labels** (not session isolation).
 ### My Position in the Overall Workflow
 ```
-proposal → design → test-owner(phase1) → [Coder] → test-owner(phase2) → code-review → archive
-                                            ↓
-                                    Implement code, make tests green
+proposal → design → [TEST-OWNER] → [CODER] → [TEST-OWNER] → code-review → archive
+                                      ↓              ↓
+                               Implement+fast track    Evidence audit
+                              (@smoke/@critical)      (no @full rerun)
 ```
+### AI Era Solo Development Optimization
+> **Important Change**: This protocol is optimized for AI programming + solo development scenarios, **removing the mandatory "separate session" requirement**.
+| Old Design | New Design | Reason |
+|------------|------------|--------|
+| Test Owner and Coder must use separate sessions | Same session, switch with `[TEST-OWNER]` / `[CODER]` mode labels | Reduce context rebuilding cost |
+| Coder runs full tests and waits | Coder runs fast track (`@smoke`/`@critical`), `@full` triggered async | Fast iteration |
+| Completion goes directly to Test Owner | Completion status is `Implementation Done`, wait for @full | Async doesn't block, archive is sync |
 ### Coder's Responsibility Boundaries
 | Allowed | Prohibited |
@@ -31,18 +42,60 @@ proposal → design → test-owner(phase1) → [Coder] → test-owner(phase2)
 | Modify `src/**` code | ❌ Modify `tests/**` |
 | Check off `tasks.md` items | ❌ Modify `verification.md` |
 | Record deviations to `deviation-log.md` | ❌ Check off AC coverage matrix |
-| Run tests to verify | ❌ Set verification.md Status |
+| Run fast track tests (`@smoke`/`@critical`) | ❌ Set verification.md Status to Verified/Done |
+| Trigger `@full` tests (CI/background) | ❌ Wait for @full completion (can start next change) |
 ### Flow After Coder Completes
-1. **Tasks complete**: tasks.md all `[x]`
-2. **Tests green**: Run `npm test` to confirm passing
-3. **Hand off to Test Owner**: Notify Test Owner to enter Phase 2 verification
-4. **Wait for verification result**:
-   - Test Owner confirms green → Proceed to Code Review
-   - Test Owner finds issues → Coder fixes
+1. **Fast track tests green**: `@smoke` + `@critical` pass
+2. **Trigger @full**: Commit code, CI starts running @full tests async
+3. **Status change**: Set change status to `Implementation Done`
+4. **Can start next change** (not blocked)
+5. **Wait for @full results**:
+   - @full passes → Test Owner enters Phase 2 to audit evidence
+   - @full fails → Coder fixes
+**Key Reminders**:
+- After Coder completes, status is `Implementation Done`, **not directly to Code Review**
+- Dev iteration is async (can start next change), but archive is sync (must wait for @full to pass)
+---
+## Test Layering and Run Strategy (Critical!)
+> **Core Principle**: Coder only runs fast track tests, @full tests are triggered async, not blocking dev iteration.
+### Test Layering Labels
+| Label | Purpose | When Coder Runs | Expected Time |
+|-------|---------|-----------------|---------------|
+| `@smoke` | Fast feedback, core paths | After each code change | Seconds |
+| `@critical` | Key functionality verification | Before commit | Minutes |
+| `@full` | Complete acceptance tests | **Don't run**, trigger CI async | Can be slow |
-**Key Reminder**: After Coder completes, **do not go directly to Code Review**, first let Test Owner verify and check off.
+### Coder's Test Run Strategy
+```bash
+# During development: frequently run @smoke
+npm test -- --grep "@smoke"
+# Before commit: run @critical
+npm test -- --grep "@smoke|@critical"
+# After commit: CI automatically runs @full (Coder doesn't wait)
+git push  # triggers CI
+# → Coder can start next task
+```
+### Async vs Sync Boundary
+| Action | Blocking/Async | Description |
+|--------|----------------|-------------|
+| `@smoke` tests | Sync | Run immediately after each change |
+| `@critical` tests | Sync | Must pass before commit |
+| `@full` tests | **Async** | CI runs in background, doesn't block Coder |
+| Start next change | **Not blocked** | Coder can start immediately |
+| Archive | **Blocked** | Must wait for @full to pass |
 ---
@@ -319,26 +372,29 @@ During implementation, you **must immediately** write to `deviation-log.md` in t
 | Code | Status | Determination Criteria | Next Step |
 |:----:|--------|------------------------|-----------|
-| ✅ | COMPLETED | All tasks done, no deviations | `devbooks-code-review` |
-| ⚠️ | COMPLETED_WITH_DEVIATION | Tasks done, deviation-log has pending records | `devbooks-design-backport` |
-| 🔄 | HANDOFF | Found test issues needing modification | `devbooks-test-owner` |
+| ✅ | IMPLEMENTATION_DONE | Fast track tests green, @full triggered, no deviations | Switch to `[TEST-OWNER]` wait for @full |
+| ⚠️ | IMPLEMENTATION_DONE_WITH_DEVIATION | Fast track green, deviation-log has pending records | `devbooks-design-backport` |
+| 🔄 | HANDOFF | Found test issues needing modification | Switch to `[TEST-OWNER]` mode to fix tests |
 | ❌ | BLOCKED | Needs external input/decision | Record breakpoint, wait for user |
-| 💥 | FAILED | Gates not passed | Fix and retry |
+| 💥 | FAILED | Fast track tests not passing | Fix and retry |
 ### Status Determination Flow
 ```
 1. Check if deviation-log.md has "| ❌" records
-   → Yes: COMPLETED_WITH_DEVIATION
+   → Yes: IMPLEMENTATION_DONE_WITH_DEVIATION
 2. Check if tests/ modification needed
-   → Yes: HANDOFF to test-owner
+   → Yes: HANDOFF to [TEST-OWNER] mode
+3. Check if fast track tests (@smoke + @critical) all pass
+   → No: FAILED
-3. Check if tasks.md is fully completed
-   → No: BLOCKED or FAILED
+4. Check if tasks.md is fully completed
+   → No: BLOCKED or continue implementation
-4. All checks passed
-   → COMPLETED
+5. All checks passed, trigger @full
+   → IMPLEMENTATION_DONE
 ```
 ### Routing Output Template (Required)
@@ -348,37 +404,41 @@ After completing coder, you **must** output in this format:
 ```markdown
 ## Completion Status
-**Status**: ✅ COMPLETED / ⚠️ COMPLETED_WITH_DEVIATION / 🔄 HANDOFF / ❌ BLOCKED / 💥 FAILED
+**Status**: ✅ IMPLEMENTATION_DONE / ⚠️ ... / 🔄 HANDOFF / ❌ BLOCKED / 💥 FAILED
 **Task Progress**: X/Y completed
+**Fast Track Tests**: @smoke ✅ / @critical ✅
+**@full Tests**: Triggered (CI running async)
 **Deviation Records**: Has N pending / None
 ## Next Step
-**Recommended**: `devbooks-xxx skill`
+**Recommended**: Switch to `[TEST-OWNER]` mode wait for @full / `devbooks-xxx skill`
 **Reason**: [specific reason]
-### How to invoke
-Run devbooks-xxx skill for change <change-id>
+**Note**: Can start next change, no need to wait for @full completion
 ```
 ### Specific Routing Rules
 | My Status | Next Step | Reason |
 |-----------|-----------|--------|
-| COMPLETED | `devbooks-test-owner` (Phase 2 verification) | Tasks complete, need Test Owner to verify and check off |
-| COMPLETED_WITH_DEVIATION | `devbooks-design-backport` | Backport design first, then let Test Owner verify |
-| HANDOFF (test issue) | `devbooks-test-owner` | Coder cannot modify tests |
+| IMPLEMENTATION_DONE | Switch to `[TEST-OWNER]` mode (wait for @full) | Fast track green, wait for @full to pass then audit evidence |
+| IMPLEMENTATION_DONE_WITH_DEVIATION | `devbooks-design-backport` | Backport design first |
+| HANDOFF (test issue) | Switch to `[TEST-OWNER]` mode | Coder cannot modify tests |
 | BLOCKED | Wait for user | Record breakpoint area |
 | FAILED | Fix and retry | Analyze failure reason |
 **Critical Constraints**:
 - Coder **can never modify** `tests/**`
-- If test issues found, must HANDOFF to Test Owner (separate session)
+- If test issues found, must switch to `[TEST-OWNER]` mode to handle
 - If deviations exist, must design-backport first before continuing
-- **Coder must go through Test Owner Phase 2 verification before Code Review**
+- **Coder completion status is `Implementation Done`, must wait for @full to pass before entering Test Owner Phase 2**
+- **Mode switching replaces session isolation**: Use `[TEST-OWNER]` / `[CODER]` labels to switch modes
 ---

package/skills/devbooks-convergence-audit/SKILL.md ADDED Viewed

@@ -0,0 +1,394 @@
+---
+name: devbooks-convergence-audit
+description: devbooks-convergence-audit: Evaluates DevBooks workflow convergence using "evidence first, distrust declarations" principle, detects "Sisyphus anti-patterns" and "fake completion". Actively verifies rather than trusting document claims. Use when user says "evaluate convergence/check upgrade health/Sisyphus detection/workflow audit" etc.
+allowed-tools:
+  - Glob
+  - Grep
+  - Read
+  - Bash
+---
+# DevBooks: Convergence Audit
+## Core Principle: Anti-Deception Design
+> **Golden Rule**: **Evidence > Declarations**. Never trust any assertions in documents; must confirm through verifiable evidence.
+### Scenarios Where AI Gets Deceived (Must Prevent)
+| Deception Scenario | AI Wrong Behavior | Correct Behavior |
+|--------------------|-------------------|------------------|
+| Document says `Status: Done` | Believes it's complete | Verify: Are tests actually all green? Does evidence exist? |
+| AC matrix all `[x]` | Believes full coverage | Verify: Does test file for each AC exist and pass? |
+| Document says "tests passed" | Believes passed | Verify: Actually run tests or check CI log timestamps |
+| `evidence/` directory exists | Believes evidence exists | Verify: Is directory non-empty? Is content valid test logs? |
+| tasks.md all `[x]` | Believes implemented | Verify: Do corresponding code files exist with substance? |
+| Commit message says "fixed" | Believes fixed | Verify: Did related tests change from red to green? |
+### Three Anti-Deception Principles
+```
+1. Distrust Declarations
+   - Any "complete/passed/covered" claims in documents are hypotheses to verify
+   - Default stance: Claims may be wrong, outdated, or optimistic
+2. Evidence First
+   - Code/test results are the only truth
+   - Log timestamps must be later than last code modification
+   - Empty directory/file = no evidence
+3. Cross Validation
+   - Declaration vs evidence: Check consistency
+   - Code vs tests: Check if they match
+   - Multiple documents: Check for contradictions
+```
+---
+## Verification Checklist (Execute Each)
+### Check 1: Status Field Truthfulness Verification
+**Document Claim**: `verification.md` contains `Status: Done` or `Status: Verified`
+**Verification Steps**:
+```bash
+# 1. Check if verification.md exists
+[[ -f "verification.md" ]] || echo "❌ verification.md does not exist"
+# 2. Check if evidence/green-final/ has content
+if [[ -z "$(ls -A evidence/green-final/ 2>/dev/null)" ]]; then
+  echo "❌ Status claims complete, but evidence/green-final/ is empty"
+fi
+# 3. Check if evidence timestamp is later than last code modification
+code_mtime=$(stat -f %m src/ 2>/dev/null || stat -c %Y src/)
+evidence_mtime=$(stat -f %m evidence/green-final/* 2>/dev/null | sort -n | tail -1)
+if [[ $evidence_mtime -lt $code_mtime ]]; then
+  echo "❌ Evidence time is earlier than code modification, evidence may be stale"
+fi
+```
+**Deception Detection**:
+- ⚠️ Status=Done but evidence/ empty → **Fake Completion**
+- ⚠️ Status=Done but evidence timestamp too old → **Stale Evidence**
+- ⚠️ Status=Done but tests actually fail → **False Status**
+---
+### Check 2: AC Coverage Matrix Truthfulness Verification
+**Document Claim**: `[x]` in AC matrix means covered
+**Verification Steps**:
+```bash
+# 1. Extract all ACs claimed as covered
+grep -E '^\| AC-[0-9]+.*\[x\]' verification.md | while read line; do
+  ac_id=$(echo "$line" | grep -oE 'AC-[0-9]+')
+  test_id=$(echo "$line" | grep -oE 'T-[0-9]+')
+  # 2. Verify corresponding test exists
+  if ! grep -rq "$test_id\|$ac_id" tests/; then
+    echo "❌ $ac_id claims covered, but no corresponding test found"
+  fi
+done
+# 3. Actually run tests to verify (most reliable)
+npm test 2>&1 | tee /tmp/test-output.log
+if grep -q "FAIL\|Error\|failed" /tmp/test-output.log; then
+  echo "❌ ACs claim full coverage, but tests actually have failures"
+fi
+```
+**Deception Detection**:
+- ⚠️ AC checked but corresponding test file doesn't exist → **False Coverage**
+- ⚠️ AC checked but test actually fails → **Fake Green**
+- ⚠️ AC checked but test content is empty/placeholder → **Placeholder Test**
+---
+### Check 3: tasks.md Completion Truthfulness Verification
+**Document Claim**: `[x]` in tasks.md means completed
+**Verification Steps**:
+```bash
+# 1. Extract all tasks claimed as complete
+grep -E '^\- \[x\]' tasks.md | while read line; do
+  # 2. Extract keywords from task description (function name/file name/feature)
+  keywords=$(echo "$line" | grep -oE '[A-Za-z]+[A-Za-z0-9]*' | head -5)
+  # 3. Verify code has corresponding implementation
+  for kw in $keywords; do
+    if ! grep -rq "$kw" src/; then
+      echo "⚠️ Task claims complete, but keyword not found in code: $kw"
+    fi
+  done
+done
+# 4. Check for "skeleton code" (only function signatures without implementation)
+grep -rE 'throw new Error\(.*not implemented|TODO|FIXME|pass$|\.\.\.}' src/ && \
+  echo "⚠️ Found unimplemented placeholder code"
+```
+**Deception Detection**:
+- ⚠️ Task checked but code doesn't exist → **False Completion**
+- ⚠️ Task checked but code is placeholder → **Skeleton Code**
+- ⚠️ Task checked but feature not callable → **Dead Code**
+---
+### Check 4: Evidence Validity Verification
+**Document Claim**: `evidence/` directory contains test evidence
+**Verification Steps**:
+```bash
+# 1. Check if directory exists and is non-empty
+if [[ ! -d "evidence" ]] || [[ -z "$(ls -A evidence/)" ]]; then
+  echo "❌ evidence/ does not exist or is empty"
+  exit 1
+fi
+# 2. Check if evidence files have substantial content
+for f in evidence/**/*; do
+  if [[ -f "$f" ]]; then
+    lines=$(wc -l < "$f")
+    if [[ $lines -lt 5 ]]; then
+      echo "⚠️ Evidence file has too little content: $f ($lines lines)"
+    fi
+    # 3. Check if it's a valid test log (contains test framework output characteristics)
+    if ! grep -qE 'PASS|FAIL|✓|✗|passed|failed|test|spec' "$f"; then
+      echo "⚠️ Evidence file doesn't look like test log: $f"
+    fi
+  fi
+done
+# 4. Check if red-baseline evidence really is red (has failures)
+if [[ -d "evidence/red-baseline" ]]; then
+  if ! grep -rqE 'FAIL|Error|✗|failed' evidence/red-baseline/; then
+    echo "❌ red-baseline claims to be red, but no failure records"
+  fi
+fi
+# 5. Check if green-final evidence really is green (all pass)
+if [[ -d "evidence/green-final" ]]; then
+  if grep -rqE 'FAIL|Error|✗|failed' evidence/green-final/; then
+    echo "❌ green-final claims to be green, but contains failure records"
+  fi
+fi
+```
+**Deception Detection**:
+- ⚠️ evidence/ exists but content is empty → **Empty Evidence**
+- ⚠️ Evidence file too small (< 5 lines) → **Placeholder Evidence**
+- ⚠️ red-baseline has no failure records → **Fake Red**
+- ⚠️ green-final contains failure records → **Fake Green**
+---
+### Check 5: Git History Cross Validation
+**Principle**: Git history doesn't lie; use it to verify document claims
+**Verification Steps**:
+```bash
+# 1. Check if claimed-complete change has corresponding code commits
+change_id="xxx"
+commits=$(git log --oneline --all --grep="$change_id" | wc -l)
+if [[ $commits -eq 0 ]]; then
+  echo "❌ Change $change_id claims complete, but no related commits in git history"
+fi
+# 2. Check if test files were added after code (TDD violation detection)
+for test_file in tests/**/*.test.*; do
+  test_added=$(git log --format=%at --follow -- "$test_file" | tail -1)
+  # Find corresponding source file
+  src_file=$(echo "$test_file" | sed 's/tests/src/' | sed 's/.test//')
+  if [[ -f "$src_file" ]]; then
+    src_added=$(git log --format=%at --follow -- "$src_file" | tail -1)
+    if [[ $test_added -gt $src_added ]]; then
+      echo "⚠️ Test added after code (non-TDD): $test_file"
+    fi
+  fi
+done
+# 3. Check for "one-time big commits" (may be bypassing process)
+git log --oneline -20 | while read line; do
+  commit=$(echo "$line" | cut -d' ' -f1)
+  files_changed=$(git show --stat "$commit" | grep -E '[0-9]+ file' | grep -oE '[0-9]+' | head -1)
+  if [[ $files_changed -gt 20 ]]; then
+    echo "⚠️ Big commit detected: $commit modified $files_changed files, may bypass incremental verification"
+  fi
+done
+```
+**Deception Detection**:
+- ⚠️ Claims complete but no git commits → **Fake Change**
+- ⚠️ Tests added after code → **Retroactive Testing**
+- ⚠️ Many files in one commit → **Bypassing Incremental Verification**
+---
+### Check 6: Live Test Run Verification (Most Reliable)
+**Principle**: Don't trust any logs; actually run tests
+**Verification Steps**:
+```bash
+# 1. Run full tests
+echo "=== Live Test Verification ==="
+npm test 2>&1 | tee /tmp/live-test.log
+# 2. Check results
+if grep -qE 'FAIL|Error|failed' /tmp/live-test.log; then
+  echo "❌ Live tests failed, document claims not trustworthy"
+  grep -E 'FAIL|Error|failed' /tmp/live-test.log
+else
+  echo "✅ Live tests passed"
+fi
+# 3. Compare live results with evidence files
+if [[ -f "evidence/green-final/latest.log" ]]; then
+  live_pass=$(grep -c 'PASS\|✓\|passed' /tmp/live-test.log)
+  evidence_pass=$(grep -c 'PASS\|✓\|passed' evidence/green-final/latest.log)
+  if [[ $live_pass -ne $evidence_pass ]]; then
+    echo "⚠️ Live pass count ($live_pass) ≠ evidence pass count ($evidence_pass)"
+  fi
+fi
+```
+**Deception Detection**:
+- ⚠️ Evidence says green but live run fails → **Stale Evidence/Fake Green**
+- ⚠️ Live pass count differs from evidence → **Evidence Fabrication/Environment Difference**
+---
+## Composite Scoring Algorithm
+### Trustworthiness Score (0-100)
+```python
+def calculate_trustworthiness(checks):
+    score = 100
+    # Critical issues (each -20 points)
+    critical = [
+        "Evidence empty",
+        "Live tests failed",
+        "Status claims complete but tests fail",
+        "green-final contains failure records"
+    ]
+    # Warning issues (each -10 points)
+    warnings = [
+        "Evidence timestamp too old",
+        "AC corresponding test doesn't exist",
+        "Placeholder code",
+        "Big commit detected"
+    ]
+    # Minor issues (each -5 points)
+    minor = [
+        "Tests added after code",
+        "Evidence file too small"
+    ]
+    for issue in checks.critical_issues:
+        score -= 20
+    for issue in checks.warnings:
+        score -= 10
+    for issue in checks.minor_issues:
+        score -= 5
+    return max(0, score)
+```
+### Convergence Determination
+| Trustworthiness | Determination | Recommendation |
+|-----------------|---------------|----------------|
+| 90-100 | ✅ Trustworthy Convergence | Continue current process |
+| 70-89 | ⚠️ Partially Trustworthy | Need supplementary verification |
+| 50-69 | 🟠 Questionable | Need to rework some steps |
+| < 50 | 🔴 Untrustworthy | Sisyphus trap, need comprehensive review |
+---
+## Output Format
+```markdown
+# DevBooks Convergence Audit Report (Anti-Deception Edition)
+## Audit Principle
+This report uses "evidence first, distrust declarations" principle. All conclusions are based on verifiable evidence, not document claims.
+## Declaration vs Evidence Comparison
+| Check Item | Document Claim | Actual Verification | Conclusion |
+|------------|----------------|---------------------|------------|
+| Status | Done | Tests actually fail | ❌ Fake Completion |
+| AC Coverage | 5/5 checked | 2 ACs have no corresponding tests | ❌ False Coverage |
+| Test Status | All green | Live run 3 failures | ❌ Stale Evidence |
+| tasks.md | 10/10 complete | 3 tasks have no code | ❌ False Completion |
+| evidence/ | Exists | Non-empty, content valid | ✅ Valid |
+## Trustworthiness Score
+**Total Score**: 45/100 🔴 Untrustworthy
+**Deduction Details**:
+- -20: Status=Done but live tests fail
+- -20: ACs claim full coverage but 2 have no tests
+- -10: tasks.md 3 tasks have no code
+- -5: Evidence timestamp earlier than code modification
+## Deception Detection Results
+### 🔴 Detected Fake Completions
+1. `change-auth`: Status=Done, but `npm test` fails 3
+2. `fix-cache`: AC-003 checked, but `tests/cache.test.ts` doesn't exist
+### 🟡 Suspicious Items
+1. `refactor-api`: evidence/green-final/ timestamp 2 days earlier than last code commit
+2. `feature-login`: tasks.md all checked, but `src/login.ts` contains TODO
+## True Status Determination
+| Change Package | Claimed Status | True Status | Gap |
+|----------------|----------------|-------------|-----|
+| change-auth | Done | Tests failing | 🔴 Severe |
+| fix-cache | Verified | Coverage incomplete | 🟠 Medium |
+| refactor-api | Ready | Evidence stale | 🟡 Minor |
+## Recommended Actions
+### Immediate Action
+1. Revert `change-auth` status to `In Progress`
+2. Add tests for `fix-cache` AC-003
+### Short-term Improvement
+1. Establish evidence timeliness check (evidence must be later than code)
+2. Force run corresponding tests before AC check-off
+### Process Improvement
+1. Prohibit manual Status modification; only allow auto-update after script verification
+2. Integrate convergence check in CI, block fake completion from merging
+```
+---
+## Completion Status
+**Status**: ✅ AUDIT_COMPLETED
+**Core Findings**:
+- Document claim trustworthiness: X%
+- Detected fake completions: N
+- Changes needing rework: M
+**Next Step**:
+- Fake completion → Immediately revert status, re-verify
+- Suspicious items → Supplement evidence or re-run tests
+- Trustworthy items → Continue current process

package/skills/devbooks-test-owner/SKILL.md CHANGED Viewed

@@ -14,44 +14,98 @@ allowed-tools:
 ## Workflow Position Awareness
-> **Core Principle**: Test Owner has **dual-phase responsibilities** in the overall workflow, ensuring role isolation from Coder.
+> **Core Principle**: Test Owner has **dual-phase responsibilities** in the overall workflow, achieving mental clarity through **mode labels** (not session isolation).
 ### My Position in the Overall Workflow
 ```
-proposal → design → [Test Owner Phase 1] → coder → [Test Owner Phase 2] → code-review → archive
-                         ↓                              ↓
-                    Red baseline output          Green verification + check off
+proposal → design → [TEST-OWNER] → [CODER] → [TEST-OWNER] → code-review → archive
+                         ↓              ↓           ↓
+                    Red baseline    Implement    Evidence audit
+                   (incremental)   (@smoke)     (no @full rerun)
 ```
+### AI Era Solo Development Optimization
+> **Important Change**: This protocol is optimized for AI programming + solo development scenarios, **removing the mandatory "separate session" requirement**.
+| Old Design | New Design | Reason |
+|------------|------------|--------|
+| Test Owner and Coder must use separate sessions | Same session, switch with `[TEST-OWNER]` / `[CODER]` mode labels | Reduce context rebuilding cost |
+| Phase 2 reruns full tests | Phase 2 defaults to **evidence audit**, optional sampling rerun | Avoid slow test multiple runs |
+| No test layering requirement | Mandatory test layering: `@smoke`/`@critical`/`@full` | Fast feedback loop |
 ### Test Owner's Dual-Phase Responsibilities
-| Phase | Trigger | Core Responsibility | Output |
-|-------|---------|---------------------|--------|
-| **Phase 1: Red Baseline** | After design.md is complete | Write tests, produce failure evidence | verification.md (Status=Ready), Red baseline |
-| **Phase 2: Green Verification** | After Coder completes | Verify tests pass, check off AC matrix | AC matrix checked, Status remains Ready (wait for Reviewer to set Done) |
+| Phase | Trigger | Core Responsibility | Test Run Method | Output |
+|-------|---------|---------------------|-----------------|--------|
+| **Phase 1: Red Baseline** | After design.md is complete | Write tests, produce failure evidence | Only run **incremental tests** (new/P0) | verification.md (Status=Ready), Red baseline |
+| **Phase 2: Green Verification** | After Coder completes + @full passes | **Audit evidence**, check off AC matrix | Default no rerun, optional sampling | AC matrix checked, Status=Verified |
 ### Phase 2 Detailed Responsibilities (Critical!)
 When user says "Coder is done, please verify" or similar, Test Owner enters **Phase 2**:
-1. **Run all tests**: Execute `npm test` or project test command
-2. **Verify Green status**: Confirm all tests pass
-3. **Check off AC Coverage Matrix**: Change `[ ]` to `[x]` in verification.md AC Coverage Matrix
-4. **Collect Green evidence**: Save to `evidence/green-final/`
-5. **Output verification report**: Summarize test results and coverage
+1. **Check prerequisites**: Confirm @full tests have passed (check CI results or `evidence/green-final/`)
+2. **Audit evidence** (default mode):
+   - Check test logs in `evidence/green-final/` directory
+   - Verify commit hash matches current code
+   - Confirm tests cover all ACs
+3. **Optional sampling rerun**: Sample verify high-risk ACs or questionable tests
+4. **Check off AC Coverage Matrix**: Change `[ ]` to `[x]` in verification.md AC Coverage Matrix
+5. **Set status to Verified**: Indicates test verification passed, waiting for Code Review
 ### AC Coverage Matrix Checkbox Permissions (Important!)
 | Checkbox Location | Who Can Check | When to Check |
 |-------------------|---------------|---------------|
-| `[ ]` in AC Coverage Matrix | **Test Owner** | Phase 2 after verifying Green status |
+| `[ ]` in AC Coverage Matrix | **Test Owner** | Phase 2 after evidence audit confirmed |
+| Status field `Verified` | **Test Owner** | After Phase 2 completion |
 | Status field `Done` | Reviewer | After Code Review passes |
 **Prohibited**: Coder cannot check AC Coverage Matrix, cannot modify verification.md.
 ---
+## Test Layering and Run Strategy (Critical!)
+> **Core Principle**: Test layering is key to solving "slow tests blocking development".
+### Test Layering Labels (Must Use)
+| Label | Purpose | Who Runs | Expected Time | When to Run |
+|-------|---------|----------|---------------|-------------|
+| `@smoke` | Fast feedback, core paths | Coder runs frequently | Seconds | After each code change |
+| `@critical` | Key functionality verification | Coder before commit | Minutes | Before commit |
+| `@full` | Complete acceptance tests | CI runs async | Can be slow (hours) | Background/CI |
+### Test Run Strategy by Phase
+| Phase | What to Run | Purpose | Blocking/Async |
+|-------|-------------|---------|----------------|
+| **Test Owner Phase 1** | Only **newly written tests** | Confirm Red status | Sync (but incremental only) |
+| **Coder during dev** | `@smoke` | Fast feedback loop | Sync |
+| **Coder before commit** | `@critical` | Key path verification | Sync |
+| **Coder on completion** | `@full` (trigger CI) | Complete acceptance | **Async** (doesn't block dev) |
+| **Test Owner Phase 2** | **No run** (audit evidence) | Independent verification | N/A |
+### Async vs Sync Boundary (Critical!)
+```
+✅ Async: Dev iteration (Coder can start next change after completion, no waiting for @full)
+❌ Sync: Archive gate (Archive must wait for @full to pass)
+Timeline example:
+T1: Coder completes implementation, triggers @full async test → Status = Implementation Done
+T2: Coder can start next change (not blocked)
+T3: @full tests pass → Status = Ready for Phase 2
+T4: Test Owner audits evidence + checks off → Status = Verified
+T5: Code Review → Status = Done
+T6: Archive (at this point @full has definitely passed)
+```
+---
 ## Prerequisites: Configuration Discovery (Protocol-Agnostic)
 - `<truth-root>`: Current truth directory root
@@ -86,10 +140,15 @@ Test Owner must produce a structured `verification.md` that serves as both test
 |--------|---------|-------------|
 | `Draft` | Initial state | Auto-generated |
 | `Ready` | Test plan ready | **Test Owner** |
+| `Implementation Done` | Implementation complete, waiting for @full tests | **Coder** |
+| `Verified` | @full passed + evidence audit complete | **Test Owner** |
 | `Done` | Review passed | Reviewer (Test Owner/Coder prohibited) |
-| `Archived` | Archived | Spec Gardener |
+| `Archived` | Archived | Archiver |
-**Constraint**: After completing the test plan, Test Owner should set Status to `Ready`.
+**Key Constraints**:
+- `Verified` status requires @full tests to have passed
+- Only changes with `Verified` or `Done` status can be archived
+- Test Owner sets `Ready` after completing test plan, sets `Verified` after evidence audit
 ```markdown
 # Verification Plan: <change-id>
@@ -385,14 +444,14 @@ Test Owner has two phases, completion status varies by phase:
 | Current Phase | How to Determine | Next Step After Completion |
 |---------------|------------------|---------------------------|
-| **Phase 1** | verification.md doesn't exist or Red baseline not produced | → Coder |
-| **Phase 2** | User says "verify/check off" and Coder has completed | → Code Review |
+| **Phase 1** | verification.md doesn't exist or Red baseline not produced | → `[CODER]` mode |
+| **Phase 2** | User says "verify/check off" and @full tests have passed | → Code Review |
 ### Phase 1 Completion Status Classification (MECE)
 | Code | Status | Determination Criteria | Next Step |
 |:----:|--------|------------------------|-----------|
-| ✅ | PHASE1_COMPLETED | Red baseline produced, no deviations | `devbooks-coder` (separate session) |
+| ✅ | PHASE1_COMPLETED | Red baseline produced, no deviations | Switch to `[CODER]` mode |
 | ⚠️ | PHASE1_COMPLETED_WITH_DEVIATION | Red baseline produced, deviation-log has pending records | `devbooks-design-backport` |
 | ❌ | BLOCKED | Needs external input/decision | Record breakpoint, wait for user |
 | 💥 | FAILED | Test framework issues etc. | Fix and retry |
@@ -401,8 +460,9 @@ Test Owner has two phases, completion status varies by phase:
 | Code | Status | Determination Criteria | Next Step |
 |:----:|--------|------------------------|-----------|
-| ✅ | PHASE2_VERIFIED | Tests all green, AC matrix checked | `devbooks-code-review` |
-| ❌ | PHASE2_FAILED | Tests not passing | Notify Coder to fix, or HANDOFF |
+| ✅ | PHASE2_VERIFIED | Evidence audit passed, AC matrix checked | `devbooks-code-review` |
+| ⏳ | PHASE2_WAITING | @full tests still running | Wait for CI to complete |
+| ❌ | PHASE2_FAILED | @full tests not passing | Notify Coder to fix |
 | 🔄 | PHASE2_HANDOFF | Found issues with tests themselves | Fix tests then re-verify |
 ### Phase Determination Flow
@@ -421,11 +481,13 @@ Test Owner has two phases, completion status varies by phase:
    c. All above pass → PHASE1_COMPLETED
 3. Phase 2 status determination:
-   a. Run tests, check if all green
+   a. Check if @full tests have completed
+      → No: PHASE2_WAITING
+   b. Check if @full tests passed
       → No: PHASE2_FAILED
-   b. Check if tests themselves have issues
+   c. Check if tests themselves have issues
       → Yes: PHASE2_HANDOFF
-   c. All green and no issues → PHASE2_VERIFIED
+   d. Audit evidence, confirm coverage → PHASE2_VERIFIED
 ```
 ### Routing Output Template (Required)
@@ -437,11 +499,13 @@ After completing test-owner, **must** output in this format:
 **Phase**: Phase 1 (Red Baseline) / Phase 2 (Green Verification)
-**Status**: ✅ PHASE1_COMPLETED / ✅ PHASE2_VERIFIED / ⚠️ ... / ❌ ... / 💥 ...
+**Status**: ✅ PHASE1_COMPLETED / ✅ PHASE2_VERIFIED / ⏳ PHASE2_WAITING / ...
 **Red Baseline**: Produced / Not completed (Phase 1 only)
-**Green Verification**: All green / Has failures (Phase 2 only)
+**@full Tests**: Passed / Running / Failed (Phase 2 only)
+**Evidence Audit**: Completed / Pending (Phase 2 only)
 **AC Matrix**: Checked N/M / Not checked (Phase 2 only)
@@ -449,31 +513,29 @@ After completing test-owner, **must** output in this format:
 ## Next Step
-**Recommended**: `devbooks-xxx skill`
+**Recommended**: Switch to `[CODER]` mode / `devbooks-xxx skill`
 **Reason**: [specific reason]
-### How to invoke
-Run devbooks-xxx skill for change <change-id>
 ```
 ### Specific Routing Rules
 | My Status | Next Step | Reason |
 |-----------|-----------|--------|
-| PHASE1_COMPLETED | `devbooks-coder` (separate session) | Red baseline produced, Coder implements to make Green |
+| PHASE1_COMPLETED | Switch to `[CODER]` mode | Red baseline produced, Coder implements to make Green |
 | PHASE1_COMPLETED_WITH_DEVIATION | `devbooks-design-backport` | Backport design first, then hand to Coder |
-| PHASE2_VERIFIED | `devbooks-code-review` | Tests all green, can proceed to code review |
-| PHASE2_FAILED | Notify Coder | Tests not passing, need Coder to fix |
+| PHASE2_VERIFIED | `devbooks-code-review` | Evidence audit passed, can proceed to code review |
+| PHASE2_WAITING | Wait for CI | @full tests still running |
+| PHASE2_FAILED | Notify Coder to fix | Tests not passing, need Coder to fix |
 | PHASE2_HANDOFF | Fix tests | Tests themselves have issues, Test Owner fixes |
 | BLOCKED | Wait for user | Record breakpoint area |
 | FAILED | Fix and retry | Analyze failure reason |
 **Critical Constraints**:
-- **Role Isolation**: Coder must work in a **separate conversation/instance**
-- Test Owner and Coder cannot share the same session context
+- **Mode switching replaces session isolation**: Use `[TEST-OWNER]` / `[CODER]` labels to switch modes
 - If deviations exist, must design-backport first before handing to Coder
 - **Phase 2 AC matrix checking can only be done by Test Owner**
+- **Phase 2 can only check off after @full tests have passed**
 ---