dev-playbooks 1.3.0 → 1.5.0
package/package.json
CHANGED
|
@@ -14,16 +14,27 @@ allowed-tools:
|
|
|
14
14
|
|
|
15
15
|
## Workflow Position Awareness
|
|
16
16
|
|
|
17
|
-
> **Core Principle**: Coder executes after Test Owner Phase 1
|
|
17
|
+
> **Core Principle**: Coder executes after Test Owner Phase 1, achieving mental clarity through **mode labels** (not session isolation).
|
|
18
18
|
|
|
19
19
|
### My Position in the Overall Workflow
|
|
20
20
|
|
|
21
21
|
```
|
|
22
|
-
proposal → design →
|
|
23
|
-
|
|
24
|
-
|
|
22
|
+
proposal → design → [TEST-OWNER] → [CODER] → [TEST-OWNER] → code-review → archive
|
|
23
|
+
↓ ↓
|
|
24
|
+
Implement+fast track Evidence audit
|
|
25
|
+
(@smoke/@critical) (no @full rerun)
|
|
25
26
|
```
|
|
26
27
|
|
|
28
|
+
### AI Era Solo Development Optimization
|
|
29
|
+
|
|
30
|
+
> **Important Change**: This protocol is optimized for AI programming + solo development scenarios, **removing the mandatory "separate session" requirement**.
|
|
31
|
+
|
|
32
|
+
| Old Design | New Design | Reason |
|
|
33
|
+
|------------|------------|--------|
|
|
34
|
+
| Test Owner and Coder must use separate sessions | Same session, switch with `[TEST-OWNER]` / `[CODER]` mode labels | Reduce context rebuilding cost |
|
|
35
|
+
| Coder runs full tests and waits | Coder runs fast track (`@smoke`/`@critical`), `@full` triggered async | Fast iteration |
|
|
36
|
+
| Completion goes directly to Test Owner | Completion status is `Implementation Done`, wait for @full | Async doesn't block, archive is sync |
|
|
37
|
+
|
|
27
38
|
### Coder's Responsibility Boundaries
|
|
28
39
|
|
|
29
40
|
| Allowed | Prohibited |
|
|
@@ -31,18 +42,60 @@ proposal → design → test-owner(phase1) → [Coder] → test-owner(phase2)
|
|
|
31
42
|
| Modify `src/**` code | ❌ Modify `tests/**` |
|
|
32
43
|
| Check off `tasks.md` items | ❌ Modify `verification.md` |
|
|
33
44
|
| Record deviations to `deviation-log.md` | ❌ Check off AC coverage matrix |
|
|
34
|
-
| Run tests
|
|
45
|
+
| Run fast track tests (`@smoke`/`@critical`) | ❌ Set verification.md Status to Verified/Done |
|
|
46
|
+
| Trigger `@full` tests (CI/background) | ❌ Wait for @full completion (can start next change) |
|
|
35
47
|
|
|
36
48
|
### Flow After Coder Completes
|
|
37
49
|
|
|
38
|
-
1. **
|
|
39
|
-
2. **
|
|
40
|
-
3. **
|
|
41
|
-
4. **
|
|
42
|
-
|
|
43
|
-
- Test Owner
|
|
50
|
+
1. **Fast track tests green**: `@smoke` + `@critical` pass
|
|
51
|
+
2. **Trigger @full**: Commit code, CI starts running @full tests async
|
|
52
|
+
3. **Status change**: Set change status to `Implementation Done`
|
|
53
|
+
4. **Can start next change** (not blocked)
|
|
54
|
+
5. **Wait for @full results**:
|
|
55
|
+
- @full passes → Test Owner enters Phase 2 to audit evidence
|
|
56
|
+
- @full fails → Coder fixes
|
|
57
|
+
|
|
58
|
+
**Key Reminders**:
|
|
59
|
+
- After Coder completes, status is `Implementation Done`, **not directly to Code Review**
|
|
60
|
+
- Dev iteration is async (can start next change), but archive is sync (must wait for @full to pass)
|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Test Layering and Run Strategy (Critical!)
|
|
65
|
+
|
|
66
|
+
> **Core Principle**: Coder only runs fast track tests, @full tests are triggered async, not blocking dev iteration.
|
|
67
|
+
|
|
68
|
+
### Test Layering Labels
|
|
69
|
+
|
|
70
|
+
| Label | Purpose | When Coder Runs | Expected Time |
|
|
71
|
+
|-------|---------|-----------------|---------------|
|
|
72
|
+
| `@smoke` | Fast feedback, core paths | After each code change | Seconds |
|
|
73
|
+
| `@critical` | Key functionality verification | Before commit | Minutes |
|
|
74
|
+
| `@full` | Complete acceptance tests | **Don't run**, trigger CI async | Can be slow |
|
|
44
75
|
|
|
45
|
-
|
|
76
|
+
### Coder's Test Run Strategy
|
|
77
|
+
|
|
78
|
+
```bash
|
|
79
|
+
# During development: frequently run @smoke
|
|
80
|
+
npm test -- --grep "@smoke"
|
|
81
|
+
|
|
82
|
+
# Before commit: run @critical
|
|
83
|
+
npm test -- --grep "@smoke|@critical"
|
|
84
|
+
|
|
85
|
+
# After commit: CI automatically runs @full (Coder doesn't wait)
|
|
86
|
+
git push # triggers CI
|
|
87
|
+
# → Coder can start next task
|
|
88
|
+
```
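As a minimal sketch of the run strategy above, the stage-to-label mapping can be written down explicitly. The stage names and the helper `fast_track_command` are hypothetical; the `--grep` flag is taken from the commands shown above and assumes the project's test runner accepts it.

```python
# Hypothetical mapping from workflow stage to the labels the Coder runs.
# "@full" is deliberately absent: the Coder triggers it through CI instead.
FAST_TRACK_LABELS = {
    "during_dev": ["@smoke"],
    "before_commit": ["@smoke", "@critical"],
}


def fast_track_command(stage: str) -> list[str]:
    """Build the argv for `npm test -- --grep <labels>` for a given stage."""
    labels = FAST_TRACK_LABELS[stage]  # raises KeyError for unknown stages
    return ["npm", "test", "--", "--grep", "|".join(labels)]
```

Keeping `@full` out of the mapping makes it harder to run the slow suite synchronously by accident.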
|
|
89
|
+
|
|
90
|
+
### Async vs Sync Boundary
|
|
91
|
+
|
|
92
|
+
| Action | Blocking/Async | Description |
|
|
93
|
+
|--------|----------------|-------------|
|
|
94
|
+
| `@smoke` tests | Sync | Run immediately after each change |
|
|
95
|
+
| `@critical` tests | Sync | Must pass before commit |
|
|
96
|
+
| `@full` tests | **Async** | CI runs in background, doesn't block Coder |
|
|
97
|
+
| Start next change | **Not blocked** | Coder can start immediately |
|
|
98
|
+
| Archive | **Blocked** | Must wait for @full to pass |
|
|
46
99
|
|
|
47
100
|
---
|
|
48
101
|
|
|
@@ -319,26 +372,29 @@ During implementation, you **must immediately** write to `deviation-log.md` in t
|
|
|
319
372
|
|
|
320
373
|
| Code | Status | Determination Criteria | Next Step |
|
|
321
374
|
|:----:|--------|------------------------|-----------|
|
|
322
|
-
| ✅ |
|
|
323
|
-
| ⚠️ |
|
|
324
|
-
| 🔄 | HANDOFF | Found test issues needing modification | `
|
|
375
|
+
| ✅ | IMPLEMENTATION_DONE | Fast track tests green, @full triggered, no deviations | Switch to `[TEST-OWNER]` mode and wait for @full |
|
|
376
|
+
| ⚠️ | IMPLEMENTATION_DONE_WITH_DEVIATION | Fast track green, deviation-log has pending records | `devbooks-design-backport` |
|
|
377
|
+
| 🔄 | HANDOFF | Found test issues needing modification | Switch to `[TEST-OWNER]` mode to fix tests |
|
|
325
378
|
| ❌ | BLOCKED | Needs external input/decision | Record breakpoint, wait for user |
|
|
326
|
-
| 💥 | FAILED |
|
|
379
|
+
| 💥 | FAILED | Fast track tests not passing | Fix and retry |
|
|
327
380
|
|
|
328
381
|
### Status Determination Flow
|
|
329
382
|
|
|
330
383
|
```
|
|
331
384
|
1. Check if deviation-log.md has "| ❌" records
|
|
332
|
-
→ Yes:
|
|
385
|
+
→ Yes: IMPLEMENTATION_DONE_WITH_DEVIATION
|
|
333
386
|
|
|
334
387
|
2. Check if tests/ modification needed
|
|
335
|
-
→ Yes: HANDOFF to
|
|
388
|
+
→ Yes: HANDOFF to [TEST-OWNER] mode
|
|
389
|
+
|
|
390
|
+
3. Check if fast track tests (@smoke + @critical) all pass
|
|
391
|
+
→ No: FAILED
|
|
336
392
|
|
|
337
|
-
|
|
338
|
-
→ No: BLOCKED or
|
|
393
|
+
4. Check if tasks.md is fully completed
|
|
394
|
+
→ No: BLOCKED or continue implementation
|
|
339
395
|
|
|
340
|
-
|
|
341
|
-
→
|
|
396
|
+
5. All checks passed, trigger @full
|
|
397
|
+
→ IMPLEMENTATION_DONE
|
|
342
398
|
```
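The determination flow above can be sketched as a single function that applies the checks in the listed order. `CoderState` and its field names are hypothetical stand-ins for whatever the skill actually inspects (deviation-log.md, tests/, tasks.md); this illustrates the ordering and is not the package's implementation.

```python
from dataclasses import dataclass


@dataclass
class CoderState:
    pending_deviations: int   # number of "| ❌" rows in deviation-log.md
    tests_need_changes: bool  # a tests/** modification is required
    fast_track_green: bool    # @smoke and @critical all pass
    tasks_done: int           # checked items in tasks.md
    tasks_total: int          # all items in tasks.md


def determine_status(state: CoderState) -> str:
    """Apply checks 1-5 in order; the first matching check decides the status."""
    if state.pending_deviations > 0:
        return "IMPLEMENTATION_DONE_WITH_DEVIATION"
    if state.tests_need_changes:
        return "HANDOFF"
    if not state.fast_track_green:
        return "FAILED"
    if state.tasks_done < state.tasks_total:
        return "BLOCKED"  # the flow also allows continuing implementation here
    return "IMPLEMENTATION_DONE"
```

Because the checks short-circuit, a pending deviation takes precedence even when the fast track tests are red, which mirrors the order of the numbered steps.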
|
|
343
399
|
|
|
344
400
|
### Routing Output Template (Required)
|
|
@@ -348,37 +404,41 @@ After completing coder, you **must** output in this format:
|
|
|
348
404
|
```markdown
|
|
349
405
|
## Completion Status
|
|
350
406
|
|
|
351
|
-
**Status**: ✅
|
|
407
|
+
**Status**: ✅ IMPLEMENTATION_DONE / ⚠️ ... / 🔄 HANDOFF / ❌ BLOCKED / 💥 FAILED
|
|
352
408
|
|
|
353
409
|
**Task Progress**: X/Y completed
|
|
354
410
|
|
|
411
|
+
**Fast Track Tests**: @smoke ✅ / @critical ✅
|
|
412
|
+
|
|
413
|
+
**@full Tests**: Triggered (CI running async)
|
|
414
|
+
|
|
355
415
|
**Deviation Records**: Has N pending / None
|
|
356
416
|
|
|
357
417
|
## Next Step
|
|
358
418
|
|
|
359
|
-
**Recommended**: `devbooks-xxx skill`
|
|
419
|
+
**Recommended**: Switch to `[TEST-OWNER]` mode and wait for @full / `devbooks-xxx skill`
|
|
360
420
|
|
|
361
421
|
**Reason**: [specific reason]
|
|
362
422
|
|
|
363
|
-
|
|
364
|
-
Run devbooks-xxx skill for change <change-id>
|
|
423
|
+
**Note**: Can start next change, no need to wait for @full completion
|
|
365
424
|
```
|
|
366
425
|
|
|
367
426
|
### Specific Routing Rules
|
|
368
427
|
|
|
369
428
|
| My Status | Next Step | Reason |
|
|
370
429
|
|-----------|-----------|--------|
|
|
371
|
-
|
|
|
372
|
-
|
|
|
373
|
-
| HANDOFF (test issue) | `
|
|
430
|
+
| IMPLEMENTATION_DONE | Switch to `[TEST-OWNER]` mode (wait for @full) | Fast track green; wait for @full to pass, then audit evidence |
|
|
431
|
+
| IMPLEMENTATION_DONE_WITH_DEVIATION | `devbooks-design-backport` | Backport design first |
|
|
432
|
+
| HANDOFF (test issue) | Switch to `[TEST-OWNER]` mode | Coder cannot modify tests |
|
|
374
433
|
| BLOCKED | Wait for user | Record breakpoint area |
|
|
375
434
|
| FAILED | Fix and retry | Analyze failure reason |
|
|
376
435
|
|
|
377
436
|
**Critical Constraints**:
|
|
378
437
|
- Coder **can never modify** `tests/**`
|
|
379
|
-
- If test issues found, must
|
|
438
|
+
- If test issues found, must switch to `[TEST-OWNER]` mode to handle
|
|
380
439
|
- If deviations exist, must design-backport first before continuing
|
|
381
|
-
- **Coder must
|
|
440
|
+
- **Coder completion status is `Implementation Done`, must wait for @full to pass before entering Test Owner Phase 2**
|
|
441
|
+
- **Mode switching replaces session isolation**: Use `[TEST-OWNER]` / `[CODER]` labels to switch modes
|
|
382
442
|
|
|
383
443
|
---
|
|
384
444
|
|
|
@@ -0,0 +1,394 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: devbooks-convergence-audit
|
|
3
|
+
description: devbooks-convergence-audit: Evaluates DevBooks workflow convergence using "evidence first, distrust declarations" principle, detects "Sisyphus anti-patterns" and "fake completion". Actively verifies rather than trusting document claims. Use when user says "evaluate convergence/check upgrade health/Sisyphus detection/workflow audit" etc.
|
|
4
|
+
allowed-tools:
|
|
5
|
+
- Glob
|
|
6
|
+
- Grep
|
|
7
|
+
- Read
|
|
8
|
+
- Bash
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# DevBooks: Convergence Audit
|
|
12
|
+
|
|
13
|
+
## Core Principle: Anti-Deception Design
|
|
14
|
+
|
|
15
|
+
> **Golden Rule**: **Evidence > Declarations**. Never trust any assertions in documents; must confirm through verifiable evidence.
|
|
16
|
+
|
|
17
|
+
### Scenarios Where AI Gets Deceived (Must Prevent)
|
|
18
|
+
|
|
19
|
+
| Deception Scenario | AI Wrong Behavior | Correct Behavior |
|
|
20
|
+
|--------------------|-------------------|------------------|
|
|
21
|
+
| Document says `Status: Done` | Believes it's complete | Verify: Are tests actually all green? Does evidence exist? |
|
|
22
|
+
| AC matrix all `[x]` | Believes full coverage | Verify: Does test file for each AC exist and pass? |
|
|
23
|
+
| Document says "tests passed" | Believes passed | Verify: Actually run tests or check CI log timestamps |
|
|
24
|
+
| `evidence/` directory exists | Believes evidence exists | Verify: Is directory non-empty? Is content valid test logs? |
|
|
25
|
+
| tasks.md all `[x]` | Believes implemented | Verify: Do corresponding code files exist with substance? |
|
|
26
|
+
| Commit message says "fixed" | Believes fixed | Verify: Did related tests change from red to green? |
|
|
27
|
+
|
|
28
|
+
### Three Anti-Deception Principles
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
1. Distrust Declarations
|
|
32
|
+
- Any "complete/passed/covered" claims in documents are hypotheses to verify
|
|
33
|
+
- Default stance: Claims may be wrong, outdated, or optimistic
|
|
34
|
+
|
|
35
|
+
2. Evidence First
|
|
36
|
+
- Code/test results are the only truth
|
|
37
|
+
- Log timestamps must be later than last code modification
|
|
38
|
+
- Empty directory/file = no evidence
|
|
39
|
+
|
|
40
|
+
3. Cross Validation
|
|
41
|
+
- Declaration vs evidence: Check consistency
|
|
42
|
+
- Code vs tests: Check if they match
|
|
43
|
+
- Multiple documents: Check for contradictions
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## Verification Checklist (Execute Each)
|
|
49
|
+
|
|
50
|
+
### Check 1: Status Field Truthfulness Verification
|
|
51
|
+
|
|
52
|
+
**Document Claim**: `verification.md` contains `Status: Done` or `Status: Verified`
|
|
53
|
+
|
|
54
|
+
**Verification Steps**:
|
|
55
|
+
```bash
|
|
56
|
+
# 1. Check if verification.md exists
|
|
57
|
+
[[ -f "verification.md" ]] || echo "❌ verification.md does not exist"
|
|
58
|
+
|
|
59
|
+
# 2. Check if evidence/green-final/ has content
|
|
60
|
+
if [[ -z "$(ls -A evidence/green-final/ 2>/dev/null)" ]]; then
|
|
61
|
+
echo "❌ Status claims complete, but evidence/green-final/ is empty"
|
|
62
|
+
fi
|
|
63
|
+
|
|
64
|
+
# 3. Check if evidence timestamp is later than last code modification
|
|
65
|
+
code_mtime=$(stat -c %Y src/ 2>/dev/null || stat -f %m src/)  # GNU stat first, BSD/macOS stat as fallback
|
|
66
|
+
evidence_mtime=$( (stat -c %Y evidence/green-final/* 2>/dev/null || stat -f %m evidence/green-final/* 2>/dev/null) | sort -n | tail -1)
|
|
67
|
+
if [[ $evidence_mtime -lt $code_mtime ]]; then
|
|
68
|
+
echo "❌ Evidence time is earlier than code modification, evidence may be stale"
|
|
69
|
+
fi
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
**Deception Detection**:
|
|
73
|
+
- ⚠️ Status=Done but evidence/ empty → **Fake Completion**
|
|
74
|
+
- ⚠️ Status=Done but evidence timestamp too old → **Stale Evidence**
|
|
75
|
+
- ⚠️ Status=Done but tests actually fail → **False Status**
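A minimal sketch of the freshness comparison in Check 1, assuming the `src/` and `evidence/green-final/` layout used above. The function names and the returned labels are hypothetical. Unlike a `stat` on the directory entry, this walks every file, because a directory's own modification time does not change when an existing file inside it is edited.

```python
from pathlib import Path
from typing import Optional


def newest_mtime(root: Path) -> Optional[float]:
    """Newest modification time of any regular file under root, or None if there is none."""
    if not root.is_dir():
        return None
    times = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(times) if times else None


def classify_evidence(src: Path, evidence: Path) -> str:
    """Return 'fake-completion', 'stale-evidence', or 'ok' (labels are illustrative)."""
    code_time = newest_mtime(src)
    evidence_time = newest_mtime(evidence)
    if evidence_time is None:
        return "fake-completion"  # no evidence files at all
    if code_time is not None and evidence_time < code_time:
        return "stale-evidence"   # code changed after the evidence was produced
    return "ok"
```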
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
### Check 2: AC Coverage Matrix Truthfulness Verification
|
|
80
|
+
|
|
81
|
+
**Document Claim**: `[x]` in AC matrix means covered
|
|
82
|
+
|
|
83
|
+
**Verification Steps**:
|
|
84
|
+
```bash
|
|
85
|
+
# 1. Extract all ACs claimed as covered
|
|
86
|
+
grep -E '^\| AC-[0-9]+.*\[x\]' verification.md | while read line; do
|
|
87
|
+
ac_id=$(echo "$line" | grep -oE 'AC-[0-9]+')
|
|
88
|
+
test_id=$(echo "$line" | grep -oE 'T-[0-9]+')
|
|
89
|
+
|
|
90
|
+
# 2. Verify corresponding test exists
|
|
91
|
+
if ! grep -rqE "${test_id:+$test_id|}$ac_id" tests/; then  # omit the test-id alternative when the row has none
|
|
92
|
+
echo "❌ $ac_id claims covered, but no corresponding test found"
|
|
93
|
+
fi
|
|
94
|
+
done
|
|
95
|
+
|
|
96
|
+
# 3. Actually run tests to verify (most reliable)
|
|
97
|
+
npm test 2>&1 | tee /tmp/test-output.log
|
|
98
|
+
if grep -q "FAIL\|Error\|failed" /tmp/test-output.log; then
|
|
99
|
+
echo "❌ ACs claim full coverage, but tests actually have failures"
|
|
100
|
+
fi
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
**Deception Detection**:
|
|
104
|
+
- ⚠️ AC checked but corresponding test file doesn't exist → **False Coverage**
|
|
105
|
+
- ⚠️ AC checked but test actually fails → **Fake Green**
|
|
106
|
+
- ⚠️ AC checked but test content is empty/placeholder → **Placeholder Test**
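The coverage cross-check in Check 2 can be sketched as a pure function over text, which makes it easy to try on a small matrix. The row format (`| AC-nnn | ... | T-nn | [x] |`) is an assumption based on the `grep` patterns above, and `uncovered_acs` is a hypothetical name.

```python
import re

AC_ROW = re.compile(r"^\|\s*(AC-\d+).*\[x\]")  # a checked row of the AC matrix
TEST_ID = re.compile(r"T-\d+")


def uncovered_acs(matrix_md: str, test_sources: list[str]) -> list[str]:
    """Return checked AC ids that no test source mentions by AC id or test id."""
    missing = []
    for line in matrix_md.splitlines():
        match = AC_ROW.match(line)
        if not match:
            continue  # unchecked rows and non-table lines are ignored
        ac_id = match.group(1)
        needles = [ac_id] + TEST_ID.findall(line)
        # Word boundaries keep "AC-1" from matching inside "AC-10".
        patterns = [re.compile(rf"\b{re.escape(n)}\b") for n in needles]
        if not any(p.search(src) for p in patterns for src in test_sources):
            missing.append(ac_id)
    return missing
```

A reference in a test file only shows that a test mentions the AC; whether that test passes still has to be confirmed by a live run.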
|
|
107
|
+
|
|
108
|
+
---
|
|
109
|
+
|
|
110
|
+
### Check 3: tasks.md Completion Truthfulness Verification
|
|
111
|
+
|
|
112
|
+
**Document Claim**: `[x]` in tasks.md means completed
|
|
113
|
+
|
|
114
|
+
**Verification Steps**:
|
|
115
|
+
```bash
|
|
116
|
+
# 1. Extract all tasks claimed as complete
|
|
117
|
+
grep -E '^\- \[x\]' tasks.md | while read line; do
|
|
118
|
+
# 2. Extract keywords from task description (function name/file name/feature)
|
|
119
|
+
keywords=$(echo "$line" | grep -oE '[A-Za-z]+[A-Za-z0-9]*' | head -5)
|
|
120
|
+
|
|
121
|
+
# 3. Verify code has corresponding implementation
|
|
122
|
+
for kw in $keywords; do
|
|
123
|
+
if ! grep -rq "$kw" src/; then
|
|
124
|
+
echo "⚠️ Task claims complete, but keyword not found in code: $kw"
|
|
125
|
+
fi
|
|
126
|
+
done
|
|
127
|
+
done
|
|
128
|
+
|
|
129
|
+
# 4. Check for "skeleton code" (only function signatures without implementation)
|
|
130
|
+
grep -rE 'throw new Error\(.*not implemented|TODO|FIXME|pass$|\.\.\.}' src/ && \
|
|
131
|
+
echo "⚠️ Found unimplemented placeholder code"
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
**Deception Detection**:
|
|
135
|
+
- ⚠️ Task checked but code doesn't exist → **False Completion**
|
|
136
|
+
- ⚠️ Task checked but code is placeholder → **Skeleton Code**
|
|
137
|
+
- ⚠️ Task checked but feature not callable → **Dead Code**
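For the skeleton-code check, a small sketch that reports where placeholders occur rather than only whether any exist. It uses a subset of the patterns from the `grep` above (the `pass$` and `...}` alternatives are left out here), and `find_placeholders` is a hypothetical helper name.

```python
import re

# Subset of the placeholder markers used in Check 3.
PLACEHOLDER = re.compile(r"throw new Error\(.*not implemented|TODO|FIXME")


def find_placeholders(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line that looks like a placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]
```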
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
### Check 4: Evidence Validity Verification
|
|
142
|
+
|
|
143
|
+
**Document Claim**: `evidence/` directory contains test evidence
|
|
144
|
+
|
|
145
|
+
**Verification Steps**:
|
|
146
|
+
```bash
|
|
147
|
+
# 1. Check if directory exists and is non-empty
|
|
148
|
+
if [[ ! -d "evidence" ]] || [[ -z "$(ls -A evidence/)" ]]; then
|
|
149
|
+
echo "❌ evidence/ does not exist or is empty"
|
|
150
|
+
exit 1
|
|
151
|
+
fi
|
|
152
|
+
|
|
153
|
+
# 2. Check if evidence files have substantial content
|
|
154
|
+
shopt -s globstar  # bash 4+: without this, ** does not match nested directories
for f in evidence/**/*; do
|
|
155
|
+
if [[ -f "$f" ]]; then
|
|
156
|
+
lines=$(wc -l < "$f")
|
|
157
|
+
if [[ $lines -lt 5 ]]; then
|
|
158
|
+
echo "⚠️ Evidence file has too little content: $f ($lines lines)"
|
|
159
|
+
fi
|
|
160
|
+
|
|
161
|
+
# 3. Check if it's a valid test log (contains test framework output characteristics)
|
|
162
|
+
if ! grep -qE 'PASS|FAIL|✓|✗|passed|failed|test|spec' "$f"; then
|
|
163
|
+
echo "⚠️ Evidence file doesn't look like test log: $f"
|
|
164
|
+
fi
|
|
165
|
+
fi
|
|
166
|
+
done
|
|
167
|
+
|
|
168
|
+
# 4. Check if red-baseline evidence really is red (has failures)
|
|
169
|
+
if [[ -d "evidence/red-baseline" ]]; then
|
|
170
|
+
if ! grep -rqE 'FAIL|Error|✗|failed' evidence/red-baseline/; then
|
|
171
|
+
echo "❌ red-baseline claims to be red, but no failure records"
|
|
172
|
+
fi
|
|
173
|
+
fi
|
|
174
|
+
|
|
175
|
+
# 5. Check if green-final evidence really is green (all pass)
|
|
176
|
+
if [[ -d "evidence/green-final" ]]; then
|
|
177
|
+
if grep -rqE 'FAIL|Error|✗|failed' evidence/green-final/; then
|
|
178
|
+
echo "❌ green-final claims to be green, but contains failure records"
|
|
179
|
+
fi
|
|
180
|
+
fi
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
**Deception Detection**:
|
|
184
|
+
- ⚠️ evidence/ exists but content is empty → **Empty Evidence**
|
|
185
|
+
- ⚠️ Evidence file too small (< 5 lines) → **Placeholder Evidence**
|
|
186
|
+
- ⚠️ red-baseline has no failure records → **Fake Red**
|
|
187
|
+
- ⚠️ green-final contains failure records → **Fake Green**
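The four findings above can be sketched as one function over the text of a single evidence file. The marker patterns are copied from the shell checks; the `kind` values and the finding labels are hypothetical names chosen for this illustration.

```python
import re

LOG_MARKERS = re.compile(r"PASS|FAIL|✓|✗|passed|failed|test|spec")
FAILURE_MARKERS = re.compile(r"FAIL|Error|✗|failed")


def audit_evidence_text(kind: str, text: str, min_lines: int = 5) -> list[str]:
    """Return findings for one evidence file; kind is 'red-baseline' or 'green-final'."""
    findings = []
    if len(text.splitlines()) < min_lines:
        findings.append("placeholder-evidence")
    if not LOG_MARKERS.search(text):
        findings.append("not-a-test-log")
    has_failure = bool(FAILURE_MARKERS.search(text))
    if kind == "red-baseline" and not has_failure:
        findings.append("fake-red")    # a red baseline must record failures
    if kind == "green-final" and has_failure:
        findings.append("fake-green")  # a green run must not record failures
    return findings
```

These are text heuristics: a summary line such as "0 failed" would still trip the failure pattern, so a finding is a prompt to look at the log, not a verdict.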
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
### Check 5: Git History Cross Validation
|
|
192
|
+
|
|
193
|
+
**Principle**: Git history doesn't lie; use it to verify document claims
|
|
194
|
+
|
|
195
|
+
**Verification Steps**:
|
|
196
|
+
```bash
|
|
197
|
+
# 1. Check if claimed-complete change has corresponding code commits
|
|
198
|
+
change_id="xxx"
|
|
199
|
+
commits=$(git log --oneline --all --grep="$change_id" | wc -l)
|
|
200
|
+
if [[ $commits -eq 0 ]]; then
|
|
201
|
+
echo "❌ Change $change_id claims complete, but no related commits in git history"
|
|
202
|
+
fi
|
|
203
|
+
|
|
204
|
+
# 2. Check if test files were added after code (TDD violation detection)
|
|
205
|
+
shopt -s globstar  # bash 4+: without this, ** does not match nested directories
for test_file in tests/**/*.test.*; do
|
|
206
|
+
test_added=$(git log --format=%at --follow -- "$test_file" | tail -1)
|
|
207
|
+
# Find corresponding source file
|
|
208
|
+
src_file=$(echo "$test_file" | sed -e 's|^tests/|src/|' -e 's/\.test\././')
|
|
209
|
+
if [[ -f "$src_file" ]]; then
|
|
210
|
+
src_added=$(git log --format=%at --follow -- "$src_file" | tail -1)
|
|
211
|
+
if [[ $test_added -gt $src_added ]]; then
|
|
212
|
+
echo "⚠️ Test added after code (non-TDD): $test_file"
|
|
213
|
+
fi
|
|
214
|
+
fi
|
|
215
|
+
done
|
|
216
|
+
|
|
217
|
+
# 3. Check for "one-time big commits" (may be bypassing process)
|
|
218
|
+
git log --oneline -20 | while read line; do
|
|
219
|
+
commit=$(echo "$line" | cut -d' ' -f1)
|
|
220
|
+
files_changed=$(git show --stat "$commit" | grep -E '[0-9]+ file' | grep -oE '[0-9]+' | head -1)
|
|
221
|
+
if [[ $files_changed -gt 20 ]]; then
|
|
222
|
+
echo "⚠️ Big commit detected: $commit modified $files_changed files, may bypass incremental verification"
|
|
223
|
+
fi
|
|
224
|
+
done
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
**Deception Detection**:
|
|
228
|
+
- ⚠️ Claims complete but no git commits → **Fake Change**
|
|
229
|
+
- ⚠️ Tests added after code → **Retroactive Testing**
|
|
230
|
+
- ⚠️ Many files in one commit → **Bypassing Incremental Verification**
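The test-to-source mapping used in the TDD-order check is easy to get subtly wrong with unanchored patterns, so here is a small sketch of it as pure functions. The `tests/…/name.test.ext` to `src/…/name.ext` convention is assumed from the shell snippet above; both function names are hypothetical, and the timestamps are expected to be the first-commit times (for example from `git log --format=%at`).

```python
import re


def source_path_for(test_path: str) -> str:
    """Map tests/auth/login.test.ts to src/auth/login.ts (assumed naming convention)."""
    path = re.sub(r"^tests/", "src/", test_path)         # only the leading directory
    return re.sub(r"\.test\.", ".", path, count=1)       # only the literal ".test." infix


def is_retroactive_test(test_added_at: int, src_added_at: int) -> bool:
    """True when the test file first appeared after the source file it covers."""
    return test_added_at > src_added_at
```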
|
|
231
|
+
|
|
232
|
+
---
|
|
233
|
+
|
|
234
|
+
### Check 6: Live Test Run Verification (Most Reliable)
|
|
235
|
+
|
|
236
|
+
**Principle**: Don't trust any logs; actually run tests
|
|
237
|
+
|
|
238
|
+
**Verification Steps**:
|
|
239
|
+
```bash
|
|
240
|
+
# 1. Run full tests
|
|
241
|
+
echo "=== Live Test Verification ==="
|
|
242
|
+
npm test 2>&1 | tee /tmp/live-test.log
|
|
243
|
+
|
|
244
|
+
# 2. Check results
|
|
245
|
+
if grep -qE 'FAIL|Error|failed' /tmp/live-test.log; then
|
|
246
|
+
echo "❌ Live tests failed, document claims not trustworthy"
|
|
247
|
+
grep -E 'FAIL|Error|failed' /tmp/live-test.log
|
|
248
|
+
else
|
|
249
|
+
echo "✅ Live tests passed"
|
|
250
|
+
fi
|
|
251
|
+
|
|
252
|
+
# 3. Compare live results with evidence files
|
|
253
|
+
if [[ -f "evidence/green-final/latest.log" ]]; then
|
|
254
|
+
live_pass=$(grep -c 'PASS\|✓\|passed' /tmp/live-test.log)
|
|
255
|
+
evidence_pass=$(grep -c 'PASS\|✓\|passed' evidence/green-final/latest.log)
|
|
256
|
+
if [[ $live_pass -ne $evidence_pass ]]; then
|
|
257
|
+
echo "⚠️ Live pass count ($live_pass) ≠ evidence pass count ($evidence_pass)"
|
|
258
|
+
fi
|
|
259
|
+
fi
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
**Deception Detection**:
|
|
263
|
+
- ⚠️ Evidence says green but live run fails → **Stale Evidence/Fake Green**
|
|
264
|
+
- ⚠️ Live pass count differs from evidence → **Evidence Fabrication/Environment Difference**
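A minimal sketch of the pass-count comparison, counting matching lines the same way `grep -c` does. The function names are hypothetical; the marker pattern is the one used in the shell snippet above.

```python
import re

PASS_MARKER = re.compile(r"PASS|✓|passed")


def count_pass_lines(log_text: str) -> int:
    """Count lines that contain at least one pass marker."""
    return sum(1 for line in log_text.splitlines() if PASS_MARKER.search(line))


def pass_count_gap(live_log: str, evidence_log: str) -> int:
    """Positive when the evidence log shows more passing lines than the live run."""
    return count_pass_lines(evidence_log) - count_pass_lines(live_log)
```

A non-zero gap does not say which side is wrong; it can come from fabricated evidence or from an environment difference, as noted above.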
|
|
265
|
+
|
|
266
|
+
---
|
|
267
|
+
|
|
268
|
+
## Composite Scoring Algorithm
|
|
269
|
+
|
|
270
|
+
### Trustworthiness Score (0-100)
|
|
271
|
+
|
|
272
|
+
```python
|
|
273
|
+
def calculate_trustworthiness(checks):
|
|
274
|
+
score = 100
|
|
275
|
+
|
|
276
|
+
# Critical issues (each -20 points)
|
|
277
|
+
critical = [
|
|
278
|
+
"Evidence empty",
|
|
279
|
+
"Live tests failed",
|
|
280
|
+
"Status claims complete but tests fail",
|
|
281
|
+
"green-final contains failure records"
|
|
282
|
+
]
|
|
283
|
+
|
|
284
|
+
# Warning issues (each -10 points)
|
|
285
|
+
warnings = [
|
|
286
|
+
"Evidence timestamp too old",
|
|
287
|
+
"AC corresponding test doesn't exist",
|
|
288
|
+
"Placeholder code",
|
|
289
|
+
"Big commit detected"
|
|
290
|
+
]
|
|
291
|
+
|
|
292
|
+
# Minor issues (each -5 points)
|
|
293
|
+
minor = [
|
|
294
|
+
"Tests added after code",
|
|
295
|
+
"Evidence file too small"
|
|
296
|
+
]
|
|
297
|
+
|
|
298
|
+
for issue in checks.critical_issues:
|
|
299
|
+
score -= 20
|
|
300
|
+
for issue in checks.warnings:
|
|
301
|
+
score -= 10
|
|
302
|
+
for issue in checks.minor_issues:
|
|
303
|
+
score -= 5
|
|
304
|
+
|
|
305
|
+
return max(0, score)
|
|
306
|
+
```
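The scoring function above reads `checks.critical_issues`, `checks.warnings`, and `checks.minor_issues` without defining the container. A runnable sketch under that assumption follows; `CheckResults` and `PENALTIES` are hypothetical names, and the per-severity deductions are the ones stated in the comments above.

```python
from dataclasses import dataclass, field

PENALTIES = {"critical": 20, "warning": 10, "minor": 5}


@dataclass
class CheckResults:
    critical_issues: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    minor_issues: list[str] = field(default_factory=list)


def calculate_trustworthiness(checks: CheckResults) -> int:
    """Start from 100, subtract a fixed penalty per finding, and clamp at 0."""
    score = 100
    score -= PENALTIES["critical"] * len(checks.critical_issues)
    score -= PENALTIES["warning"] * len(checks.warnings)
    score -= PENALTIES["minor"] * len(checks.minor_issues)
    return max(0, score)
```

For example, one critical issue, one warning, and one minor issue give 100 - 20 - 10 - 5 = 65.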
|
|
307
|
+
|
|
308
|
+
### Convergence Determination
|
|
309
|
+
|
|
310
|
+
| Trustworthiness | Determination | Recommendation |
|
|
311
|
+
|-----------------|---------------|----------------|
|
|
312
|
+
| 90-100 | ✅ Trustworthy Convergence | Continue current process |
|
|
313
|
+
| 70-89 | ⚠️ Partially Trustworthy | Need supplementary verification |
|
|
314
|
+
| 50-69 | 🟠 Questionable | Need to rework some steps |
|
|
315
|
+
| < 50 | 🔴 Untrustworthy | Sisyphus trap, need comprehensive review |
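The banding in the table above amounts to a threshold lookup. A short sketch, with `convergence_verdict` as a hypothetical function name and the labels taken from the Determination column:

```python
def convergence_verdict(score: int) -> str:
    """Map a 0-100 trustworthiness score to the determination label."""
    if score >= 90:
        return "Trustworthy Convergence"
    if score >= 70:
        return "Partially Trustworthy"
    if score >= 50:
        return "Questionable"
    return "Untrustworthy"
```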
|
|
316
|
+
|
|
317
|
+
---
|
|
318
|
+
|
|
319
|
+
## Output Format
|
|
320
|
+
|
|
321
|
+
```markdown
|
|
322
|
+
# DevBooks Convergence Audit Report (Anti-Deception Edition)
|
|
323
|
+
|
|
324
|
+
## Audit Principle
|
|
325
|
+
This report follows the "evidence first, distrust declarations" principle. All conclusions are based on verifiable evidence, not document claims.
|
|
326
|
+
|
|
327
|
+
## Declaration vs Evidence Comparison
|
|
328
|
+
|
|
329
|
+
| Check Item | Document Claim | Actual Verification | Conclusion |
|
|
330
|
+
|------------|----------------|---------------------|------------|
|
|
331
|
+
| Status | Done | Tests actually fail | ❌ Fake Completion |
|
|
332
|
+
| AC Coverage | 5/5 checked | 2 ACs have no corresponding tests | ❌ False Coverage |
|
|
333
|
+
| Test Status | All green | Live run 3 failures | ❌ Stale Evidence |
|
|
334
|
+
| tasks.md | 10/10 complete | 3 tasks have no code | ❌ False Completion |
|
|
335
|
+
| evidence/ | Exists | Non-empty, content valid | ✅ Valid |
|
|
336
|
+
|
|
337
|
+
## Trustworthiness Score
|
|
338
|
+
|
|
339
|
+
**Total Score**: 45/100 🔴 Untrustworthy
|
|
340
|
+
|
|
341
|
+
**Deduction Details**:
|
|
342
|
+
- -20: Status=Done but live tests fail
|
|
343
|
+
- -20: ACs claim full coverage but 2 have no tests
|
|
344
|
+
- -10: tasks.md 3 tasks have no code
|
|
345
|
+
- -5: Evidence timestamp earlier than code modification
|
|
346
|
+
|
|
347
|
+
## Deception Detection Results
|
|
348
|
+
|
|
349
|
+
### 🔴 Detected Fake Completions
|
|
350
|
+
1. `change-auth`: Status=Done, but `npm test` reports 3 failures
|
|
351
|
+
2. `fix-cache`: AC-003 checked, but `tests/cache.test.ts` doesn't exist
|
|
352
|
+
|
|
353
|
+
### 🟡 Suspicious Items
|
|
354
|
+
1. `refactor-api`: evidence/green-final/ timestamp 2 days earlier than last code commit
|
|
355
|
+
2. `feature-login`: tasks.md all checked, but `src/login.ts` contains TODO
|
|
356
|
+
|
|
357
|
+
## True Status Determination
|
|
358
|
+
|
|
359
|
+
| Change Package | Claimed Status | True Status | Gap |
|
|
360
|
+
|----------------|----------------|-------------|-----|
|
|
361
|
+
| change-auth | Done | Tests failing | 🔴 Severe |
|
|
362
|
+
| fix-cache | Verified | Coverage incomplete | 🟠 Medium |
|
|
363
|
+
| refactor-api | Ready | Evidence stale | 🟡 Minor |
|
|
364
|
+
|
|
365
|
+
## Recommended Actions
|
|
366
|
+
|
|
367
|
+
### Immediate Action
|
|
368
|
+
1. Revert `change-auth` status to `In Progress`
|
|
369
|
+
2. Add tests for `fix-cache` AC-003
|
|
370
|
+
|
|
371
|
+
### Short-term Improvement
|
|
372
|
+
1. Establish evidence timeliness check (evidence must be later than code)
|
|
373
|
+
2. Force run corresponding tests before AC check-off
|
|
374
|
+
|
|
375
|
+
### Process Improvement
|
|
376
|
+
1. Prohibit manual Status modification; only allow auto-update after script verification
|
|
377
|
+
2. Integrate convergence check in CI, block fake completion from merging
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
---
|
|
381
|
+
|
|
382
|
+
## Completion Status
|
|
383
|
+
|
|
384
|
+
**Status**: ✅ AUDIT_COMPLETED
|
|
385
|
+
|
|
386
|
+
**Core Findings**:
|
|
387
|
+
- Document claim trustworthiness: X%
|
|
388
|
+
- Detected fake completions: N
|
|
389
|
+
- Changes needing rework: M
|
|
390
|
+
|
|
391
|
+
**Next Step**:
|
|
392
|
+
- Fake completion → Immediately revert status, re-verify
|
|
393
|
+
- Suspicious items → Supplement evidence or re-run tests
|
|
394
|
+
- Trustworthy items → Continue current process
|
|
@@ -14,44 +14,98 @@ allowed-tools:
|
|
|
14
14
|
|
|
15
15
|
## Workflow Position Awareness
|
|
16
16
|
|
|
17
|
-
> **Core Principle**: Test Owner has **dual-phase responsibilities** in the overall workflow,
|
|
17
|
+
> **Core Principle**: Test Owner has **dual-phase responsibilities** in the overall workflow, achieving mental clarity through **mode labels** (not session isolation).
|
|
18
18
|
|
|
19
19
|
### My Position in the Overall Workflow
|
|
20
20
|
|
|
21
21
|
```
|
|
22
|
-
proposal → design → [
|
|
23
|
-
↓
|
|
24
|
-
Red baseline
|
|
22
|
+
proposal → design → [TEST-OWNER] → [CODER] → [TEST-OWNER] → code-review → archive
|
|
23
|
+
↓ ↓ ↓
|
|
24
|
+
Red baseline Implement Evidence audit
|
|
25
|
+
(incremental) (@smoke) (no @full rerun)
|
|
25
26
|
```
|
|
26
27
|
|
|
28
|
+
### AI Era Solo Development Optimization
|
|
29
|
+
|
|
30
|
+
> **Important Change**: This protocol is optimized for AI programming + solo development scenarios, **removing the mandatory "separate session" requirement**.
|
|
31
|
+
|
|
32
|
+
| Old Design | New Design | Reason |
|
|
33
|
+
|------------|------------|--------|
|
|
34
|
+
| Test Owner and Coder must use separate sessions | Same session, switch with `[TEST-OWNER]` / `[CODER]` mode labels | Reduce context rebuilding cost |
|
|
35
|
+
| Phase 2 reruns full tests | Phase 2 defaults to **evidence audit**, optional sampling rerun | Avoid running slow tests multiple times |
|
|
36
|
+
| No test layering requirement | Mandatory test layering: `@smoke`/`@critical`/`@full` | Fast feedback loop |
|
|
37
|
+
|
|
27
38
|
### Test Owner's Dual-Phase Responsibilities
|
|
28
39
|
|
|
29
|
-
| Phase | Trigger | Core Responsibility | Output |
|
|
30
|
-
|
|
31
|
-
| **Phase 1: Red Baseline** | After design.md is complete | Write tests, produce failure evidence | verification.md (Status=Ready), Red baseline |
|
|
32
|
-
| **Phase 2: Green Verification** | After Coder completes |
|
|
40
|
+
| Phase | Trigger | Core Responsibility | Test Run Method | Output |
|
|
41
|
+
|-------|---------|---------------------|-----------------|--------|
|
|
42
|
+
| **Phase 1: Red Baseline** | After design.md is complete | Write tests, produce failure evidence | Only run **incremental tests** (new/P0) | verification.md (Status=Ready), Red baseline |
|
|
43
|
+
| **Phase 2: Green Verification** | After Coder completes + @full passes | **Audit evidence**, check off AC matrix | Default no rerun, optional sampling | AC matrix checked, Status=Verified |
|
|
33
44
|
|
|
34
45
|
### Phase 2 Detailed Responsibilities (Critical!)
|
|
35
46
|
|
|
36
47
|
When user says "Coder is done, please verify" or similar, Test Owner enters **Phase 2**:
|
|
37
48
|
|
|
38
|
-
1. **
|
|
39
|
-
2. **
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
49
|
+
1. **Check prerequisites**: Confirm @full tests have passed (check CI results or `evidence/green-final/`)
|
|
50
|
+
2. **Audit evidence** (default mode):
|
|
51
|
+
- Check test logs in `evidence/green-final/` directory
|
|
52
|
+
- Verify commit hash matches current code
|
|
53
|
+
- Confirm tests cover all ACs
|
|
54
|
+
3. **Optional sampling rerun**: Rerun a sample of tests for high-risk ACs or questionable tests
|
|
55
|
+
4. **Check off AC Coverage Matrix**: Change `[ ]` to `[x]` in verification.md AC Coverage Matrix
|
|
56
|
+
 5. **Set status to Verified**: Indicates test verification passed, waiting for Code Review

 ### AC Coverage Matrix Checkbox Permissions (Important!)

 | Checkbox Location | Who Can Check | When to Check |
 |-------------------|---------------|---------------|
-| `[ ]` in AC Coverage Matrix | **Test Owner** | Phase 2 after
+| `[ ]` in AC Coverage Matrix | **Test Owner** | Phase 2 after evidence audit confirmed |
+| Status field `Verified` | **Test Owner** | After Phase 2 completion |
 | Status field `Done` | Reviewer | After Code Review passes |

 **Prohibited**: Coder cannot check AC Coverage Matrix, cannot modify verification.md.

 ---

+## Test Layering and Run Strategy (Critical!)
+
+> **Core Principle**: Test layering is key to solving "slow tests blocking development".
+
+### Test Layering Labels (Must Use)
+
+| Label | Purpose | Who Runs | Expected Time | When to Run |
+|-------|---------|----------|---------------|-------------|
+| `@smoke` | Fast feedback, core paths | Coder runs frequently | Seconds | After each code change |
+| `@critical` | Key functionality verification | Coder before commit | Minutes | Before commit |
+| `@full` | Complete acceptance tests | CI runs async | Can be slow (hours) | Background/CI |
+
+### Test Run Strategy by Phase
+
+| Phase | What to Run | Purpose | Blocking/Async |
+|-------|-------------|---------|----------------|
+| **Test Owner Phase 1** | Only **newly written tests** | Confirm Red status | Sync (but incremental only) |
+| **Coder during dev** | `@smoke` | Fast feedback loop | Sync |
+| **Coder before commit** | `@critical` | Key path verification | Sync |
+| **Coder on completion** | `@full` (trigger CI) | Complete acceptance | **Async** (doesn't block dev) |
+| **Test Owner Phase 2** | **No run** (audit evidence) | Independent verification | N/A |
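As an illustration, the run strategy in the table above could be encoded as a lookup from workflow phase to test selection and blocking mode. This is a minimal sketch and not part of dev-playbooks: `Phase`, `RunPlan`, `RUN_STRATEGY`, `blocks_developer`, and the `"new-tests-only"` selector are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Workflow phases, named after the rows of the run-strategy table."""
    TEST_OWNER_PHASE_1 = "Test Owner Phase 1"
    CODER_DURING_DEV = "Coder during dev"
    CODER_BEFORE_COMMIT = "Coder before commit"
    CODER_ON_COMPLETION = "Coder on completion"
    TEST_OWNER_PHASE_2 = "Test Owner Phase 2"


@dataclass(frozen=True)
class RunPlan:
    selection: Optional[str]  # test label or selector; None means nothing is run
    mode: str                 # "sync", "async", or "n/a"


# Hypothetical encoding of the table. "new-tests-only" is a made-up selector
# standing in for "only newly written tests".
RUN_STRATEGY = {
    Phase.TEST_OWNER_PHASE_1: RunPlan("new-tests-only", "sync"),
    Phase.CODER_DURING_DEV: RunPlan("@smoke", "sync"),
    Phase.CODER_BEFORE_COMMIT: RunPlan("@critical", "sync"),
    Phase.CODER_ON_COMPLETION: RunPlan("@full", "async"),
    Phase.TEST_OWNER_PHASE_2: RunPlan(None, "n/a"),
}


def blocks_developer(phase: Phase) -> bool:
    """Return True when the developer has to wait for the run to finish."""
    return RUN_STRATEGY[phase].mode == "sync"
```

With this encoding, `blocks_developer(Phase.CODER_ON_COMPLETION)` is false, which matches the intent that triggering `@full` does not hold up the next change.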
+
+### Async vs Sync Boundary (Critical!)
+
+```
+✅ Async: Dev iteration (Coder can start next change after completion, no waiting for @full)
+❌ Sync: Archive gate (Archive must wait for @full to pass)
+
+Timeline example:
+T1: Coder completes implementation, triggers @full async test → Status = Implementation Done
+T2: Coder can start next change (not blocked)
+T3: @full tests pass → Status = Ready for Phase 2
+T4: Test Owner audits evidence + checks off → Status = Verified
+T5: Code Review → Status = Done
+T6: Archive (at this point @full has definitely passed)
+```
+
+---
+
 ## Prerequisites: Configuration Discovery (Protocol-Agnostic)

 - `<truth-root>`: Current truth directory root
@@ -86,10 +140,15 @@ Test Owner must produce a structured `verification.md` that serves as both test
 |--------|---------|-------------|
 | `Draft` | Initial state | Auto-generated |
 | `Ready` | Test plan ready | **Test Owner** |
+| `Implementation Done` | Implementation complete, waiting for @full tests | **Coder** |
+| `Verified` | @full passed + evidence audit complete | **Test Owner** |
 | `Done` | Review passed | Reviewer (Test Owner/Coder prohibited) |
-| `Archived` | Archived |
+| `Archived` | Archived | Archiver |

-**
+**Key Constraints**:
+- `Verified` status requires @full tests to have passed
+- Only changes with `Verified` or `Done` status can be archived
+- Test Owner sets `Ready` after completing test plan, sets `Verified` after evidence audit
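The status table and key constraints above amount to two small checks: who may set a status, and which statuses may be archived. The following is a hedged sketch of those checks, assuming roles are plain strings. `STATUS_SETTERS`, `can_set`, and `can_archive` are hypothetical helpers, not functions shipped by the package.

```python
# Role allowed to set each status, copied from the status table.
# "Auto" stands in for the table's "Auto-generated" entry.
STATUS_SETTERS = {
    "Draft": "Auto",
    "Ready": "Test Owner",
    "Implementation Done": "Coder",
    "Verified": "Test Owner",
    "Done": "Reviewer",
    "Archived": "Archiver",
}

# Key constraint: only Verified or Done changes can be archived.
ARCHIVABLE = frozenset({"Verified", "Done"})


def can_set(role: str, status: str, full_passed: bool = False) -> bool:
    """Check whether `role` may set `status` under the constraints above."""
    if STATUS_SETTERS.get(status) != role:
        return False
    # Key constraint: Verified requires @full tests to have passed.
    if status == "Verified" and not full_passed:
        return False
    return True


def can_archive(status: str) -> bool:
    """Archive gate: True only for statuses that are allowed to be archived."""
    return status in ARCHIVABLE
```

For example, `can_set("Test Owner", "Verified")` is false until `full_passed=True` is supplied, and `can_archive("Implementation Done")` stays false regardless of who asks.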

 ```markdown
 # Verification Plan: <change-id>
@@ -385,14 +444,14 @@ Test Owner has two phases, completion status varies by phase:

 | Current Phase | How to Determine | Next Step After Completion |
 |---------------|------------------|---------------------------|
-| **Phase 1** | verification.md doesn't exist or Red baseline not produced | →
-| **Phase 2** | User says "verify/check off" and
+| **Phase 1** | verification.md doesn't exist or Red baseline not produced | → `[CODER]` mode |
+| **Phase 2** | User says "verify/check off" and @full tests have passed | → Code Review |

 ### Phase 1 Completion Status Classification (MECE)

 | Code | Status | Determination Criteria | Next Step |
 |:----:|--------|------------------------|-----------|
-| ✅ | PHASE1_COMPLETED | Red baseline produced, no deviations | `
+| ✅ | PHASE1_COMPLETED | Red baseline produced, no deviations | Switch to `[CODER]` mode |
 | ⚠️ | PHASE1_COMPLETED_WITH_DEVIATION | Red baseline produced, deviation-log has pending records | `devbooks-design-backport` |
 | ❌ | BLOCKED | Needs external input/decision | Record breakpoint, wait for user |
 | 💥 | FAILED | Test framework issues etc. | Fix and retry |
@@ -401,8 +460,9 @@ Test Owner has two phases, completion status varies by phase:

 | Code | Status | Determination Criteria | Next Step |
 |:----:|--------|------------------------|-----------|
-| ✅ | PHASE2_VERIFIED |
-|
+| ✅ | PHASE2_VERIFIED | Evidence audit passed, AC matrix checked | `devbooks-code-review` |
+| ⏳ | PHASE2_WAITING | @full tests still running | Wait for CI to complete |
+| ❌ | PHASE2_FAILED | @full tests not passing | Notify Coder to fix |
 | 🔄 | PHASE2_HANDOFF | Found issues with tests themselves | Fix tests then re-verify |

 ### Phase Determination Flow
@@ -421,11 +481,13 @@ Test Owner has two phases, completion status varies by phase:
 c. All above pass → PHASE1_COMPLETED

 3. Phase 2 status determination:
-a.
+a. Check if @full tests have completed
+→ No: PHASE2_WAITING
+b. Check if @full tests passed
 → No: PHASE2_FAILED
-
+c. Check if tests themselves have issues
 → Yes: PHASE2_HANDOFF
-
+d. Audit evidence, confirm coverage → PHASE2_VERIFIED
 ```
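Steps a to d of the Phase 2 determination above read as an ordered chain of checks. A minimal sketch of that ordering follows, assuming the three conditions are already known as booleans. `Phase2Status` and `determine_phase2_status` are hypothetical names, and because the flow does not define an outcome for a failed evidence audit, the sketch assumes the audit in step d succeeds.

```python
from enum import Enum


class Phase2Status(Enum):
    WAITING = "PHASE2_WAITING"
    FAILED = "PHASE2_FAILED"
    HANDOFF = "PHASE2_HANDOFF"
    VERIFIED = "PHASE2_VERIFIED"


def determine_phase2_status(
    full_completed: bool,
    full_passed: bool,
    tests_have_issues: bool,
) -> Phase2Status:
    """Apply checks a-d in the order given by the flow."""
    if not full_completed:      # a. @full tests have not completed yet
        return Phase2Status.WAITING
    if not full_passed:         # b. @full tests completed but did not pass
        return Phase2Status.FAILED
    if tests_have_issues:       # c. the tests themselves have issues
        return Phase2Status.HANDOFF
    # d. evidence audited and coverage confirmed (assumed to succeed here)
    return Phase2Status.VERIFIED
```

Read literally, the order means a failing `@full` run is reported as `PHASE2_FAILED` before test quality is considered, so `PHASE2_HANDOFF` is only reachable once `@full` has passed.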

 ### Routing Output Template (Required)
@@ -437,11 +499,13 @@ After completing test-owner, **must** output in this format:

 **Phase**: Phase 1 (Red Baseline) / Phase 2 (Green Verification)

-**Status**: ✅ PHASE1_COMPLETED / ✅ PHASE2_VERIFIED /
+**Status**: ✅ PHASE1_COMPLETED / ✅ PHASE2_VERIFIED / ⏳ PHASE2_WAITING / ...

 **Red Baseline**: Produced / Not completed (Phase 1 only)

-
+**@full Tests**: Passed / Running / Failed (Phase 2 only)
+
+**Evidence Audit**: Completed / Pending (Phase 2 only)

 **AC Matrix**: Checked N/M / Not checked (Phase 2 only)

@@ -449,31 +513,29 @@ After completing test-owner, **must** output in this format:

 ## Next Step

-**Recommended**: `devbooks-xxx skill`
+**Recommended**: Switch to `[CODER]` mode / `devbooks-xxx skill`

 **Reason**: [specific reason]
-
-### How to invoke
-Run devbooks-xxx skill for change <change-id>
 ```

 ### Specific Routing Rules

 | My Status | Next Step | Reason |
 |-----------|-----------|--------|
-| PHASE1_COMPLETED | `
+| PHASE1_COMPLETED | Switch to `[CODER]` mode | Red baseline produced, Coder implements to make Green |
 | PHASE1_COMPLETED_WITH_DEVIATION | `devbooks-design-backport` | Backport design first, then hand to Coder |
-| PHASE2_VERIFIED | `devbooks-code-review` |
-|
+| PHASE2_VERIFIED | `devbooks-code-review` | Evidence audit passed, can proceed to code review |
+| PHASE2_WAITING | Wait for CI | @full tests still running |
+| PHASE2_FAILED | Notify Coder to fix | Tests not passing, need Coder to fix |
 | PHASE2_HANDOFF | Fix tests | Tests themselves have issues, Test Owner fixes |
 | BLOCKED | Wait for user | Record breakpoint area |
 | FAILED | Fix and retry | Analyze failure reason |
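The routing rules above form a straightforward lookup from status to next step and reason. One way to sketch this, assuming statuses are passed as plain strings, is shown below. `ROUTING` and `render_next_step` are hypothetical names, and the rendered text only imitates the "Next Step" part of the routing output template.

```python
# Status -> (next step, reason), copied from the new side of the routing table.
ROUTING = {
    "PHASE1_COMPLETED": (
        "Switch to `[CODER]` mode",
        "Red baseline produced, Coder implements to make Green",
    ),
    "PHASE1_COMPLETED_WITH_DEVIATION": (
        "`devbooks-design-backport`",
        "Backport design first, then hand to Coder",
    ),
    "PHASE2_VERIFIED": (
        "`devbooks-code-review`",
        "Evidence audit passed, can proceed to code review",
    ),
    "PHASE2_WAITING": ("Wait for CI", "@full tests still running"),
    "PHASE2_FAILED": ("Notify Coder to fix", "Tests not passing, need Coder to fix"),
    "PHASE2_HANDOFF": ("Fix tests", "Tests themselves have issues, Test Owner fixes"),
    "BLOCKED": ("Wait for user", "Record breakpoint area"),
    "FAILED": ("Fix and retry", "Analyze failure reason"),
}


def render_next_step(status: str) -> str:
    """Render a 'Next Step' section for a known status; reject unknown ones."""
    if status not in ROUTING:
        raise ValueError(f"unknown status: {status}")
    step, reason = ROUTING[status]
    return f"## Next Step\n\n**Recommended**: {step}\n\n**Reason**: {reason}"
```

Raising on an unknown status, rather than falling back to a default route, keeps a mistyped status from silently routing a change to the wrong role.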

 **Critical Constraints**:
-- **
-- Test Owner and Coder cannot share the same session context
+- **Mode switching replaces session isolation**: Use `[TEST-OWNER]` / `[CODER]` labels to switch modes
 - If deviations exist, must design-backport first before handing to Coder
 - **Phase 2 AC matrix checking can only be done by Test Owner**
+- **Phase 2 can only check off after @full tests have passed**

 ---

|