dev-playbooks 1.4.0 → 1.5.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "dev-playbooks",
- "version": "1.4.0",
+ "version": "1.5.0",
  "description": "AI-powered spec-driven development workflow",
  "keywords": [
  "devbooks",
@@ -0,0 +1,394 @@
1
+ ---
2
+ name: devbooks-convergence-audit
3
+ description: devbooks-convergence-audit: Evaluates DevBooks workflow convergence using "evidence first, distrust declarations" principle, detects "Sisyphus anti-patterns" and "fake completion". Actively verifies rather than trusting document claims. Use when user says "evaluate convergence/check upgrade health/Sisyphus detection/workflow audit" etc.
4
+ allowed-tools:
5
+ - Glob
6
+ - Grep
7
+ - Read
8
+ - Bash
9
+ ---
10
+
11
+ # DevBooks: Convergence Audit
12
+
13
+ ## Core Principle: Anti-Deception Design
14
+
15
+ > **Golden Rule**: **Evidence > Declarations**. Never trust assertions in documents; confirm each one through verifiable evidence.
16
+
17
+ ### Scenarios Where AI Gets Deceived (Must Prevent)
18
+
19
+ | Deception Scenario | Wrong AI Behavior | Correct Behavior |
+ |--------------------|-------------------|------------------|
+ | Document says `Status: Done` | Believes it's complete | Verify: Are tests actually all green? Does evidence exist? |
+ | AC matrix all `[x]` | Believes full coverage | Verify: Does a test file for each AC exist and pass? |
+ | Document says "tests passed" | Believes passed | Verify: Actually run tests or check CI log timestamps |
+ | `evidence/` directory exists | Believes evidence exists | Verify: Is the directory non-empty? Is the content valid test logs? |
+ | tasks.md all `[x]` | Believes implemented | Verify: Do corresponding code files exist with substance? |
+ | Commit message says "fixed" | Believes fixed | Verify: Did related tests change from red to green? |
27
+
28
+ ### Three Anti-Deception Principles
29
+
30
+ ```
31
+ 1. Distrust Declarations
32
+ - Any "complete/passed/covered" claims in documents are hypotheses to verify
33
+ - Default stance: Claims may be wrong, outdated, or optimistic
34
+
35
+ 2. Evidence First
36
+ - Code/test results are the only truth
37
+ - Log timestamps must be later than last code modification
38
+ - Empty directory/file = no evidence
39
+
40
+ 3. Cross Validation
41
+ - Declaration vs evidence: Check consistency
42
+ - Code vs tests: Check if they match
43
+ - Multiple documents: Check for contradictions
44
+ ```
45
+
46
+ ---
47
+
48
+ ## Verification Checklist (Execute Each)
49
+
50
+ ### Check 1: Status Field Truthfulness Verification
51
+
52
+ **Document Claim**: `verification.md` contains `Status: Done` or `Status: Verified`
53
+
54
+ **Verification Steps**:
55
+ ```bash
56
+ # 1. Check if verification.md exists
57
+ [[ -f "verification.md" ]] || echo "❌ verification.md does not exist"
58
+
59
+ # 2. Check if evidence/green-final/ has content
60
+ if [[ -z "$(ls -A evidence/green-final/ 2>/dev/null)" ]]; then
61
+ echo "❌ Status claims complete, but evidence/green-final/ is empty"
62
+ fi
63
+
64
+ # 3. Check if evidence timestamp is later than last code modification
65
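+ # Portable mtime: try GNU `stat -c` first, because on Linux `stat -f` reports the filesystem rather than the file
+ mtime() { stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null; }
+ # Newest file mtime under a directory (0 if it has no files); a directory's own mtime ignores edits to existing files
+ newest_mtime() {
+   local newest=0 t f
+   while IFS= read -r f; do
+     t=$(mtime "$f") && (( t > newest )) && newest=$t
+   done < <(find "$1" -type f 2>/dev/null)
+   echo "$newest"
+ }
+ code_mtime=$(newest_mtime src)
+ evidence_mtime=$(newest_mtime evidence/green-final)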
67
+ if [[ $evidence_mtime -lt $code_mtime ]]; then
68
+ echo "❌ Evidence time is earlier than code modification, evidence may be stale"
69
+ fi
70
+ ```
71
+
72
+ **Deception Detection**:
73
+ - ⚠️ Status=Done but evidence/ empty → **Fake Completion**
74
+ - ⚠️ Status=Done but evidence timestamp too old → **Stale Evidence**
75
+ - ⚠️ Status=Done but tests actually fail → **False Status**
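
The freshness comparison in step 3 can also be sketched in Python. This is a minimal illustration, not part of the package: `newest_mtime` and `evidence_status` are hypothetical helper names, and it assumes that "last code modification" means the newest file mtime under the code directory.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(root: Path) -> float:
    """Newest modification time of any regular file under root (0.0 if none)."""
    if not root.is_dir():
        return 0.0
    return max((p.stat().st_mtime for p in root.rglob("*") if p.is_file()), default=0.0)


def evidence_status(code_dir: Path, evidence_dir: Path) -> str:
    """Classify evidence as 'missing', 'stale', or 'fresh' relative to the code."""
    evidence_time = newest_mtime(evidence_dir)
    if evidence_time == 0.0:
        return "missing"  # empty or absent directory counts as no evidence
    return "stale" if evidence_time < newest_mtime(code_dir) else "fresh"


if __name__ == "__main__":
    # Build a throwaway layout and backdate the evidence to simulate staleness.
    with tempfile.TemporaryDirectory() as tmp:
        src, evidence = Path(tmp, "src"), Path(tmp, "evidence")
        src.mkdir()
        evidence.mkdir()
        (src / "auth.ts").write_text("export const ok = true;\n")
        (evidence / "run.log").write_text("PASS auth\n")
        os.utime(evidence / "run.log", (1000, 1000))
        print(evidence_status(src, evidence))
```

Walking the files rather than reading the directory's own mtime matters because editing an existing file does not update the mtime of its parent directory.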
76
+
77
+ ---
78
+
79
+ ### Check 2: AC Coverage Matrix Truthfulness Verification
80
+
81
+ **Document Claim**: `[x]` in AC matrix means covered
82
+
83
+ **Verification Steps**:
84
+ ```bash
85
+ # 1. Extract all ACs claimed as covered
86
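+ grep -E '^\| AC-[0-9]+.*\[x\]' verification.md | while IFS= read -r line; do
+   ac_id=$(echo "$line" | grep -oE 'AC-[0-9]+' | head -1)
+   test_id=$(echo "$line" | grep -oE 'T-[0-9]+' | head -1)
+
+   # 2. Verify corresponding test exists
+   # Search only for ids that were actually extracted: an empty test_id would make the pattern match every file
+   pattern="$ac_id"
+   [[ -n "$test_id" ]] && pattern="$test_id|$ac_id"
+   if ! grep -rqE "$pattern" tests/; then
+     echo "❌ $ac_id claims covered, but no corresponding test found"
+   fi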
94
+ done
95
+
96
+ # 3. Actually run tests to verify (most reliable)
97
+ npm test 2>&1 | tee /tmp/test-output.log
+ # Use npm's exit status rather than a text search: a summary such as "0 failed" would otherwise count as a failure
+ if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
+   echo "❌ ACs claim full coverage, but tests actually have failures"
+ fi
101
+ ```
102
+
103
+ **Deception Detection**:
104
+ - ⚠️ AC checked but corresponding test file doesn't exist → **False Coverage**
105
+ - ⚠️ AC checked but test actually fails → **Fake Green**
106
+ - ⚠️ AC checked but test content is empty/placeholder → **Placeholder Test**
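
As an illustration of the matrix check, the sketch below parses checked rows and reports ACs that no test source mentions. It assumes the row shape implied by the grep above (`| AC-nnn | ... T-nnn ... | [x] |`); the function names are hypothetical.

```python
import re

AC_ROW = re.compile(r"^\|\s*(AC-\d+)\b.*\[[xX]\]")
TEST_ID = re.compile(r"\bT-\d+\b")


def claimed_coverage(matrix_text: str) -> dict[str, list[str]]:
    """Map each checked AC id to the test ids named on the same table row."""
    claims = {}
    for line in matrix_text.splitlines():
        match = AC_ROW.match(line.strip())
        if match:
            claims[match.group(1)] = TEST_ID.findall(line)
    return claims


def mentions(identifier: str, text: str) -> bool:
    """Whole-token match, so AC-003 does not match inside AC-0030."""
    return re.search(rf"\b{re.escape(identifier)}\b", text) is not None


def unverified_acs(matrix_text: str, test_sources: list[str]) -> list[str]:
    """Checked ACs whose AC id and test ids appear in none of the test sources."""
    missing = []
    for ac_id, test_ids in claimed_coverage(matrix_text).items():
        needles = [ac_id, *test_ids]  # never contains an empty string
        if not any(mentions(n, src) for n in needles for src in test_sources):
            missing.append(ac_id)
    return missing
```

Passing the contents of the files under `tests/` as `test_sources` would flag False Coverage; it still cannot tell whether a matching test passes, so a live run remains necessary.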
107
+
108
+ ---
109
+
110
+ ### Check 3: tasks.md Completion Truthfulness Verification
111
+
112
+ **Document Claim**: `[x]` in tasks.md means completed
113
+
114
+ **Verification Steps**:
115
+ ```bash
116
+ # 1. Extract all tasks claimed as complete
117
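+ grep -E '^- \[x\]' tasks.md | while IFS= read -r line; do
+   # 2. Extract keywords from task description (function name/file name/feature)
+   # Drop the "- [x]" prefix first, otherwise "x" becomes a keyword that matches almost any file;
+   # words shorter than 4 characters are skipped for the same reason
+   keywords=$(echo "$line" | sed 's/^- \[x\] *//' | grep -oE '[A-Za-z][A-Za-z0-9_]{3,}' | head -5)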
120
+
121
+ # 3. Verify code has corresponding implementation
122
+ for kw in $keywords; do
123
+ if ! grep -rq "$kw" src/; then
124
+ echo "⚠️ Task claims complete, but keyword not found in code: $kw"
125
+ fi
126
+ done
127
+ done
128
+
129
+ # 4. Check for "skeleton code" (only function signatures without implementation)
130
+ grep -rE 'throw new Error\(.*not implemented|TODO|FIXME|pass$|\.\.\.}' src/ && \
131
+ echo "⚠️ Found unimplemented placeholder code"
132
+ ```
133
+
134
+ **Deception Detection**:
135
+ - ⚠️ Task checked but code doesn't exist → **False Completion**
136
+ - ⚠️ Task checked but code is placeholder → **Skeleton Code**
137
+ - ⚠️ Task checked but feature not callable → **Dead Code**
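
The skeleton-code scan in step 4 can be expressed per line, which also yields line numbers for the report. A minimal sketch under the assumption that the same markers as the grep above are of interest; `find_placeholders` is a hypothetical name.

```python
import re

PLACEHOLDER = re.compile(
    r"(?i:not implemented)"   # e.g. throw new Error("not implemented")
    r"|\bTODO\b|\bFIXME\b"    # unfinished-work markers
    r"|^\s*pass\s*$"          # a Python body that does nothing
)


def find_placeholders(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that looks like a placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "export function login() {\n"
        "  throw new Error('Not implemented');\n"
        "}\n"
        "// TODO: wire cache\n"
    )
    for number, text in find_placeholders(sample):
        print(f"line {number}: {text}")
```

Word boundaries keep identifiers such as `todoList` or `passed` from being reported, which a plain substring search would flag.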
138
+
139
+ ---
140
+
141
+ ### Check 4: Evidence Validity Verification
142
+
143
+ **Document Claim**: `evidence/` directory contains test evidence
144
+
145
+ **Verification Steps**:
146
+ ```bash
147
+ # 1. Check if directory exists and is non-empty
148
+ if [[ ! -d "evidence" ]] || [[ -z "$(ls -A evidence/)" ]]; then
149
+ echo "❌ evidence/ does not exist or is empty"
150
+ exit 1
151
+ fi
152
+
153
+ # 2. Check if evidence files have substantial content
154
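+ shopt -s globstar nullglob  # bash 4+: without globstar, ** behaves like * and nested files are skipped
+ for f in evidence/**/*; do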
155
+ if [[ -f "$f" ]]; then
156
+ lines=$(wc -l < "$f")
157
+ if [[ $lines -lt 5 ]]; then
158
+ echo "⚠️ Evidence file has too little content: $f ($lines lines)"
159
+ fi
160
+
161
+ # 3. Check if it's a valid test log (contains test framework output characteristics)
162
+ if ! grep -qE 'PASS|FAIL|✓|✗|passed|failed|test|spec' "$f"; then
163
+ echo "⚠️ Evidence file doesn't look like test log: $f"
164
+ fi
165
+ fi
166
+ done
167
+
168
+ # 4. Check if red-baseline evidence really is red (has failures)
169
+ if [[ -d "evidence/red-baseline" ]]; then
170
+ if ! grep -rqE 'FAIL|Error|✗|failed' evidence/red-baseline/; then
171
+ echo "❌ red-baseline claims to be red, but no failure records"
172
+ fi
173
+ fi
174
+
175
+ # 5. Check if green-final evidence really is green (all pass)
176
+ if [[ -d "evidence/green-final" ]]; then
177
+ if grep -rqE 'FAIL|Error|✗|failed' evidence/green-final/; then
178
+ echo "❌ green-final claims to be green, but contains failure records"
179
+ fi
180
+ fi
181
+ ```
182
+
183
+ **Deception Detection**:
184
+ - ⚠️ evidence/ exists but content is empty → **Empty Evidence**
185
+ - ⚠️ Evidence file too small (< 5 lines) → **Placeholder Evidence**
186
+ - ⚠️ red-baseline has no failure records → **Fake Red**
187
+ - ⚠️ green-final contains failure records → **Fake Green**
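
One caveat with keyword searches such as `failed`: some test runners print a zero count (for example `0 failed`) in a fully green summary. The sketch below shows one way to separate real failures from zero counts; the patterns are assumptions about common log shapes, not a parser for any specific framework, and the function names are hypothetical.

```python
import re

FAILURE_COUNT = re.compile(r"(\d+)\s+(?:failed|failing|failures?)\b", re.IGNORECASE)
FAILURE_MARK = re.compile(r"^\s*(?:FAIL\b|✗)", re.MULTILINE)


def has_failures(log_text: str) -> bool:
    """True if the log shows a non-zero failure count or a per-test failure marker."""
    if any(int(count) > 0 for count in FAILURE_COUNT.findall(log_text)):
        return True
    return bool(FAILURE_MARK.search(log_text))


def audit_evidence(kind: str, log_text: str) -> str:
    """Check a log against its claimed colour ('red-baseline' or 'green-final')."""
    failed = has_failures(log_text)
    if kind == "red-baseline":
        return "ok" if failed else "fake red"
    if kind == "green-final":
        return "fake green" if failed else "ok"
    raise ValueError(f"unknown evidence kind: {kind}")
```

For example, `audit_evidence("green-final", "5 passed; 0 failed")` is accepted, while a log line starting with `FAIL` or `✗` marks the same directory as Fake Green.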
188
+
189
+ ---
190
+
191
+ ### Check 5: Git History Cross Validation
192
+
193
+ **Principle**: Git history doesn't lie; use it to verify document claims
194
+
195
+ **Verification Steps**:
196
+ ```bash
197
+ # 1. Check if claimed-complete change has corresponding code commits
198
+ change_id="xxx"
199
+ commits=$(git log --oneline --all --grep="$change_id" | wc -l)
200
+ if [[ $commits -eq 0 ]]; then
201
+ echo "❌ Change $change_id claims complete, but no related commits in git history"
202
+ fi
203
+
204
+ # 2. Check if test files were added after code (TDD violation detection)
205
+ shopt -s globstar nullglob  # bash 4+: let ** recurse into subdirectories
+ for test_file in tests/**/*.test.*; do
206
+ test_added=$(git log --format=%at --follow -- "$test_file" | tail -1)
207
+ # Find corresponding source file
208
+ src_file=$(echo "$test_file" | sed -e 's#^tests/#src/#' -e 's#\.test\.#.#')
209
+ if [[ -f "$src_file" ]]; then
210
+ src_added=$(git log --format=%at --follow -- "$src_file" | tail -1)
211
+ if [[ $test_added -gt $src_added ]]; then
212
+ echo "⚠️ Test added after code (non-TDD): $test_file"
213
+ fi
214
+ fi
215
+ done
216
+
217
+ # 3. Check for "one-time big commits" (may be bypassing process)
218
+ git log --oneline -20 | while read line; do
219
+ commit=$(echo "$line" | cut -d' ' -f1)
220
+ files_changed=$(git show --format= --name-only "$commit" | grep -c .)
221
+ if [[ $files_changed -gt 20 ]]; then
222
+ echo "⚠️ Big commit detected: $commit modified $files_changed files, may bypass incremental verification"
223
+ fi
224
+ done
225
+ ```
226
+
227
+ **Deception Detection**:
228
+ - ⚠️ Claims complete but no git commits → **Fake Change**
229
+ - ⚠️ Tests added after code → **Retroactive Testing**
230
+ - ⚠️ Many files in one commit → **Bypassing Incremental Verification**
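
The test-to-source mapping in step 2 can be written as a pure function, which makes its edge cases easy to check. This sketch assumes the mirrored layout the script implies (`tests/<dir>/<name>.test.<ext>` next to `src/<dir>/<name>.<ext>`); `source_for_test` and `is_retroactive` are hypothetical names.

```python
from pathlib import PurePosixPath


def source_for_test(test_path: str) -> str:
    """Map tests/x/foo.test.ts to src/x/foo.ts (assumed mirror layout)."""
    path = PurePosixPath(test_path)
    if path.parts[:1] != ("tests",):
        raise ValueError(f"not under tests/: {test_path}")
    # Only the leading directory and the ".test." infix of the file name change.
    name = path.name.replace(".test.", ".", 1)
    return str(PurePosixPath("src", *path.parts[1:-1], name))


def is_retroactive(test_first_commit: int, src_first_commit: int) -> bool:
    """True when the test's first commit timestamp is later than the source's."""
    return test_first_commit > src_first_commit


if __name__ == "__main__":
    print(source_for_test("tests/auth/login.test.ts"))
```

Restricting the rewrite to the first path component and the file name avoids touching directories whose names merely contain `test`, such as `tests/mytests/`.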
231
+
232
+ ---
233
+
234
+ ### Check 6: Live Test Run Verification (Most Reliable)
235
+
236
+ **Principle**: Don't trust any logs; actually run tests
237
+
238
+ **Verification Steps**:
239
+ ```bash
240
+ # 1. Run full tests
241
+ echo "=== Live Test Verification ==="
242
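+ npm test 2>&1 | tee /tmp/live-test.log
+ test_status=${PIPESTATUS[0]}  # exit status of npm test, not of tee
+
+ # 2. Check results (the exit status is more reliable than searching the log for "failed")
+ if [[ $test_status -ne 0 ]]; then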
246
+ echo "❌ Live tests failed, document claims not trustworthy"
247
+ grep -E 'FAIL|Error|failed' /tmp/live-test.log
248
+ else
249
+ echo "✅ Live tests passed"
250
+ fi
251
+
252
+ # 3. Compare live results with evidence files
253
+ if [[ -f "evidence/green-final/latest.log" ]]; then
254
+ live_pass=$(grep -c 'PASS\|✓\|passed' /tmp/live-test.log)
255
+ evidence_pass=$(grep -c 'PASS\|✓\|passed' evidence/green-final/latest.log)
256
+ if [[ $live_pass -ne $evidence_pass ]]; then
257
+ echo "⚠️ Live pass count ($live_pass) ≠ evidence pass count ($evidence_pass)"
258
+ fi
259
+ fi
260
+ ```
261
+
262
+ **Deception Detection**:
263
+ - ⚠️ Evidence says green but live run fails → **Stale Evidence/Fake Green**
264
+ - ⚠️ Live pass count differs from evidence → **Evidence Fabrication/Environment Difference**
265
+
266
+ ---
267
+
268
+ ## Composite Scoring Algorithm
269
+
270
+ ### Trustworthiness Score (0-100)
271
+
272
+ ```python
273
+ from dataclasses import dataclass, field
+
+ # Points deducted per occurrence, with the issue names that belong to each severity
+ CRITICAL_PENALTY = 20
+ CRITICAL = [
+     "Evidence empty",
+     "Live tests failed",
+     "Status claims complete but tests fail",
+     "green-final contains failure records",
+ ]
+ WARNING_PENALTY = 10
+ WARNINGS = [
+     "Evidence timestamp too old",
+     "AC corresponding test doesn't exist",
+     "Placeholder code",
+     "Big commit detected",
+ ]
+ MINOR_PENALTY = 5
+ MINOR = [
+     "Tests added after code",
+     "Evidence file too small",
+ ]
+
+ @dataclass
+ class Checks:
+     """Issues found by the audit, grouped by severity (this container shape is an assumption)."""
+     critical_issues: list = field(default_factory=list)
+     warnings: list = field(default_factory=list)
+     minor_issues: list = field(default_factory=list)
+
+ def calculate_trustworthiness(checks):
+     score = 100
+     score -= CRITICAL_PENALTY * len(checks.critical_issues)
+     score -= WARNING_PENALTY * len(checks.warnings)
+     score -= MINOR_PENALTY * len(checks.minor_issues)
+     return max(0, score)
306
+ ```
307
+
308
+ ### Convergence Determination
309
+
310
+ | Trustworthiness | Determination | Recommendation |
311
+ |-----------------|---------------|----------------|
312
+ | 90-100 | ✅ Trustworthy Convergence | Continue current process |
313
+ | 70-89 | ⚠️ Partially Trustworthy | Need supplementary verification |
314
+ | 50-69 | 🟠 Questionable | Need to rework some steps |
315
+ | < 50 | 🔴 Untrustworthy | Sisyphus trap, need comprehensive review |
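
The bands in the table translate directly into a lookup. The sketch below pairs it with the -20 / -10 / -5 deductions so a score and its determination can be computed together; both function names are hypothetical.

```python
def score_from_counts(critical: int, warnings: int, minor: int) -> int:
    """Apply the -20 / -10 / -5 deductions to a starting score of 100, clamped at 0."""
    return max(0, 100 - 20 * critical - 10 * warnings - 5 * minor)


def determination(score: int) -> str:
    """Map a 0-100 trustworthiness score to its determination band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "Trustworthy Convergence"
    if score >= 70:
        return "Partially Trustworthy"
    if score >= 50:
        return "Questionable"
    return "Untrustworthy"


if __name__ == "__main__":
    score = score_from_counts(critical=2, warnings=1, minor=1)
    print(score, determination(score))
```

With two critical issues, one warning, and one minor issue the deductions total 55 points, leaving a score of 45 in the lowest band.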
316
+
317
+ ---
318
+
319
+ ## Output Format
320
+
321
+ ```markdown
322
+ # DevBooks Convergence Audit Report (Anti-Deception Edition)
323
+
324
+ ## Audit Principle
325
+ This report uses "evidence first, distrust declarations" principle. All conclusions are based on verifiable evidence, not document claims.
326
+
327
+ ## Declaration vs Evidence Comparison
328
+
329
+ | Check Item | Document Claim | Actual Verification | Conclusion |
330
+ |------------|----------------|---------------------|------------|
331
+ | Status | Done | Tests actually fail | ❌ Fake Completion |
332
+ | AC Coverage | 5/5 checked | 2 ACs have no corresponding tests | ❌ False Coverage |
333
+ | Test Status | All green | Live run 3 failures | ❌ Stale Evidence |
334
+ | tasks.md | 10/10 complete | 3 tasks have no code | ❌ False Completion |
335
+ | evidence/ | Exists | Non-empty, content valid | ✅ Valid |
336
+
337
+ ## Trustworthiness Score
338
+
339
+ **Total Score**: 45/100 🔴 Untrustworthy
340
+
341
+ **Deduction Details**:
342
+ - -20: Status=Done but live tests fail
343
+ - -20: ACs claim full coverage but 2 have no tests
344
+ - -10: tasks.md 3 tasks have no code
345
+ - -5: Evidence timestamp earlier than code modification
346
+
347
+ ## Deception Detection Results
348
+
349
+ ### 🔴 Detected Fake Completions
350
+ 1. `change-auth`: Status=Done, but `npm test` reports 3 failures
351
+ 2. `fix-cache`: AC-003 checked, but `tests/cache.test.ts` doesn't exist
352
+
353
+ ### 🟡 Suspicious Items
354
+ 1. `refactor-api`: evidence/green-final/ timestamp 2 days earlier than last code commit
355
+ 2. `feature-login`: tasks.md all checked, but `src/login.ts` contains TODO
356
+
357
+ ## True Status Determination
358
+
359
+ | Change Package | Claimed Status | True Status | Gap |
360
+ |----------------|----------------|-------------|-----|
361
+ | change-auth | Done | Tests failing | 🔴 Severe |
362
+ | fix-cache | Verified | Coverage incomplete | 🟠 Medium |
363
+ | refactor-api | Ready | Evidence stale | 🟡 Minor |
364
+
365
+ ## Recommended Actions
366
+
367
+ ### Immediate Action
368
+ 1. Revert `change-auth` status to `In Progress`
369
+ 2. Add tests for `fix-cache` AC-003
370
+
371
+ ### Short-term Improvement
372
+ 1. Establish evidence timeliness check (evidence must be later than code)
373
+ 2. Require the corresponding tests to be run before an AC is checked off
374
+
375
+ ### Process Improvement
376
+ 1. Prohibit manual Status modification; only allow auto-update after script verification
377
+ 2. Integrate convergence check in CI, block fake completion from merging
378
+ ```
379
+
380
+ ---
381
+
382
+ ## Completion Status
383
+
384
+ **Status**: ✅ AUDIT_COMPLETED
385
+
386
+ **Core Findings**:
387
+ - Document claim trustworthiness: X%
388
+ - Detected fake completions: N
389
+ - Changes needing rework: M
390
+
391
+ **Next Step**:
392
+ - Fake completion → Immediately revert status, re-verify
393
+ - Suspicious items → Supplement evidence or re-run tests
394
+ - Trustworthy items → Continue current process