xtrm-tools 0.7.17 → 0.7.18

Files changed (57)
  1. package/.xtrm/config/hooks.json +2 -0
  2. package/.xtrm/config/instructions/agents-top.md +2 -1
  3. package/.xtrm/registry.json +429 -712
  4. package/.xtrm/skills/default/creating-service-skills/scripts/bootstrap.py +82 -156
  5. package/.xtrm/skills/default/creating-service-skills/scripts/scaffolder.py +73 -121
  6. package/.xtrm/skills/default/hook-development/references/patterns.md +1 -1
  7. package/.xtrm/skills/default/last30days/scripts/test-v1-vs-v2.sh +2 -2
  8. package/.xtrm/skills/default/planning/SKILL.md +75 -29
  9. package/.xtrm/skills/default/releasing/SKILL.md +163 -57
  10. package/.xtrm/skills/default/security-pipeline/SKILL.md +192 -0
  11. package/.xtrm/skills/default/security-pipeline/scripts/security-bootstrap.sh +294 -0
  12. package/.xtrm/skills/default/security-pipeline/templates/.githooks/pre-push.template +39 -0
  13. package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/gitleaks.yml +33 -0
  14. package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/osv-scanner.yml +33 -0
  15. package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/semgrep.yml +41 -0
  16. package/.xtrm/skills/default/security-pipeline/templates/.gitleaks.toml +44 -0
  17. package/.xtrm/skills/default/security-pipeline/templates/.pre-commit-config.yaml +67 -0
  18. package/.xtrm/skills/default/security-pipeline/templates/.semgrepignore +46 -0
  19. package/.xtrm/skills/default/security-pipeline/templates/scripts/security-scan.sh +57 -0
  20. package/.xtrm/skills/default/security-pipeline/templates/scripts/semgrep-diff.sh +68 -0
  21. package/.xtrm/skills/default/session-close-report/SKILL.md +167 -6
  22. package/.xtrm/skills/default/sync-docs/SKILL.md +1 -1
  23. package/.xtrm/skills/default/update-xt/SKILL.md +270 -4
  24. package/.xtrm/skills/default/updating-service-skills/scripts/drift_detector.py +22 -0
  25. package/.xtrm/skills/default/using-script-specialists/SKILL.md +7 -5
  26. package/.xtrm/skills/default/using-specialists/SKILL.md +13 -12
  27. package/.xtrm/skills/default/using-specialists-auto/SKILL.md +137 -0
  28. package/.xtrm/skills/default/using-specialists-v2/SKILL.md +14 -21
  29. package/.xtrm/skills/default/using-specialists-v3/SKILL.md +533 -21
  30. package/.xtrm/skills/default/vaultctl/SKILL.md +2 -2
  31. package/CHANGELOG.md +82 -3
  32. package/cli/dist/index.cjs +12425 -3770
  33. package/cli/dist/index.cjs.map +1 -1
  34. package/cli/package.json +9 -3
  35. package/package.json +27 -7
  36. package/packages/pi-extensions/package.json +1 -1
  37. package/.xtrm/skills/default/planning/evals/evals.json +0 -19
  38. package/.xtrm/skills/default/quality-gates/evals/evals.json +0 -181
  39. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/FINAL-EVAL-SUMMARY.md +0 -75
  40. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/edge-case-auto-fix-verification/with_skill/outputs/response.md +0 -59
  41. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/edge-case-mixed-language-project/with_skill/outputs/response.md +0 -60
  42. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/eval-summary.md +0 -105
  43. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/partial-install-python-only/with_skill/outputs/response.md +0 -93
  44. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/python-refactor-request/with_skill/outputs/response.md +0 -104
  45. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/quality-gate-error-fix/with_skill/outputs/response.md +0 -74
  46. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-general-chat/with_skill/outputs/response.md +0 -18
  47. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-math-question/with_skill/outputs/response.md +0 -18
  48. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-unrelated-coding/with_skill/outputs/response.md +0 -56
  49. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/tdd-guard-blocking-confusion/with_skill/outputs/response.md +0 -67
  50. package/.xtrm/skills/default/quality-gates/workspace/iteration-1/typescript-feature-with-tests/with_skill/outputs/response.md +0 -97
  51. package/.xtrm/skills/default/sync-docs/evals/evals.json +0 -89
  52. package/.xtrm/skills/default/test-planning/evals/evals.json +0 -23
  53. package/.xtrm/skills/default/using-specialists/SKILL.safe.md +0 -1082
  54. package/.xtrm/skills/default/using-specialists/SKILL.ultra.md +0 -1082
  55. package/.xtrm/skills/default/using-specialists/evals/evals.json +0 -68
  56. package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +0 -89
  57. package/packages/pi-extensions/.serena/project.yml +0 -130
@@ -4,8 +4,10 @@ description: >
4
4
  Canonical specialist orchestration skill. Use proactively for substantial work
5
5
  that should be delegated, tracked, reviewed, fixed, tested, or merged through
6
6
  specialists: code review, debugging, implementation, planning, doc sync,
7
- security checks, multi-step chains, and questions about specialist workflow.
8
- version: 3.2
7
+ security checks, multi-step chains, integration-phase reconciliation,
8
+ debugger-restitch on conflicting chains, pre-dispatch conflict-cluster
9
+ mapping, test-failure-map epics, and questions about specialist workflow.
10
+ version: 3.3
9
11
  ---
10
12
 
11
13
  # Using Specialists v3
@@ -14,6 +16,163 @@ You are the orchestrator. Turn user intent into a strong bead contract, choose r
14
16
 
15
17
  Keep skill practical. Core behavior belongs here; volatile detail stays in live commands.
16
18
 
19
+ > **MANDATORY — Run on skill load and before every new substantial task or epic:**
20
+ > ```bash
21
+ > specialists list --full
22
+ > ```
23
+ > Do not rely on remembered roles, models, or permissions. The registry is the source of truth.
24
+ > Run it again before dispatching any new chain or starting any epic — specialists change between sessions.
25
+
26
+ ## Specialist File Locations
27
+
28
+ Specialists live in three layers. Know which layer you are reading or editing:
29
+
30
+ | Layer | Path | Purpose |
31
+ |-------|------|---------|
32
+ | Package (shipped) | `config/specialists/*.specialist.json` | Canonical role definitions; versioned with the repo |
33
+ | User override | `.specialists/user/*.specialist.json` | Per-project customizations; wins over package layer for same name |
34
+ | Default mirror | `.specialists/default/*.specialist.json` | Repo-managed mirror of package defaults; overrides package fallback |
35
+
36
+ The loader resolves in priority order: user → default-mirror → package. A same-name file in `.specialists/user/` fully replaces the package version for that specialist. When creating or editing a specialist, use `config/specialists/` for shipped roles and `.specialists/user/` for project-specific overrides. Never edit `.specialists/default/` by hand — it is managed by `update-specialists`.
37
+
38
+ `specialists list --full` shows the resolved set (which layer each specialist comes from) so you always know what will actually run.
39
+
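The priority order described above (user, then default mirror, then package) can be sketched as a first-match lookup. This is an illustrative sketch and not part of the shipped skill: `LAYERS` and `resolve_specialist` are hypothetical names, and the real loader may apply further rules that this diff does not show.

```python
from pathlib import Path
from typing import Optional, Tuple

# Layer directories in the priority order stated in the text:
# user override first, then the default mirror, then the package layer.
LAYERS = [
    ("user", ".specialists/user"),
    ("default", ".specialists/default"),
    ("package", "config/specialists"),
]


def resolve_specialist(name: str, root: Path) -> Optional[Tuple[str, Path]]:
    """Return (layer, path) for the first layer that holds <name>.specialist.json.

    Hypothetical helper: it only models "first match wins" and returns
    None when no layer defines the specialist.
    """
    for layer, rel_dir in LAYERS:
        candidate = root / rel_dir / f"{name}.specialist.json"
        if candidate.is_file():
            return layer, candidate
    return None
```

Under this model, a file placed in `.specialists/user/` shadows a same-name file in the other two layers, which matches the "fully replaces" wording above.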
40
+ ### Editing Specialist Fields: `sp edit` Is Required
41
+
42
+ Direct JSON editing is error-prone and bypasses schema validation. Use `sp edit` for all field changes — it validates dot-paths, handles array append/remove, and writes to the correct layer.
43
+
44
+ ```bash
45
+ # Read a field
46
+ sp edit executor --get specialist.execution.model
47
+
48
+ # Set a field (schema-validated)
49
+ sp edit executor specialist.execution.model <model-id>
50
+
51
+ # Set prompt.system or task_template from a file (required for multi-line content)
52
+ sp edit executor --set specialist.prompt.system _ --file ./my-system-prompt.txt
53
+
54
+ # Append or remove tags
55
+ sp edit executor --set specialist.metadata.tags review,security --append
56
+ sp edit executor --set specialist.metadata.tags old-tag --remove
57
+
58
+ # Apply a named preset (run sp edit --list-presets for current options)
59
+ sp edit executor --preset power
60
+ sp edit executor --preset cheap --dry-run # preview first
61
+
62
+ # Target a specific scope when name exists in multiple layers
63
+ sp edit executor --scope user --set specialist.execution.model <model-id>
64
+
65
+ # Bulk read across all specialists
66
+ sp edit --all --get specialist.execution.model
67
+ ```
68
+
69
+ **When `sp edit` is required vs. direct JSON edit:**
70
+ - Model, thinking level, timeout, tags, permission, description → always `sp edit`
71
+ - `prompt.system` or `task_template` longer than one line → `sp edit --file`
72
+ - Structural schema fields (execution flags, output_schema) → `sp edit` with dot-path
73
+ - Net-new specialist creation → `specialists-creator` skill, then `sp edit` for tuning
74
+ - Bulk cross-specialist reads → `sp edit --all --get <path>`
75
+ - Available presets → `sp edit --list-presets` (do not hardcode; varies by install)
76
+
77
+ ## Orchestration Discipline (Paranoid Mode)
78
+
79
+ You are an orchestrator, not a hero. Move slowly enough to be correct.
80
+
81
+ - Run `specialists list --full` and `sp help` again at the start of every new substantial task. Do not skip because "you remember." Roles, models, and flags drift between sessions.
82
+ - Re-read the bead before dispatch. If you cannot defend each contract field out loud, the bead is not ready.
83
+ - Never dispatch a chain you cannot describe end-to-end (which specialist, which bead, which workspace, which merge target).
84
+ - Verify worktree and job state before and after each dispatch with `sp ps` and `git worktree list`. Drift is silent until merge.
85
+ - Treat reviewer `PARTIAL` and code-sanity `FINDINGS` as mandatory fix loops, not advisory noise.
86
+ - When unsure, prefer extra explorer/debugger passes over an over-eager executor. Wrong code merged is more expensive than slow research.
87
+
88
+ ## Project-Specific Specialists
89
+
90
+ Users define their own specialists in `.specialists/user/*.specialist.json` to fit project shape (domain knowledge, language, framework, conventions). These override package defaults and may not match generic role descriptions.
91
+
92
+ - Always run `specialists list --full` to see the resolved set, including project-specific roles, before choosing.
93
+ - Read `sp help` and the specialist's description/tags to confirm fit. Do not assume a name maps to its package-default behavior — a `.specialists/user/` override may have a different prompt, model, or scope.
94
+ - Pick the project-specific specialist when its role matches the task shape. Do not fall back to a generic role just because it is more familiar.
95
+ - If the task does not match any project-specific role, use the package default and consider whether a new project-specific specialist would help (use `specialists-creator` skill).
96
+
97
+ ## Advisory Passes Are Part Of Every Chain
98
+
99
+ For any substantive diff, the chain shape is:
100
+
101
+ ```
102
+ executor → code-sanity (if smell) → security-auditor (if risk surface) → reviewer → merge
103
+ ```
104
+
105
+ Triggers:
106
+
107
+ - `code-sanity` — cheap simplification and type-safety screen. Run when diff smells overcomplicated, brittle, or duplicates logic. Output is advisory; reviewer still gates merge.
108
+ - `security-auditor` — scan-only risk surface review. Run when diff touches auth, secrets, input handling (user/network/file), dependency lockfiles, agent/MCP/config surfaces, or token-storage paths. Output is advisory; executor applies fixes.
109
+ - Both run with their own bead and `--job <exec-job>` so they enter the executor workspace.
110
+
111
+ Routing patterns (cross-referenced from Integration / Restitch / E2E sections):
112
+
113
+ - **Cherry-pick integration**: advisory passes run on the last executor job in each chain BEFORE the squash-commit step.
114
+ - **Debugger-restitch**: advisory passes run on the debugger's job AFTER the restitch turn, BEFORE reviewer.
115
+ - **E2E smoke phase**: security-auditor runs on the cumulative integrated diff if any landed chain touched a sensitive surface, BEFORE smoke completes.
116
+ - **Reviewer rebuttal**: code-sanity / security-auditor findings count as legitimate evidence to support or rebut a reviewer verdict.
117
+
118
+ Skipping triggers:
119
+
120
+ - Diff is purely additive (new files only, no existing-symbol modifications) → advisory passes optional; note new-file scope in the chain handoff.
121
+ - Test-only diffs (entirely under `test/` or `tests/`) → skip security-auditor by default; still run code-sanity if test logic is non-trivial.
122
+ - For anything else, skipping is an escalation event. Never skip security-auditor on auth/secrets/input changes "because the diff looks small": small diffs hide the worst regressions.
123
+
124
+ ## Monitoring Long-Running Jobs: Sleep Timers Are Mandatory
125
+
126
+ Specialists run async. You will lose the chain if you do not actively monitor it.
127
+
128
+ **Required pattern after every dispatch:**
129
+
130
+ ```bash
131
+ sp run <role> --bead <id> --background ... # dispatch
132
+ sleep 10 && sp ps # confirm started
133
+ ```
134
+
135
+ Then cycle sleeps based on average completion time per role, checking `sp ps` each cycle:
136
+
137
+ | Role | Typical duration | Initial sleep cycle |
138
+ |------|------------------|---------------------|
139
+ | sync-docs, changelog-keeper | 60–180s | `sleep 60` then `sleep 60` |
140
+ | code-sanity, security-auditor | 60–180s | `sleep 60` then `sleep 60` |
141
+ | reviewer | 90–240s | `sleep 90` then `sleep 60` |
142
+ | explorer, debugger, planner, overthinker | 120–300s | `sleep 120` then `sleep 90` |
143
+ | executor | 180–600s+ | `sleep 180` then `sleep 120` |
144
+ | test-runner | varies with suite | start at `sleep 120`, adjust |
145
+
146
+ Rules:
147
+ - After dispatch, **always** `sleep 10 && sp ps` first to confirm the job is `running`, not stuck in `queued` or already `failed`.
148
+ - Then sleep again per the table; check `sp ps` each cycle.
149
+ - Do not poll faster than every 30s after the initial check — it wastes context.
150
+ - When status flips to `completed`, run `sp result <job-id>` immediately to consume output before context grows.
151
+ - If a job exceeds 2× its typical duration without completing, inspect with `sp feed <job-id>` before assuming hang.
152
+
153
+ You are not "done" until every dispatched job is `completed` or `failed` and consumed.
154
+
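The monitoring rules above can be sketched as a small schedule calculator. This is an illustrative sketch and not part of the shipped skill: `SCHEDULE`, `poll_times`, and `should_inspect_feed` are hypothetical names. It assumes that "2× its typical duration" refers to the upper end of the range in the table, and it leaves out `test-runner` because its duration varies with the suite.

```python
from typing import Dict, List, Tuple

# role -> (typical_min_s, typical_max_s, first_sleep_s, next_sleep_s),
# copied from the table above. executor's "600s+" is treated as 600 (assumption).
SCHEDULE: Dict[str, Tuple[int, int, int, int]] = {
    "sync-docs": (60, 180, 60, 60),
    "changelog-keeper": (60, 180, 60, 60),
    "code-sanity": (60, 180, 60, 60),
    "security-auditor": (60, 180, 60, 60),
    "reviewer": (90, 240, 90, 60),
    "explorer": (120, 300, 120, 90),
    "debugger": (120, 300, 120, 90),
    "planner": (120, 300, 120, 90),
    "overthinker": (120, 300, 120, 90),
    "executor": (180, 600, 180, 120),
}

INITIAL_CHECK_S = 10  # the mandatory `sleep 10 && sp ps` after dispatch
MIN_POLL_S = 30       # do not poll faster than this after the initial check


def poll_times(role: str, cycles: int) -> List[int]:
    """Seconds after dispatch at which `sp ps` would be checked."""
    _, _, first, nxt = SCHEDULE[role]
    elapsed = INITIAL_CHECK_S
    times = [elapsed]
    for i in range(cycles):
        # First cycle uses the role's initial sleep, later cycles the follow-up sleep.
        elapsed += max(first if i == 0 else nxt, MIN_POLL_S)
        times.append(elapsed)
    return times


def should_inspect_feed(role: str, elapsed_s: int) -> bool:
    """True once a job has run past 2x the upper end of its typical duration."""
    return elapsed_s > 2 * SCHEDULE[role][1]
```

For a reviewer job, `poll_times("reviewer", 3)` gives checks at 10, 100, 160, and 220 seconds after dispatch.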
155
+ ## Worktree Cleanup After Merge
156
+
157
+ `sp merge` and `sp epic merge` clean up automatically when they succeed. If you fall back to manual `git merge` (e.g., doc-only chains), you own cleanup.
158
+
159
+ After every merge, verify:
160
+
161
+ ```bash
162
+ git worktree list # any orphaned worktrees from this session?
163
+ sp ps # any leftover jobs?
164
+ git worktree prune # drop stale worktree metadata
165
+ ```
166
+
167
+ If a feature/epic worktree remains after merge, remove it explicitly:
168
+
169
+ ```bash
170
+ git worktree remove <path>
171
+ git branch -d <merged-branch> # only after confirming merged
172
+ ```
173
+
174
+ `sp ps` must have no active jobs and no unresolved terminal problems before session close. If it only shows old terminal history that you have intentionally acknowledged, run `sp clean --ps --dry-run` and then `sp clean --ps` to soft-hide those rows from the default dashboard. This does not delete SQLite history or change job status; use `sp ps --include-cleaned` or `sp ps --all` for audit visibility. Stale worktrees and stale jobs both block future dispatches via the stale-base guard.
175
+
17
176
  ## When To Delegate
18
177
 
19
178
  Use specialists for substantial work: codebase exploration, debugging, implementation, review, test execution, planning, documentation sync, security/config audit, release publication, and multi-chain epics.
@@ -34,6 +193,27 @@ Do small deterministic edits directly when scope is already obvious and delegati
34
193
  10. Specialists must not perform destructive or irreversible operations.
35
194
  11. Treat tests as evidence: classify failures as in-scope, pre-existing, or infrastructure before starting fix loop.
36
195
  12. Drive routine stages autonomously once task is clear. Escalate only for human judgment, destructive actions, repeated crashes, or reviewer `FAIL`.
196
+ 13. The orchestrator NEVER edits code directly. Conflict resolution, even mechanical, goes through a debugger or executor specialist. Manual conflict resolution always escalates to the operator.
197
+
198
+ ## Escalation Matrix
199
+
200
+ | Action | Default | Always escalate to operator |
201
+ |---|---|---|
202
+ | Code edit | Specialist only | (never orchestrator-direct) |
203
+ | Cherry-pick onto integration branch | Auto if non-overlapping | Conflict requiring manual edits |
204
+ | Manual conflict resolution | Never | Always |
205
+ | Force push | Never | Always |
206
+ | Branch delete | Never | Always |
207
+ | Stash pop where conflict expected | Auto | Stash conflict that destroys session-start state |
208
+ | `bd dolt fsck --revive-journal-with-data-loss` | Never | Always — explicit data-loss warning |
209
+ | `sp epic merge` | Auto if all children PASSed | Skip if any child reviewer-FAILed |
210
+ | Skip `code-sanity` on substantive diff | Auto-skip only on test-only or new-file-only diffs | Always escalate before skipping on multi-file production diff |
211
+ | Skip `security-auditor` on diff touching auth/secrets/input/agent-config | Never | Always — sensitive-surface diffs always get the pass |
212
+ | `sp stop <job>` | Auto when job is done/stale | Never on actively-running unless context blown |
213
+ | `git push origin <branch>` | Auto for chain branches | Force-push or delete-remote always |
214
+ | `npm publish` | Never | Always |
215
+ | Dependency bump | Auto for security-patch bumps | Major/minor bumps escalate |
216
+ | Config file schema-changing edit | Never | Always |
37
217
 
38
218
  ## Live Registry And Help
39
219
 
@@ -52,6 +232,9 @@ sp ps --help
52
232
  sp feed --help
53
233
  sp result --help
54
234
  sp resume --help
235
+ sp steer --help
236
+ sp stop --help
237
+ sp finalize --help
55
238
  sp merge --help
56
239
  sp epic --help
57
240
  ```
@@ -234,7 +417,7 @@ Run `specialists list` if you need live registry. Choose by task, not habit.
234
417
  | Multiple review perspectives | `parallel-review` | Critical diff needs independent review passes |
235
418
  | Test execution | `test-runner` | Need suites run and failures interpreted |
236
419
  | Docs audit/sync | `sync-docs` | Docs may be stale or need targeted synchronization |
237
- | External/live research | `researcher` | Current non-security library/docs/media lookup is needed |
420
+ | External/live research | `researcher` | Any library/API/framework/CLI question: dispatch BEFORE answering from training data |
238
421
  | Specialist config | `specialists-creator` | Creating or changing specialist JSON/config |
239
422
  | Release publication | `changelog-keeper` | New tag is being cut |
240
423
 
@@ -248,9 +431,25 @@ Selection rules:
248
431
  - Executor, debugger, changelog-keeper, sync-docs, and test-runner should not carry mandatory `<thinking>` blocks. That bloats output without payoff and hides the real contract.
249
432
  - Executor does not own full test validation; use reviewer/test-runner for that phase.
250
433
  - Sync-docs is for audit/sync; executor is for heavy doc rewrites.
251
- - Researcher is for current external info, not repo archaeology.
434
+ - Researcher is for current external info, not repo archaeology. **Dispatch BEFORE answering any library/API/framework/CLI question from training data** — your knowledge is stale by months and APIs drift silently. The cost is one CLI call; the alternative is shipping wrong API usage.
252
435
  - Specialists-creator should precede specialist config/schema edits.
253
436
 
437
+ ## Bug Diagnosis Chain
438
+
439
+ For symptoms, errors, regressions, flakes, or failing tests where cause is unknown, start with diagnosis — not implementation. Do not dispatch executor while cause is unknown; executor is for clear implementation scope only.
440
+
441
+ Default chain:
442
+
443
+ 1. **test-runner** or **debugger** establishes a fast deterministic feedback loop. If no loop can be built, debugger reports the blocker — do not patch in the dark.
444
+ 2. **debugger** reproduces the symptom, writes 3–5 falsifiable hypotheses, and tests one variable at a time. Any temporary instrumentation must be tagged `[DEBUG-<id>]` and removed before completion.
445
+ 3. **debugger** applies the minimal root-cause fix on the fault line and verifies via targeted lint/typecheck plus the focused repro.
446
+ 4. **test-runner** reruns the original repro/regression command (full-suite validation is its job, not debugger's).
447
+ 5. **code-sanity** runs if the fix smells brittle, overcomplicated, or type-risky. **security-auditor** runs if the fix touches auth/session/secrets/input handling, dependency logic, or agent/MCP/hook config.
448
+ 6. **reviewer** gates the final diff against the bead contract.
449
+ 7. If no correct regression-test seam exists, route the architecture/testability finding to **overthinker** or **planner** — do not force a brittle test just to close the loop.
450
+
451
+ Explorer is useful before diagnosis only when no concrete symptom exists and architecture is unknown. For real bugs with a symptom, use debugger.
452
+
254
453
  ## Code-sanity
255
454
 
256
455
  Use code-sanity when diff smells overcomplicated, brittle, or type-risky, but not yet broken enough for debugger. Use it before final review when you want cheap simplification check without blocking merge.
@@ -326,6 +525,57 @@ epic
326
525
 
327
526
  What differs: orchestrator sees edge shape up front, so can pick sequential chain, fix loop, or multi-chain epic without graph drift.
328
527
 
528
+ ## Pre-Dispatch: Conflict Cluster Identification
529
+
530
+ Before dispatching N parallel chains, build the file-overlap matrix:
531
+
532
+ | Chain | Touches | Overlap with |
533
+ |-------|---------|--------------|
534
+ | chain-A | src/cli/update.ts | chain-B, chain-C |
535
+ | chain-B | src/cli/update.ts, src/cli/install.ts | chain-A, chain-C, chain-D |
536
+ | chain-C | src/cli/update.ts, src/cli/install.ts, src/cli/doctor.ts | chain-A, chain-B |
537
+
538
+ For each cluster of overlapping chains, choose **one** of:
539
+
540
+ 1. **Serial dispatch** — execute chains in dependency order, each waits for previous to land. Slowest but cleanest.
541
+ 2. **Unified bead** — collapse all chains into one bead/executor pass. Larger reviewer scope but no merge conflicts.
542
+ 3. **Parallel dispatch + debugger restitch at integration** — dispatch in parallel, plan for ~40% conflict rate (empirical), budget debugger-restitch passes during integration.
543
+
544
+ Default heuristic: if 3+ chains touch the same file, **serial-dispatch them**. Conflict-resolution time at integration usually exceeds the time saved by parallel dispatch.
545
+
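The file-overlap matrix and the "3+ chains on the same file" heuristic above can be computed mechanically once each chain's file scope is known. This is an illustrative sketch and not part of the shipped skill: `overlap_matrix` and `hot_files` are hypothetical names, and the sample data reuses only the three chains that have rows in the table (chain-D is mentioned there but has no row, so it is omitted).

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def overlap_matrix(touches: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each chain to the sorted list of other chains sharing at least one file."""
    overlaps = {chain: set() for chain in touches}
    for a, b in combinations(touches, 2):
        if set(touches[a]) & set(touches[b]):
            overlaps[a].add(b)
            overlaps[b].add(a)
    return {chain: sorted(others) for chain, others in overlaps.items()}


def hot_files(touches: Dict[str, List[str]], threshold: int = 3) -> Dict[str, List[str]]:
    """Files touched by at least `threshold` chains: candidates for serial dispatch."""
    by_file = defaultdict(list)
    for chain, files in touches.items():
        for path in files:
            by_file[path].append(chain)
    return {p: sorted(c) for p, c in by_file.items() if len(c) >= threshold}


# Sample scopes taken from the table rows above.
touches = {
    "chain-A": ["src/cli/update.ts"],
    "chain-B": ["src/cli/update.ts", "src/cli/install.ts"],
    "chain-C": ["src/cli/update.ts", "src/cli/install.ts", "src/cli/doctor.ts"],
}
```

With this sample, `hot_files(touches)` reports only `src/cli/update.ts`, since it is the one file touched by three chains, so the default heuristic would dispatch these chains serially.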
546
+ ## Pre-Epic: Test-Failure-Map Pattern
547
+
548
+ Use when:
549
+ - A test suite shows roughly 5 or more failures and the operator says "fix all"
550
+ - The failures span multiple files / subsystems
551
+ - Root causes are not yet attributed per failure
552
+
553
+ ### Step-by-step
554
+
555
+ 1. **Run the suite once**, save the full log. Do not interpret yet.
556
+ 2. **File one mapping bead** (e.g., `test-runner: refresh <epic> failure map`) with contract:
557
+ - `PROBLEM:` exact command + exit status + raw failure count.
558
+ - `SUCCESS:` cluster table grouping every failure by **likely shared root cause and file scope**, plus recommended fix-chain order.
559
+ - `SCOPE:` the log file path + bounded test files involved.
560
+ - `CONSTRAINTS:` READ_ONLY, no source/test edits, no fix attempts.
561
+ 3. **Dispatch test-runner / explorer / debugger** for this bead READ_ONLY (or fill inline by reading the log).
562
+ 4. **Build the cluster table**: cluster name | files (counts) | representative error | root-cause hypothesis | likely-owner area | targeted validation command. Save in bead notes.
563
+ 5. **Plan fix chains** off the cluster table:
564
+ - One chain per cluster, file scopes disjoint where possible.
565
+ - Order by leverage (largest cluster first), then by simplicity.
566
+ - Debugger when root cause unclear; executor when bead constraint is concrete.
567
+ 6. **Save the topology insight as `bd remember`** — patterns about where a codebase's test fragility concentrates are reusable.
568
+
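Steps 4 and 5 above (build the cluster table, then order fix chains by leverage) can be sketched as a grouping pass over failures that already carry a root-cause hypothesis. This is an illustrative sketch and not part of the shipped skill: `cluster_failures` and `share_of_largest` are hypothetical names, and the root-cause labels are assumed to come from the mapping bead, not from the code.

```python
from collections import defaultdict
from typing import List, Tuple

Cluster = Tuple[str, List[str]]  # (root-cause hypothesis, failing test files)


def cluster_failures(failures: List[Tuple[str, str]]) -> List[Cluster]:
    """Group (test_file, root_cause) pairs by root cause, largest cluster first.

    Ties are broken by root-cause name so the ordering is stable.
    """
    clusters = defaultdict(list)
    for test_file, root_cause in failures:
        clusters[root_cause].append(test_file)
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def share_of_largest(ordered: List[Cluster]) -> float:
    """Fraction of all failures that fall into the largest cluster."""
    total = sum(len(files) for _, files in ordered)
    return len(ordered[0][1]) / total if total else 0.0
```

The first entry of the returned list is the highest-leverage chain to plan. A stack-trace check is still needed before trusting any label, since the same error string can hide different root causes.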
569
+ ### Why this beats dispatch-blind
570
+
571
+ In one observed run, 34 failures collapsed into 5 clusters, and 56% of them (19 of 34) shared a single root cause. A blind parallel dispatch would have sent 19 separate fixes where 1 was enough. Net specialist spend would have been about 3× higher without the mapping pass.
572
+
573
+ ### Failure modes to watch for
574
+
575
+ - Clusters that look shared but aren't — same error string in unrelated tests may hide different root causes. Confirm via stack traces, not error text alone.
576
+ - One cluster's fix introduces another's regression — each chain's VALIDATION must span all known-failing areas with "no regressions in other clusters."
577
+ - Pre-existing failures vs new regressions — name pre-existing failures explicitly in each chain's NON_GOALS so reviewers don't FAIL on them.
578
+
329
579
  ## Canonical Single-Chain Flow
330
580
 
331
581
  Use for one implementation branch.
@@ -360,7 +610,13 @@ bd dep add <review> <impl>
360
610
  specialists run reviewer --bead <review> --job <exec-job> --context-depth 3
361
611
  specialists result <review-job>
362
612
 
363
- # 6. Publish after reviewer PASS
613
+ # 6. Cascade-finalize the chain after reviewer PASS
614
+ # (auto-finalize fires when reviewer PASS appears in
615
+ # streaming output. PASS delivered via `sp resume` does not stream —
616
+ # run sp finalize to close any waiting keep-alive members.)
617
+ sp finalize <review-job> # accepts any chain member; cascades to all waiting members
618
+
619
+ # 7. Publish
364
620
  sp merge <impl>
365
621
  bd close <task> --reason "Reviewer PASS; merged."
366
622
  ```
@@ -393,9 +649,14 @@ specialists run executor --bead <impl-b> --context-depth 3
393
649
  specialists run reviewer --bead <review-a> --job <exec-a-job> --context-depth 3
394
650
  specialists run reviewer --bead <review-b> --job <exec-b-job> --context-depth 3
395
651
 
652
+ # Per-chain cascade-finalize (only needed if PASS arrived via sp resume;
653
+ # auto-finalize handles the streaming case automatically)
654
+ sp finalize <review-a-job>
655
+ sp finalize <review-b-job>
656
+
396
657
  # Publish
397
- sp epic status <epic>
398
- sp epic merge <epic>
658
+ sp epic status <epic> # verify derived state shows merge_ready
659
+ sp epic merge <epic> # batch publish all chains in dependency order with tsc gate per merge
399
660
  ```
400
661
 
401
662
  Use `--epic <id>` when job belongs to epic but bead is not direct child. Avoid parallel executors on same file; sequence them or consolidate work.
@@ -412,8 +673,8 @@ optional code-sanity/security-auditor -> advisory findings
412
673
  reviewer -> PASS | PARTIAL | FAIL
413
674
  ```
414
675
 
415
- - `PASS`: verify expected commit/diff, then publish.
416
- - `PARTIAL`: resume same executor/debugger with exact findings, then re-review.
676
+ - `PASS`: verify expected commit/diff. If reviewer's PASS appeared in its streaming output, auto-finalize already closed the chain — go straight to `sp merge` / `sp epic merge`. If PASS arrived via `sp resume`, run `sp finalize <any-chain-job-id>` first to cascade-close any waiting keep-alive members, then publish.
677
+ - `PARTIAL`: resume same executor/debugger with exact findings, then re-review (`sp resume <reviewer-job>`).
417
678
  - `FAIL`: stop and decide whether to replace chain, re-scope bead, or ask operator if judgment is required.
418
679
 
419
680
  Prefer resume over new fix executor when original job is waiting and context is healthy:
@@ -444,9 +705,45 @@ Overthinker:
444
705
  - Chain position: before planner/executor when design uncertainty is high.
445
706
 
446
707
  Researcher:
447
- - Use for current external docs, package behavior, or ecosystem facts that repo cannot answer.
448
- - Bead shape: source list, question set, required citations.
449
- - Chain position: before executor when outside facts matter.
708
+ - Dispatch **BEFORE** answering any library/API/framework/CLI question from training data. Training is months stale; APIs change; cheap CLI lookups (`ctx7`, `deepwiki`, `ghgrep`) replace the guess.
709
+ - Use for: API syntax checks, config options, version migrations, library-specific debugging, "how do others implement X", recent releases, public repo internals.
710
+ - Anti-pattern to break: "I think Library X works like Y…" → instead dispatch researcher with the exact question. The cost (~30s, `openai-codex/gpt-5.4-mini` via tool mode) is far less than shipping wrong API usage.
711
+ - Bead shape: source list (which libraries/repos), question set, required citations (library ID or `npx ctx7 docs /org/project "..."` output).
712
+ - Chain position: before executor when outside facts matter; alongside explorer when a question mixes local code with external behavior.
713
+ - Keep-alive: ask follow-ups in the same job rather than re-dispatching — researcher stays in waiting state after each turn.
714
+
715
+ Three modes — researcher picks automatically based on bead shape; you write the bead, not the mode:
716
+
717
+ - **Targeted lookup** (most common): "How do I configure X in library Y v1.2?" / "What does Z.method() return now?" / "Are foo and bar still the canonical replacements for baz?" → researcher resolves library ID via `ctx7 library`, then `ctx7 docs /org/project "<intent-rich query>"`. For repo-specific internals (e.g. "How does Vite handle X internally?"), `deepwiki ask <owner/repo> "..."`.
718
+ - **Discovery**: "How do production codebases handle X?" / "Find good examples of pattern Y" / "What does the ecosystem do for Z?" → `ghgrep "<literal pattern>" --lang <langs> --repo <maybe>`, scan results, drill into the best repos with `deepwiki toc` + `deepwiki ask`.
719
+ - **Media / discussion-recency** (rare): YouTube transcripts, social-media trends. Triggers on URLs or "what are people saying about X right now". Researcher loads `last30days` skill on-demand for this — don't fold its setup into the bead.
720
+
721
+ ### Dispatch triggers — when the orchestrator should reach for researcher
722
+
723
+ Concrete agent thoughts that MUST be replaced with a researcher dispatch:
724
+
725
+ | Agent thought | Researcher bead |
726
+ |---|---|
727
+ | "I think `useEffect` cleanup works like…" | `ctx7 docs /facebook/react "useEffect cleanup with async operations"` |
728
+ | "Next.js app router middleware should be…" | `ctx7 docs /vercel/next.js "app router middleware patterns"` |
729
+ | "Let me check if `--target` is a valid flag for tool X" | `ctx7 docs /org/tool-x "--target flag"` or `tool-x --help` (orchestrator-side if it's installed) |
730
+ | "Production code probably handles X by…" | `ghgrep "<X-pattern>" --lang TypeScript --limit 5` then `deepwiki ask <best-repo> "<design question>"` |
731
+ | "Library Y added feature Z in v3 (I think)" | `ctx7 library <Y> "Z"` → `ctx7 docs /org/Y/<version> "Z"` to verify version + behavior |
732
+ | "Repo X's authentication architecture is…" | `deepwiki ask owner/X "How does the auth middleware work? What stores tokens? What controls expiry?"` |
733
+ | "Cross-library: do A and B compose like Z?" | `deepwiki ask repo-A repo-B "How do these interact for use-case Z?"` |
734
+
735
+ If you catch yourself making any of these claims without first dispatching researcher, you are about to ship stale information. Stop and dispatch.
736
+
737
+ ### Cost framing
738
+
739
+ Researcher runs on `openai-codex/gpt-5.4-mini` via tool mode, keep-alive. Typical turn: 20-40s wall clock, ~$0.005-0.02 per call. The cost of shipping a wrong API call (debugger turn + executor fix + reviewer re-run, or worse, production regression) is orders of magnitude higher. Default to dispatch.
740
+
741
+ ### What researcher does NOT do
742
+
743
+ - Local code mapping → use `explorer` (READ_ONLY, traces project code without external CLI cost).
744
+ - Bug root-cause when symptoms are local → use `debugger`.
745
+ - Reading internal docs already in this repo → use direct file read or `explorer`.
746
+ - Security audit of third-party packages → use `security-auditor`; researcher's job is the API surface, not the threat model.
450
747
 
451
748
  Test-runner:
452
749
  - Use when commands need to run and failures need classification, not fixes.
@@ -460,18 +757,45 @@ Sync-docs:
460
757
 
461
758
  What differs: orchestrator uses specialists beyond the common trio, so planning, diagnosis, research, tests, and docs do not collapse into executor work.
462
759
 
760
+ ## Specialist Rebuttal As Routine
761
+
762
+ Several specialists default to over-cautious verdicts when an evidence gate looks unsatisfied. The orchestrator's job is to challenge that verdict with cited evidence, not to accept it. Common rebuttal-worthy patterns:
763
+
764
+ ### Overthinker
765
+
766
+ - "Hold for operator decision" without specifying what decision is needed → push: "Cite file/line evidence for why this is a product decision rather than a mechanical resolution."
767
+ - "Close as superseded by X" without verification → push: "Read the current state of `<file>` and check whether feature Y from this bead is actually present."
768
+ - "Run separate small beads" or "run one big bead" without rationale → push: "Pick one and explain operationally — cost difference, conflict expectations, reviewer scope."
769
+
770
+ ### Reviewer
771
+
772
+ - "PARTIAL — missing `gitnexus_impact` evidence" on a test-only diff → rebut: "Diff is entirely under `test/` (N files). `gitnexus_impact` analyzes runtime call graphs; test fixture mocks have no callers in the production graph. Bead's impact-gate constraint is conditional on modifying a runtime entrypoint, which did not happen here."
773
+ - "PARTIAL — missing `gitnexus_impact`" on a small LOW-blast-radius production diff where executor used `gitnexus_detect_changes` instead → rebut: cite the executor's `impact_report.highest_risk: LOW`, the LOC count, single helper / single consumer scope. The reviewer prompt accepts `gitnexus_impact` OR `$gitnexus_summary` OR `gitnexus_detect_changes` OR LOW `impact_report` as evidence.
774
+ - "FAIL — full suite shows N+1 fails" where one is a known concurrent-run flake → rebut: rerun the suspect test in isolation, paste clean output, resume reviewer with "Isolated rerun: P/P. Re-evaluate."
775
+
776
+ ### General rule
777
+
778
+ Resume with explicit ammunition: file/line refs, exact rerun output, link to the bead memory documenting the rebuttal pattern. Don't argue from authority; argue from new evidence. **Findings from code-sanity / security-auditor are legitimate rebuttal evidence** — a clean code-sanity OK or a security-auditor "no findings" is concrete proof against a reviewer's "looks too complex" or "may have security risk" gate. Cite the advisory job id when rebutting on this axis.
779
+
780
+ **One rebuttal per reviewer is the limit.** Second FAIL after rebuttal means stop and report. After a successful rebuttal, save the rebuttal text to `bd remember "<key>"` so the next session inherits it.
781
+
463
782
  ## Monitoring And Steering
464
783
 
465
784
  Use `sp ps` for state and `sp result` for completed turns.
466
785
 
467
786
  ```bash
468
- sp ps
469
- sp ps <job-id>
470
- sp ps --bead <bead-id>
787
+ sp ps # active jobs + unresolved terminal problems
788
+ sp ps --active # active jobs only
789
+ sp ps --health # include detailed process tables
790
+ sp ps --include-terminal # include uncleaned terminal history
791
+ sp ps --include-cleaned # include rows hidden by sp clean --ps
792
+ sp ps --all # full audit view, including cleaned/dead/history
471
793
  sp feed <job-id>
472
794
  sp result <job-id>
473
795
  ```
474
796
 
797
+ Default `sp ps` is the actionable dashboard, not raw history. Error/cancelled terminal rows stay visible until an operator acknowledges them with `sp clean --ps`; cleaned rows remain in SQLite and are visible via `--include-cleaned`/`--all`.
798
+
475
799
  If job is running, use `sp feed`. If it is waiting, use `sp result` and decide whether to resume, review, merge, or stop. Avoid tight polling; sleep based on task size, then check once.
476
800
 
477
801
  Use `steer` for running jobs and `resume` for waiting jobs:
@@ -490,10 +814,57 @@ Context usage is an action signal when available:
490
814
 
491
815
  Raw token totals are not context percentages.
492
816
 
817
+ ### Long autonomous runs — dual-mechanism monitoring
818
+
819
+ For sessions where the operator is offline (overnight, async windows), use both:
820
+
821
+ 1. **Bash sleep timers per dispatch**, sized per role (see Monitoring Long-Running Jobs above). Bash sleep waits for an expected completion.
822
+ 2. **External cron loop** (Claude Code: `/loop 180s sp ps`) as a heartbeat at fixed cadence regardless of orchestrator's bash sleeps. Cron catches specialists that finished while the orchestrator was busy reading other results, and catches stalls.
823
+
824
+ The two complement each other: the bash sleep covers the completion you expect, while the cron catches unexpected completions and stalls. Without the cron, the orchestrator can miss specialists that completed during a long bash poll cycle and waste turns re-polling.
825
+
826
+ ## Bead Lifecycle And Parallel Commit Ordering
827
+
828
+ The bd commit-gate is **project-wide**, not per-worktree. While **any** bead in the project is `in_progress`, **no** worktree can commit. Practical consequences for parallel-chain epics:
829
+
830
+ - You CAN dispatch two executors in parallel — they work in separate worktrees, no commit-time collision.
831
+ - But once executor A returns and executor B is still running, you CANNOT commit A's worktree until B's bead is closed (or vice versa).
832
+ - Workflow: close the finished chain's executor bead FIRST (memory-ack + `bd close`), THEN commit that chain's worktree, THEN wait on the other chain.
833
+ - This forces a serial-tail on the commit step. Plan for it: parallel-dispatch saves time on the *thinking* step, not the commit step.
834
+
835
+ If the commit-gate blocks unexpectedly mid-orchestration, `bd query "status=in_progress"` reveals which claim is holding it open.
836
+
837
+ ### Memory-gate batch close
838
+
839
+ `bd close` is blocked until `memory-acked:<id>` exists. For batch-closing many orchestrator-internal beads (sanity beads, reviewer beads, decomposition trackers), use:
840
+
841
+ ```bash
842
+ for id in <impl> <sanity?> <review>; do
843
+ bd kv set "memory-acked:$id" "saved:<chain-memory-key>" # OR "nothing novel: <reason>"
844
+ done
845
+ bd close <impl> <sanity?> <review> <parent> --reason "..."
846
+ ```
847
+
848
+ The chain memory key holds the actual durable insight (one per real fix). Sanity/review beads get "nothing novel" — the parent insight covers them.
849
+
493
850
  ## What Stays Out
494
851
 
495
852
  - `memory-processor` — memory synthesis specialist; see `/documenting`.
496
853
  - `xt-merge`: deferred to xt-merge skill; this skill names specialist flow, not merge-wrapper internals.
854
+ - Session-close reporting (report skeleton, CHANGELOG sync, push) — see `/session-close-report` skill; this skill mandates running it at session end but does not duplicate its content.
855
+ - Release publication (version bump, build, tag, npm publish) — see `/releasing` skill.
856
+
857
+ ## At Session End — Mandatory Handoff
858
+
859
+ Before declaring the session done:
860
+
861
+ 1. Run the `/session-close-report` skill.
862
+ 2. Fill every `<!-- FILL -->` marker in the generated skeleton.
863
+ 3. Sync `CHANGELOG.md` for user-facing changes (the report skill drives this).
864
+ 4. Re-run cleanup checks: `sp ps`, `git worktree list`, `ps -ef` for stale serena/gitnexus, `tmux ls` for `sp-*`.
865
+ 5. Commit the report (and CHANGELOG if updated) before push.
866
+
867
+ A session that lands code but skips the close-report leaves the next agent cold-starting blind. That cost compounds across sessions.
497
868
 
498
869
  ## Adjacent xt commands
499
870
 
@@ -511,28 +882,147 @@ Source: latest xt report + `xt --help`; keep commands here, not full CLI surface
511
882
 
512
883
  ## Merge And Publication
513
884
 
514
- Standalone chain:
885
+ Per-chain merge (works for standalone chains AND for any PASS chain inside an active epic):
515
886
 
516
887
  ```bash
517
888
  sp merge <chain-root-bead>
518
889
  ```
519
890
 
520
- Epic-owned chains:
891
+ Batch publish all chains in an epic in dependency order with tsc gate between each:
521
892
 
522
893
  ```bash
523
894
  sp epic status <epic-id>
524
895
  sp epic merge <epic-id>
525
896
  ```
526
897
 
898
+ Manual finalizer fallback when reviewer PASS arrived via resume (auto-finalize only fires on streaming output):
899
+
900
+ ```bash
901
+ sp finalize <any-chain-job-id> # cascades: closes ALL waiting keep-alive members of the chain
902
+ ```
903
+
527
904
  Rules:
528
905
 
529
906
  - Merge only after reviewer PASS unless operator explicitly accepts draft for follow-up work.
530
- - Use `sp epic merge` for unresolved epic chains; `sp merge` refuses those by design.
531
- - Do not manually `git merge` specialist branches.
532
- - If merge refuses because chain job is still `waiting`, consume result and either resume/stop/finalize that job deliberately.
907
+ - Per-chain `sp merge` is allowed for any PASS chain regardless of sibling-epic state. Use `sp epic merge` only when batching all epic chains together (atomic publish, topological order, tsc gate per merge).
908
+ - Do not manually `git merge` specialist branches — the redesign removed the conditions that previously forced manual fallback (sticky FAILED, inverted merge gates, missing PASS finalizer).
909
+ - If merge refuses because a chain job is still `waiting`, run `sp finalize <any-job-in-chain>`; it cascades to close every waiting keep-alive member of that chain via `supervisor.finalizeWaitingJob()`.
910
+ - If a previous `sp epic merge` failed (rebase conflict, dirty worktree) and persisted a soft `failed` marker, the next attempt retries fresh — only `merged` and `abandoned` are truly terminal. Just clear the conflict source.
533
911
  - If merge reports dirty worktree, inspect that worktree. Revert generated noise only when clearly unrelated; otherwise ask or re-dispatch.
534
912
  - Run or confirm required gates before closing root bead or epic.
535
913
 
914
+ ## Integration Phase — Cherry-Pick Playbook
915
+
916
+ Use when `sp merge` / `sp epic merge` is not the right path: chains forked from a non-`origin/HEAD` baseline (pass `--target-branch` first if it's a known integration branch), operator wants visibility before publish, or multiple chains must land into an integration branch before main.
917
+
918
+ ### Step-by-step
919
+
920
+ 1. Stash uncommitted state on working branch: `git stash push -u -m "pre-integration"`.
921
+ 2. Create integration branch off the working branch: `git checkout -b integration/<date>-orchestrator`.
922
+ 3. For each non-overlapping chain (security/critical first, then test-baseline, then features):
923
+ - `git merge --squash <chain-branch>`
924
+ - Restore noise files (see "Chain noise filter checklist" below)
925
+ - **Advisory passes** before commit: if the staged diff smells overcomplicated/duplicative/type-risky, dispatch `code-sanity --job <last-exec-job-of-chain>`; if it touches auth/secrets/input/agent-config, dispatch `security-auditor --job <last-exec-job-of-chain>`. Apply findings or document why skipped.
926
+ - `git commit -m "<type>(<scope>): <summary> (<bead-id>)"` — one squash commit per chain.
927
+ 4. For each overlapping chain, switch to the **debugger-restitch** pattern (next section).
928
+ 5. After all chains land, run E2E smoke phase (below) before declaring done.
929
+ 6. Operator FF-merges integration → main when satisfied.
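
The landing order in step 3 (security/critical first, then test-baseline, then features) could be made mechanical with a small sort. This is only a sketch: `order_chains` and the category labels `security`, `test-baseline`, and `feature` are hypothetical names chosen for illustration, not part of `sp`.

```shell
# Hypothetical helper, not an sp command.
# Input (stdin): one "<category> <chain-branch>" pair per line.
# Output: chain branches in landing order; ties keep their input order.
order_chains() {
  awk '
    BEGIN { rank["security"] = 1; rank["test-baseline"] = 2; rank["feature"] = 3 }
    # Unknown categories sort last (rank 9). NR preserves input order within a rank.
    { r = ($1 in rank) ? rank[$1] : 9; print r "\t" NR "\t" $2 }
  ' | sort -t "$(printf '\t')" -k1,1n -k2,2n | cut -f3
}
```

Feeding it a hand-written list of chains prints the branches in the order to squash-merge them; overlapping chains still need the debugger-restitch pattern regardless of their position.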
930
+
931
+ ### Chain noise filter checklist
932
+
933
+ `sp merge` ignores `.beads/` and `.xtrm/skills/active/**` via `MERGE_DIRTY_IGNORE_PREFIXES`. For manual cherry-pick / squash flows, additionally unstage these before committing:
934
+
935
+ - `.pi/npm` — accidentally created by xt commands inside worktrees
936
+ - `cli/pnpm-lock.yaml`, `cli/pnpm-workspace.yaml` — pnpm side-effects
937
+ - `AGENTS.md`, `CLAUDE.md` — gitnexus stat-refresh hook noise
938
+ - `.beads/issues.jsonl`, `.beads/interactions.jsonl` — bd state churn
939
+ - `.specialists/executor-result.md` — transient specialist output
940
+
941
+ ```bash
942
+ git restore --staged .beads .pi AGENTS.md CLAUDE.md
943
+ git checkout HEAD -- .beads AGENTS.md CLAUDE.md
944
+ rm -f .pi/npm
945
+ ```
946
+
947
+ If a chain commits its own `.beads` symlink (older bd-in-worktree behavior), `rm -f .beads` then `git checkout HEAD -- .beads` to restore the real directory.
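
To see which staged paths fall under the checklist before unstaging anything, a small filter can help. The following is a minimal sketch that hard-codes the paths listed above; `is_chain_noise` and `filter_chain_noise` are hypothetical helper names, not `sp` or `xt` commands.

```shell
# Hypothetical helpers, not part of sp/xt/bd.
# Returns 0 when the path matches an entry from the noise checklist.
is_chain_noise() {
  case "$1" in
    .pi/npm|.pi/npm/*) return 0 ;;
    cli/pnpm-lock.yaml|cli/pnpm-workspace.yaml) return 0 ;;
    AGENTS.md|CLAUDE.md) return 0 ;;
    .beads/issues.jsonl|.beads/interactions.jsonl) return 0 ;;
    .specialists/executor-result.md) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads one path per line on stdin and prints only the noise paths.
filter_chain_noise() {
  while IFS= read -r path; do
    if is_chain_noise "$path"; then
      printf '%s\n' "$path"
    fi
  done
}
```

A typical use would be `git diff --cached --name-only | filter_chain_noise` right after `git merge --squash`, reviewing the output before running the restore commands.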
948
+
949
+ ## Debugger-Restitch Pattern
950
+
951
+ When chain X conflicts with already-landed chain Y on shared files, raw `git cherry-pick` will revert Y's work. The debugger-restitch pattern preserves both, but only when the debugger gets an explicit "preserve already-landed work" contract.
952
+
953
+ 1. **Reopen X**: `bd reopen <X> --reason="integration stitch onto post-Y state"`.
954
+ 2. **Strengthen the bead contract** with these fields:
955
+ - `## CRITICAL CONSTRAINTS:` heading at the top.
956
+ - "Fork off `integration/<date>-orchestrator`. Verify with `git log integration/...$..HEAD` empty before any commits."
957
+ - List the symbols/lines from Y that MUST be preserved verbatim (with file paths).
958
+ - "ADD X's intent ON TOP" with a numbered list of the additions.
959
+ - "Reference original `feature/<X>-executor` for symbol shapes only — do NOT cherry-pick or merge. Re-implement on integration's current state."
960
+ - `## VALIDATION:` includes both Y's tests passing AND X's new tests passing.
961
+ - `## OUTPUT:` mandates a 5-line code excerpt showing both Y and X features coexisting.
962
+ 3. **Dispatch debugger** with `--force-stale-base` if X is an epic child:
963
+ ```bash
964
+ sp run debugger --bead <X> --force-stale-base --keep-alive --background
965
+ ```
966
+ 4. **Sanity check the result**: when debugger reports back:
967
+ ```bash
968
+ git log integration/<date>-orchestrator..feature/<X>-debugger --oneline
969
+ git diff integration/<date>-orchestrator...feature/<X>-debugger -- <key-files>
970
+ ```
971
+ Confirm the debugger's diff is **additive** — no reverts of Y's lines.
972
+ 5. **Advisory passes**: before landing the restitch, dispatch `code-sanity --job <debugger-job>` if the restitch added control-flow complexity, and `security-auditor --job <debugger-job>` if it touched a sensitive surface. Restitched diffs are higher-risk than fresh executor diffs because the debugger had to thread around already-landed work.
973
+ 6. **Land via FF or cherry-pick the named commit** (NOT the checkpoint commit). Look for the commit with the proper `<type>(<scope>):` message; ignore `checkpoint(debugger):` commits above it.
974
+ 7. **Verify tests** before marking done.
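
Step 6 asks for the named commit rather than a checkpoint commit. A minimal sketch of that selection is below, assuming `git log --oneline` style input where each line is `<hash> <subject>`; `pick_named_commit` is a hypothetical helper name.

```shell
# Hypothetical helper, not an sp command.
# Input (stdin): "<hash> <subject>" lines, newest first.
# Prints the hash of the first commit whose subject is not a checkpoint;
# returns 1 if every commit is a checkpoint.
pick_named_commit() {
  while IFS=' ' read -r hash subject; do
    case "$subject" in
      "checkpoint("*) continue ;;
      *) printf '%s\n' "$hash"; return 0 ;;
    esac
  done
  return 1
}
```

For example, `git log --oneline integration/<date>-orchestrator..feature/<X>-debugger | pick_named_commit` would print the hash to cherry-pick. The subject should still be checked by eye for the `<type>(<scope>):` form, since this sketch only skips checkpoints.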
975
+
976
+ ### Failure mode to watch for
977
+
978
+ If the debugger forks off the OLD baseline (pre-Y) instead of integration, its commit will revert Y. Symptom: `git diff integration..feature/<X>-debugger -- <Y's-file>` shows DELETIONS of Y's symbols. Fix: resume the debugger with an explicit "cd to a fresh worktree forked from `integration/<date>-orchestrator`" instruction, then re-verify that `git log integration..HEAD` is empty.
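
One quick way to spot the deletion symptom is to total the deleted-lines column of `git diff --numstat`, which prints `<added>`, `<deleted>`, and `<path>` separated by tabs. The sketch below assumes that input on stdin; `sum_deleted_lines` is a hypothetical helper name.

```shell
# Hypothetical helper, not an sp command.
# Input (stdin): `git diff --numstat` output (tab-separated).
# Prints the total number of deleted lines; binary files ("-" counts) are skipped.
sum_deleted_lines() {
  awk -F '\t' '$2 != "-" { total += $2 } END { print total + 0 }'
}
```

Running `git diff --numstat integration..feature/<X>-debugger -- <Y's-file> | sum_deleted_lines` and getting a non-zero total is a prompt to read the diff, not proof of a revert: X may legitimately rewrite a line that Y did not need preserved.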
979
+
980
+ ## E2E Smoke Phase
981
+
982
+ Run **every** npm script + entry point that any chain added or modified. The smoke phase is the only way to catch missed chains, false-positive CI gates, missing intermediate files, and runtime regressions invisible to unit tests.
983
+
984
+ ### Procedure
985
+
986
+ ```bash
987
+ # Build sanity
988
+ bun run build # or equivalent
989
+
990
+ # Test sanity — record PRE-baseline first
991
+ git checkout <baseline-branch>
992
+ bun test 2>&1 | tail -5 # record N failed / M passed
993
+
994
+ # Switch back and re-run
995
+ git checkout integration/<date>-orchestrator
996
+ bun test 2>&1 | tail -5 # passed MUST be ≥ baseline and failed ≤ baseline; a net regression is stop-the-line
997
+
998
+ # Run every check:* script the integration added
999
+ for s in $(jq -r '.scripts | keys[] | select(startswith("check:"))' package.json); do
1000
+ echo "=== $s ==="
1001
+ npm run "$s" 2>&1 | tail -10
1002
+ done
1003
+
1004
+ # Targeted unit tests for chains touching the same files
1005
+ bunx vitest run <chain-test-files>
1006
+ ```
1007
+
1008
+ For each smoke that fails, decide before continuing:
1009
+ - False positive (script flags itself) → file follow-up bead, document, continue
1010
+ - Missing dependency (vendor not run) → expected gate, document
1011
+ - Real regression → stop, dispatch debugger to fix, re-smoke
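
The baseline comparison in the procedure can be reduced to a small check once the counts are recorded. This sketch assumes "no net regression" means the passed count did not drop and the failed count did not rise; `smoke_verdict` and its argument order are hypothetical.

```shell
# Hypothetical helper, not an sp command.
# Arguments: <base_passed> <base_failed> <new_passed> <new_failed>
# Prints "ok" or "regression"; returns 1 on regression.
smoke_verdict() {
  if [ "$4" -gt "$2" ] || [ "$3" -lt "$1" ]; then
    echo "regression"
    return 1
  fi
  echo "ok"
}
```

For instance, `smoke_verdict 120 3 125 3` prints `ok`, while `smoke_verdict 120 3 119 4` prints `regression` and returns non-zero, which is the stop-the-line case.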
1012
+
1013
+ ### Cross-cutting security-auditor pass
1014
+
1015
+ If any landed chain in this integration touched auth, secrets, input handling, dependency lockfiles, or agent/MCP/config surfaces, dispatch one `security-auditor` on the cumulative integration diff BEFORE declaring smoke done:
1016
+
1017
+ ```bash
1018
+ git diff <baseline>..integration/<date>-orchestrator > /tmp/integration-diff.patch
1019
+ sp run security-auditor --bead <sec-bead> --context-depth 3 --background
1020
+ ```
1021
+
1022
+ Per-chain security-auditor passes catch chain-local risks; this cross-cutting pass catches interaction risks that only appear once all chains coexist (e.g. one chain weakens an input validator that another newly relies on). Skipping this on a sensitive-surface integration is an escalation event.
1023
+
1024
+ Record all smoke results in the session-close-report under a `## Smoke test results` table (see `/session-close-report` skill).
1025
+
536
1026
  ## Failure Recovery
537
1027
 
538
1028
  When something fails:
@@ -552,6 +1042,21 @@ Then choose one action:
552
1042
  - Escalate if human decision is needed.
553
1043
  - Replace specialist only if failure mode repeats.
554
1044
 
1045
+ ### Common failure patterns (and the canonical fix)
1046
+
1047
+ | Symptom | Cause | Fix |
1048
+ |---|---|---|
1049
+ | `sp merge` refuses with "non-terminal chain jobs" after reviewer PASS | Auto-finalize did not fire (PASS arrived via `sp resume`, not streaming) | `sp finalize <any-chain-job-id>` — cascades to close every waiting keep-alive member |
1050
+ | `sp epic merge` says epic is "in terminal state 'failed'" | Prior `sp epic merge` hit a transient error (rebase conflict, dirty worktree) and persisted a soft `failed` marker | Clear the original conflict source, then re-run `sp epic merge` — it retries fresh, only `merged`/`abandoned` truly block |
1051
+ | `sp epic merge` says "rebase failed: unstaged changes" in a worktree | bd auto-export or other tooling left uncommitted changes inside the worktree | `cd .worktrees/<bead>/<bead>-executor && git stash push -u -m epic-merge-prep`, then re-run from main repo |
1052
+ | `sp ps` shows old terminal jobs after a session | Default dashboard keeps unresolved terminal problems visible until acknowledged | `sp clean --ps --dry-run`, then `sp clean --ps` to soft-hide from default ps; use `sp ps --include-cleaned`/`--all` for audit history |
1053
+ | Reviewer keeps returning PARTIAL on functional contracts already met | Reviewer demanding tool-event evidence — typically obsoleted after the gate relaxation, but if it persists check the executor's `gitnexus_detect_changes` ran and use the rebuttal pattern (see Specialist Rebuttal As Routine) | Rebut with cited evidence; second FAIL = escalate |
1054
+ | Multiple `sp run` background launches drop silently under shell parallelism | Known launch-ceremony race | Re-check `sp ps` after each dispatch and retry the missing one; serialize when reliability matters |
1055
+ | `sp run` returns `Warning: job started but ID not yet available` and nothing appears in `sp ps --bead <id>` after 30s | Dispatch was refused by epic guard or base-staleness check; stderr now surfaces the refusal reason (see `sp run --background` post-fix) | Read the surfaced reason; retry with `--force-stale-base` if intentional, or fix the bead/lineage |
1056
+ | `sp feed <job-id>` returns short tail with no tool events | Confirms DB-backed replay is active; if you see ≤10 lines on a real run, the DB is missing events for that job — verify with raw SQL on observability.db | If DB truly lacks events: re-run job; if DB has events but feed truncates: file bug bead — should not happen on current build |
1057
+ | bd "database not found" or per-project Dolt server respawn | bd has spawned a per-project Dolt instead of routing to the shared server | `ps aux \| grep "<repo>/.beads/dolt" \| awk '{print $2}' \| xargs -r kill -9`; ensure `.beads/config.yaml` contains `dolt.shared-server: true`; `bd ready` should now route to `~/.beads/shared-server/` |
1058
+ | Dolt journal corruption (`possible data loss detected at offset N`) | bd-internal | Operator-only — do NOT auto-recover. Stop bd writes, snapshot `~/.beads/shared-server/dolt`, run `dolt fsck` (read-only) first. Operator decides on `--revive-journal-with-data-loss` after reviewing the warning |
1059
+
555
1060
  ## What Orchestrator Does Differently Because Of This Skill
556
1061
 
557
1062
  - Writes bead contract before dispatch.
@@ -559,4 +1064,11 @@ Then choose one action:
559
1064
  - Uses specialist role by job shape, not by habit.
560
1065
  - Keeps fix loops alive with resume, not re-spawn.
561
1066
  - Treats reviewer PASS as only publish gate.
562
- - Keeps memory-processor and xt-merge out of this skill on purpose.
1067
+ - Maps file-overlap surface BEFORE dispatching parallel waves.
1068
+ - Files one READ_ONLY test-failure-map bead before fix chains when ≥5 failures span subsystems.
1069
+ - Uses overthinker and reviewer as conversation, not one-shot oracles — rebuts with cited evidence once, then escalates.
1070
+ - Smokes every npm script and entry point before declaring integration done; runs cross-cutting security-auditor on cumulative diff when sensitive surfaces were touched.
1071
+ - Commits debugger-restitch results via FF or cherry-pick of the named commit, not the checkpoint commit above it.
1072
+ - Closes the finished chain's bead BEFORE committing that worktree when other chains are still `in_progress` (project-wide commit-gate).
1073
+ - Runs `/session-close-report` at session end and only then declares done.
1074
+ - Keeps memory-processor, xt-merge, session-close-report, and releasing out of this skill on purpose — each has its own.