xtrm-tools 0.7.12 → 0.7.14

Files changed (42)
  1. package/.xtrm/config/hooks.json +10 -0
  2. package/.xtrm/hooks/specialists/specialists-memory-cache-sync.mjs +57 -0
  3. package/.xtrm/hooks/specialists-agent-guard.mjs +76 -0
  4. package/.xtrm/registry.json +509 -393
  5. package/.xtrm/skills/default/premortem/SKILL.md +218 -0
  6. package/.xtrm/skills/default/releasing/SKILL.md +94 -0
  7. package/.xtrm/skills/default/releasing/scripts/xt-reports.ts +18 -0
  8. package/.xtrm/skills/default/session-close-report/SKILL.md +85 -17
  9. package/.xtrm/skills/default/specialists-creator/SKILL.md +117 -42
  10. package/.xtrm/skills/default/specialists-creator/scripts/audit-spec-uniformity.mjs +86 -0
  11. package/.xtrm/skills/default/specialists-creator/scripts/scaffold-specialist.ts +223 -0
  12. package/.xtrm/skills/default/specialists-creator/scripts/validate-specialist.ts +1 -1
  13. package/.xtrm/skills/default/sync-docs/SKILL.md +88 -208
  14. package/.xtrm/skills/default/sync-docs/scripts/pre-context.sh +17 -0
  15. package/.xtrm/skills/default/update-specialists/SKILL.md +99 -201
  16. package/.xtrm/skills/default/update-xt/SKILL.md +34 -0
  17. package/.xtrm/skills/default/using-kpi/SKILL.md +150 -0
  18. package/.xtrm/skills/default/using-nodes/SKILL.md +18 -102
  19. package/.xtrm/skills/default/using-script-specialists/SKILL.md +208 -0
  20. package/.xtrm/skills/default/using-specialists/SKILL.md +13 -0
  21. package/.xtrm/skills/default/using-specialists-v2/SKILL.md +773 -0
  22. package/.xtrm/skills/default/using-specialists-v3/SKILL.md +284 -0
  23. package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +89 -0
  24. package/CHANGELOG.md +17 -0
  25. package/README.md +5 -1
  26. package/cli/dist/index.cjs +3401 -627
  27. package/cli/dist/index.cjs.map +1 -1
  28. package/cli/package.json +1 -1
  29. package/package.json +3 -2
  30. package/packages/pi-extensions/.serena/project.yml +130 -0
  31. package/packages/pi-extensions/extensions/pi-serena-compact/index.ts +4 -12
  32. package/packages/pi-extensions/extensions/xtrm-loader/index.ts +0 -1
  33. package/packages/pi-extensions/extensions/xtrm-ui/index.ts +201 -36
  34. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark-flattools.json +79 -0
  35. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark.json +85 -0
  36. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light-flattools.json +79 -0
  37. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light.json +85 -0
  38. package/packages/pi-extensions/package.json +1 -1
  39. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark-flattools.json +79 -0
  40. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark.json +3 -3
  41. package/packages/pi-extensions/themes/xtrm-ui/pidex-light-flattools.json +79 -0
  42. package/scripts/patch-external-pi-tools.mjs +154 -0
@@ -0,0 +1,218 @@
1
+ ---
2
+ name: premortem
3
+ description: "Run a premortem on any plan, launch, product, hire, strategy, or decision. Assumes it already failed 6 months from now and works backward to find every reason why. Produces a revised plan with blind spots exposed. MANDATORY TRIGGERS: 'premortem this', 'premortem my', 'run a premortem', 'what could kill this', 'future-proof this', 'stress test this plan', 'what am i missing here', 'find the blind spots'. STRONG TRIGGERS: 'what could go wrong', 'am i missing anything', 'poke holes in this', 'where will this break', 'devil's advocate this'. Do NOT trigger on simple feedback requests, factual questions, or LLM Council requests. DO trigger when someone has a plan or commitment where the cost of being wrong is high."
4
+ ---
5
+
6
+ # Premortem
7
+
8
+ A premortem is the opposite of a postmortem. Instead of figuring out what went wrong after something fails, you imagine it already failed and figure out why before you start.
9
+
10
+ The method comes from psychologist Gary Klein. He published it in Harvard Business Review. Daniel Kahneman (the Nobel Prize-winning psychologist behind "Thinking, Fast and Slow") called it his single most valuable decision-making technique. Google, Goldman Sachs, and Procter & Gamble all use it before major decisions.
11
+
12
+ The core insight: when you ask people "what could go wrong?" they give you cautious, hedged answers. When you say "this already failed, tell me why," their brains switch into narrative mode and generate way more specific, creative, honest reasons. Researchers at Wharton and Cornell called this "prospective hindsight" and found it significantly increases the ability to identify causes of future outcomes.
13
+
14
+ The reason this matters for AI-assisted decisions: Claude defaults to agreeable, optimistic responses. If you ask "is this a good plan?" it will find reasons to say yes. The premortem breaks this pattern by forcing the frame into "this is dead, explain how it died." Claude stops looking for reasons your plan will work and starts explaining how it fell apart.
15
+
16
+ ---
17
+
18
+ ## when to run a premortem
19
+
20
+ Good premortem targets:
21
+ - A product or feature you're about to build
22
+ - A launch plan with money or reputation on the line
23
+ - A pricing change or business model shift
24
+ - A hire you're about to make
25
+ - A strategy or positioning pivot
26
+ - A partnership or deal you're evaluating
27
+ - Any commitment where the cost of being wrong is high
28
+
29
+ Bad premortem targets:
30
+ - Vague ideas with no concrete plan yet (help them plan first, then premortem)
31
+ - Questions with one right answer (just answer them)
32
+ - Requests for creative feedback on a draft (that's editing, not a premortem)
33
+ - Decisions that are already made and irreversible (a premortem is only useful when you can still change course)
34
+
35
+ ---
36
+
37
+ ## context gathering (the minimum bar)
38
+
39
+ A premortem is only as good as the context it runs on. Vague input produces vague failure scenarios that help nobody. Before running the premortem, you need to hit a minimum context threshold.
40
+
41
+ ### step 1: scan for existing context
42
+
43
+ Before asking the user anything, look for context that's already available:
44
+
45
+ **A. The current conversation.** The user may have been discussing a plan, a launch, a product, or a decision earlier in this session. Read back through the conversation and extract whatever's relevant.
46
+
47
+ **B. The workspace.** Quickly scan for files that might contain relevant context:
48
+ - `CLAUDE.md` or `claude.md` (business context, preferences, constraints)
49
+ - Any `memory/` folder (audience profiles, business details, past decisions)
50
+ - Files the user explicitly referenced or attached
51
+ - Any project files, briefs, or plans that relate to the thing being premortemed
52
+
53
+ Use `Glob` and quick `Read` calls. Don't spend more than 30 seconds on this. You're looking for the key files that would ground the failure scenarios in reality.
54
+
55
+ ### step 2: evaluate context sufficiency
56
+
57
+ After scanning, check whether you have enough to run a useful premortem. You need three things:
58
+
59
+ 1. **What is it?** — A clear understanding of the thing being premortemed (a product, a launch, a hire, a pricing change, a strategy). You need to be able to describe it back to the user in one sentence.
60
+
61
+ 2. **Who is it for / who does it affect?** — The audience, the customer, the team, the stakeholders. Failure scenarios depend heavily on who's involved.
62
+
63
+ 3. **What does success look like?** — What outcome is the user hoping for? Failure is defined by inverting success. If you don't know what success means, you can't define what failure means.
64
+
65
+ ### step 3: fill gaps conversationally
66
+
67
+ If you have all three, proceed immediately to the premortem. Don't ask unnecessary questions.
68
+
69
+ If you're missing one or more, ask for the most important missing piece first. One question at a time. Evaluate after each answer whether you now have enough. Keep asking until the threshold is met, but never ask more than you need.
70
+
71
+ Examples of focused context questions:
72
+ - "What specifically are you about to launch/build/decide?" (if you don't know what it is)
73
+ - "Who is this for?" (if you know the plan but not the audience)
74
+ - "What does a win look like for this?" (if you know the plan and audience but not the success criteria)
75
+
76
+ The goal is to reach the minimum bar as fast as possible without making the user feel like they're filling out a form. Conversational, not interrogative. If you can infer an answer from context, do that instead of asking.
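The three-part threshold and the one-question-at-a-time rule above can be sketched as a small function. This is only an illustration: the `PremortemContext` shape and the `nextContextQuestion` name are hypothetical and not part of the skill, and the priority order (what, then who, then success) is an assumption taken from the order of the example questions.

```typescript
// Hypothetical shape for the three things the skill requires before running.
interface PremortemContext {
  what?: string;     // the thing being premortemed
  audience?: string; // who it is for / who it affects
  success?: string;  // what a win looks like
}

// Returns the single most important missing question, or null when the
// minimum bar is met. Only one question is returned per call, matching the
// "one question at a time" rule.
function nextContextQuestion(ctx: PremortemContext): string | null {
  if (!ctx.what?.trim()) {
    return "What specifically are you about to launch/build/decide?";
  }
  if (!ctx.audience?.trim()) return "Who is this for?";
  if (!ctx.success?.trim()) return "What does a win look like for this?";
  return null;
}
```

Calling it again after each answer gives the "evaluate after each answer" loop; a `null` result means the premortem can start.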
77
+
78
+ ---
79
+
80
+ ## how a premortem session works
81
+
82
+ ### step 1: set the frame
83
+
84
+ After gathering sufficient context, set the premortem frame explicitly. Something like:
85
+
86
+ "OK, I have enough context. Let's run the premortem. Here's the premise: it's 6 months from now. [The plan/launch/decision] has failed. It's done. We're looking back and trying to understand what went wrong."
87
+
88
+ This framing matters. It shifts the mode from "evaluate this plan" (which triggers agreeable responses) to "explain why this died" (which triggers honest, specific failure identification).
89
+
90
+ ### step 2: generate failure reasons (raw premortem)
91
+
92
+ Run the raw premortem as a single comprehensive analysis. No prescribed categories, no lenses, no constraints. Just the core Klein method:
93
+
94
+ "This plan has failed 6 months from now. Generate every genuine reason it could have died. Be comprehensive. Be specific. Ground every reason in the actual details of the plan. Don't pad with weak reasons and don't stop early if there are more."
95
+
96
+ The output should be a comprehensive list of failure reasons, each stated in 1-2 sentences. Be honest and thorough. Some plans might have 4 genuine failure modes. Others might have 9. The number should be whatever is real for this specific plan.
97
+
98
+ Each failure reason should be:
99
+ - Specific to this plan (not generic advice that applies to anything)
100
+ - Grounded in actual details the user provided
101
+ - A genuine threat (not a minor inconvenience or an extremely unlikely edge case)
102
+
103
+ ### step 3: deep-dive agents (one per failure reason, all in parallel)
104
+
105
+ Take every failure reason from step 2 and spawn one sub-agent per reason, all in parallel. Each agent takes its assigned failure reason and goes deep on it independently.
106
+
107
+ **Sub-agent prompt template:**
108
+
109
+ ```
110
+ You are an investigator in a premortem analysis. You've been assigned one specific failure reason to analyze in depth.
111
+
112
+ The plan:
113
+ ---
114
+ [full context: what it is, who it's for, what success looks like, plus relevant workspace context]
115
+ ---
116
+
117
+ PREMORTEM FRAME: It is 6 months from now. This plan has failed.
118
+
119
+ YOUR ASSIGNED FAILURE REASON: [the specific failure reason from step 2]
120
+
121
+ Your job is to go deep on this one failure. Write the story of how it actually played out. Be specific. Use details from the plan. Make it feel real, like a case study of something that actually happened.
122
+
123
+ Your output should include:
124
+
125
+ 1. THE FAILURE STORY: A 2-3 paragraph narrative of how this specific failure played out. Use details from the plan. Name specific moments where things went wrong and why.
126
+
127
+ 2. THE UNDERLYING ASSUMPTION: The one thing the user was taking for granted that made this failure possible. State it in one sentence.
128
+
129
+ 3. EARLY WARNING SIGNS: 1-2 concrete, observable signals the user could watch for that would indicate this failure mode is starting to play out. These should be things you can actually see or measure, not vague feelings.
130
+
131
+ Keep the total response under 300 words. Be direct. Don't hedge. Don't sugarcoat.
132
+ ```
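The "one sub-agent per reason, all in parallel" fan-out can be sketched with `Promise.all`. The `DeepDive` shape and the `Investigator` callback are hypothetical stand-ins; how a sub-agent is actually spawned depends on the host environment and is not shown here.

```typescript
// Hypothetical result shape matching the three outputs the prompt asks for.
interface DeepDive {
  reason: string;
  story: string;
  assumption: string;
  warningSigns: string[];
}

// Stand-in for whatever actually spawns a sub-agent with the prompt template.
type Investigator = (plan: string, reason: string) => Promise<DeepDive>;

// Starts every investigation before awaiting any of them, so no agent sees
// another agent's output. Results come back in the same order as `reasons`.
async function deepDiveAll(
  plan: string,
  reasons: string[],
  investigate: Investigator,
): Promise<DeepDive[]> {
  return Promise.all(reasons.map((reason) => investigate(plan, reason)));
}
```

Note that `Promise.all` rejects as soon as any one investigation fails; `Promise.allSettled` may be a better fit if partial results should still reach the synthesis.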
133
+
134
+ ### step 4: synthesis
135
+
136
+ After all agents complete, read every deep-dive and produce the synthesis:
137
+
138
+ **PREMORTEM REPORT**
139
+
140
+ 1. **The Most Likely Failure** — Which failure scenario is most probable given what you know about the plan? Why? This is the one the user should focus on first.
141
+
142
+ 2. **The Most Dangerous Failure** — Which failure scenario would cause the most damage if it happened, even if it's less likely? This is the one worth insuring against.
143
+
144
+ 3. **The Hidden Assumption** — Across all the failure analyses, what's the single biggest assumption the user is making that they probably haven't questioned? This is often where the real value of the premortem lives: the thing that's so obvious to the user that they forgot it was an assumption.
145
+
146
+ 4. **The Revised Plan** — Based on the failure scenarios, what specific changes would make the plan more resilient? Be concrete. Don't say "consider your pricing." Say "test pricing at $X with 20 people before committing to it publicly." Each revision should map directly to a specific failure scenario.
147
+
148
+ 5. **The Pre-Launch Checklist** — 3-5 specific things the user should verify, test, or put in place before executing. Each one should prevent or detect one of the failure modes identified.
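The distinction between the most likely and the most dangerous failure can be made concrete with a rough sketch. The skill itself does not define any numeric scale; the `likelihood` and `damage` scores below are purely hypothetical, and in practice the judgment is qualitative.

```typescript
// Hypothetical scoring; both scales are assumed to run from 0 to 1.
interface ScoredFailure {
  reason: string;
  likelihood: number;
  damage: number;
}

// Returns the item with the highest value for the given key.
function maxBy(items: ScoredFailure[], key: "likelihood" | "damage"): ScoredFailure {
  if (items.length === 0) throw new Error("no failure reasons to rank");
  return items.reduce((best, cur) => (cur[key] > best[key] ? cur : best));
}

// The two headline items of the synthesis can be different failure modes.
function headlineFailures(items: ScoredFailure[]) {
  return {
    mostLikely: maxBy(items, "likelihood"),
    mostDangerous: maxBy(items, "damage"),
  };
}
```

The point of ranking on two separate axes is that a less probable failure can still be the one most worth insuring against.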
149
+
150
+ ### step 5: generate the premortem report
151
+
152
+ Generate a visual HTML report and save it to the user's workspace.
153
+
154
+ **File:** `premortem-report-[timestamp].html`
155
+
156
+ The report should be a single self-contained HTML file with inline CSS. Design principles:
157
+ - Dark background (#0a0e1a or similar), clean typography, easy to scan
158
+ - The synthesis section (most likely failure, most dangerous failure, hidden assumption, revised plan, checklist) should be prominently displayed at the top since that's what most people will read first
159
+ - One visual card per failure reason showing the deep-dive analysis. Each card should display the failure reason as a header, the failure story, the underlying assumption, and the early warning signs. Use distinct accent colors for each card so they're visually scannable.
160
+ - A clear visual indicator of severity/likelihood for each failure mode
161
+ - The round-robin visual: show the number of agents that ran and their findings as a grid or card layout, so the user can see the full scope of the premortem at a glance
162
+ - Footer with timestamp and what was premortemed
163
+
164
+ Open the HTML file after generating it.
165
+
166
+ ### step 6: save the transcript
167
+
168
+ Save the full premortem transcript as `premortem-transcript-[timestamp].md` in the same location. This includes:
169
+ - The context that was gathered (what, who, success criteria)
170
+ - The raw premortem failure reasons
171
+ - All agent deep-dives
172
+ - The full synthesis
173
+
174
+ ---
175
+
176
+ ## output format
177
+
178
+ Every premortem session produces two files:
179
+
180
+ ```
181
+ premortem-report-[timestamp].html # visual report for scanning
182
+ premortem-transcript-[timestamp].md # full transcript for reference
183
+ ```
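A minimal sketch of deriving both filenames from one shared timestamp, so the report and transcript always pair up. The skill only says `[timestamp]`; the `YYYYMMDD-HHmmss` UTC format and the `premortemFilenames` name are assumptions.

```typescript
// Builds the two output filenames from a single Date.
// The timestamp format is an assumption; the skill leaves it unspecified.
function premortemFilenames(now: Date): { report: string; transcript: string } {
  const iso = now.toISOString(); // e.g. 2026-03-30T14:05:09.000Z
  const day = iso.slice(0, 10).replace(/-/g, "");   // 20260330
  const time = iso.slice(11, 19).replace(/:/g, ""); // 140509
  const stamp = `${day}-${time}`;
  return {
    report: `premortem-report-${stamp}.html`,
    transcript: `premortem-transcript-${stamp}.md`,
  };
}
```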
184
+
185
+ The user sees the HTML report first. The transcript is there if they want to dig deeper into the reasoning behind each failure scenario.
186
+
187
+ Also provide a concise summary in the chat: the most likely failure, the hidden assumption, and the single most important revision to the plan. Three sentences max. The report has the full details.
188
+
189
+ ---
190
+
191
+ ## example: premorteming a product launch
192
+
193
+ **User:** "premortem this: I'm about to launch a $297 live workshop on how to use Claude Cowork for marketing teams. 50 seats. Targeting marketing managers at companies with 10-50 employees."
194
+
195
+ **Raw premortem identifies 6 failure reasons:**
196
+ 1. Marketing managers at this company size need approval to spend $297 on professional development, adding friction you haven't accounted for
197
+ 2. "Claude Cowork for marketing" is a tool-specific pitch in a market where most managers are still figuring out whether AI is relevant to them at all
198
+ 3. The audience that actually buys might be solopreneurs, not team managers, creating a mismatch between content and attendees
199
+ 4. Building a workshop for marketing teams requires demo environments with realistic marketing data and multi-seat setups, which takes 5 weeks of prep, not the 2 you budgeted
200
+ 5. If 60% of attendees are solopreneurs, your reviews and case studies won't resonate with the marketing manager audience you need for future cohorts
201
+ 6. At $297 with 50 seats, the max revenue is $14,850, which may not justify the prep time against other revenue opportunities
202
+
203
+ **6 agents go deep on each reason independently, producing failure stories, underlying assumptions, and early warning signs.**
204
+
205
+ **Synthesis:** Most likely failure is approval friction: you're targeting people who need sign-off to spend $297, and the plan doesn't account for that step. Most dangerous failure: attracting solopreneurs instead of team managers means your case studies and testimonials won't resonate with the actual target buyer for future cohorts, compounding the problem over time. Hidden assumption: you're assuming "marketing managers at 10-50 person companies" is a reachable audience, but these people don't self-identify that way and don't hang out in the same places. Revised plan: run a $47 pilot session for 20 people first. Use that to identify whether your actual buyers are team managers or solopreneurs, and build the full workshop for whoever actually shows up.
206
+
207
+ ---
208
+
209
+ ## important notes
210
+
211
+ - **Always spawn all failure agents in parallel.** Sequential spawning wastes time and lets earlier responses influence later ones.
212
+ - **Always set the premortem frame explicitly.** "This has already failed" is the psychological mechanism that makes this work. Without it, the analysis defaults to polite risk assessment instead of honest failure identification.
213
+ - **Be comprehensive but not padded.** Find every genuine failure reason. Don't stop at 3 if there are 7. But don't force 7 if there are only 3. The number should be whatever is real for this specific plan.
214
+ - **The synthesis is the product.** Most users will read the synthesis and skim the individual failure cards. Make the synthesis specific and actionable.
215
+ - **Don't sugarcoat.** The whole point of a premortem is to tell the user things they don't want to hear before reality does. If a plan has serious problems, say so directly.
216
+ - **The revised plan must be concrete.** Don't say "consider testing your pricing." Say "run a $47 pilot with 20 people before committing to the full $297 workshop." Every revision should be something the user can actually do this week.
217
+ - **Respect the minimum context threshold.** Running a premortem on insufficient context produces generic failures that waste the user's time. It's better to ask one more question than to produce a bad premortem.
218
+ - **This is not the LLM Council.** The council gives multiple perspectives on a decision right now. The premortem sends Claude into the future where the decision already failed and works backward to explain why. Different psychological mechanism, different output. If the user seems to want multiple perspectives rather than failure analysis, suggest the council instead.
@@ -0,0 +1,94 @@
1
+ ---
2
+ name: releasing
3
+ description: >-
4
+ Cut a release with the canonical xt release prepare/publish flow. Use when the
5
+ operator wants to publish a new tag (vX.Y.Z). Prepare drafts CHANGELOG from xt
6
+ reports and performs deterministic release-file mutations; publish creates the
7
+ annotated tag, pushes commits/tags, and can create a GitHub release.
8
+ version: 1.2.0
9
+ ---
10
+
11
+ # releasing
12
+
13
+ Canonical release publication via `xt release prepare` and `xt release publish`.
14
+
15
+ ## When to use
16
+
17
+ The operator wants to cut a release. They say "release it", "ship vX.Y.Z", "cut a tag", or just "release".
18
+
19
+ ## How
20
+
21
+ 1. Determine target version. Default is patch bump from most recent semver tag. Operator may specify `--minor`, `--major`, or explicit version.
22
+
23
+ 2. Determine tag range. Default is `<latest-tag>..HEAD`. For backfills, operator names `--from` / `--to` explicitly.
24
+
25
+ 3. Prepare release files:
26
+
27
+ ```bash
28
+ xt release prepare --patch
29
+ # or: xt release prepare --minor --from <tag> --to HEAD
30
+ ```
31
+
32
+ `prepare` is the canonical path. It builds the xt report bundle, calls the specialists changelog drafting script (`sp script changelog-keeper`), updates release files, rebuilds dist, and enforces the release scope guard.
33
+
34
+ Current blocker: until specialists issue `unitAI-dnmcg` lands, `prepare` can fail with `interactive specialists are not allowed` because the changelog drafting specialist is not yet script-compatible. If that happens, do a manual prepare using the same scope rules and then continue with `xt release publish`.
35
+
36
+ 4. Verify release diff before publishing.
37
+
38
+ ```bash
39
+ git diff --stat HEAD~1 HEAD
40
+ git status --short
41
+ ```
42
+
43
+ Release diff must be limited to release artifacts such as:
44
+ - `CHANGELOG.md`
45
+ - package manifests / lockfile for version sync
46
+ - generated `cli/dist/**` or `dist/**`
47
+
48
+ 5. Publish:
49
+
50
+ ```bash
51
+ xt release publish
52
+ # optional GitHub release:
53
+ xt release publish --gh-release
54
+ ```
55
+
56
+ `publish` creates the annotated tag for the current package version, pushes commits and tags, and optionally creates the GitHub release.
57
+
58
+ 6. Confirm:
59
+
60
+ ```bash
61
+ git tag --list 'v*' | tail -3
62
+ git log --oneline -1
63
+ git status --short --branch
64
+ ```
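The default in step 1 (patch bump from the most recent semver tag, with optional minor or major bumps) can be sketched as follows. How `xt release prepare` actually computes the version is not shown in this skill; `nextVersion` is a hypothetical helper, and pre-release or build suffixes are deliberately out of scope.

```typescript
type Bump = "patch" | "minor" | "major";

// Returns the next tag for a plain vX.Y.Z tag.
// Pre-release and build-metadata suffixes are not handled in this sketch.
function nextVersion(latestTag: string, bump: Bump = "patch"): string {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(latestTag);
  if (!m) throw new Error(`not a plain semver tag: ${latestTag}`);
  const major = Number(m[1]);
  const minor = Number(m[2]);
  const patch = Number(m[3]);
  // A higher-order bump resets the lower-order components to zero.
  if (bump === "major") return `v${major + 1}.0.0`;
  if (bump === "minor") return `v${major}.${minor + 1}.0`;
  return `v${major}.${minor}.${patch + 1}`;
}
```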
65
+
66
+ ## Why this design
67
+
68
+ - `xt` owns deterministic release mutation: changelog insertion, version bump, build, scope guard, commit/tag/push.
69
+ - The specialist owns only changelog drafting from xt reports through a script-compatible, READ_ONLY surface.
70
+ - xt reports are synthesis input, not raw git log + bd query. Reports are pre-curated, signal-rich, written in user-facing language.
71
+ - `xt release publish` is intentionally separate so operators can inspect prepared release files before pushing the tag.
72
+
73
+ ## Manual fallback while unitAI-dnmcg is open
74
+
75
+ If `xt release prepare` fails on the changelog script compatibility guard:
76
+
77
+ 1. Draft the CHANGELOG section manually from `.xtrm/reports/` and recent commits.
78
+ 2. Bump package versions and lockfile.
79
+ 3. Run `npm run build`.
80
+ 4. Commit with `release: vX.Y.Z`.
81
+ 5. Run `xt release publish`.
82
+
83
+ Do not broaden the release diff beyond release artifacts.
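The scope rule can be illustrated as a path filter over the changed files. This is a sketch, not the real release scope guard: the pattern list is derived from the artifact list in step 4, and reading "package manifests / lockfile" as `package.json` and `package-lock.json` is an assumption.

```typescript
// Allowed release artifacts, based on the list in step 4.
// The manifest and lockfile basenames are assumptions.
const ALLOWED_RELEASE_PATHS: RegExp[] = [
  /^CHANGELOG\.md$/,
  /(^|\/)package\.json$/,
  /(^|\/)package-lock\.json$/,
  /^cli\/dist\//,
  /^dist\//,
];

// Returns the changed paths that fall outside the allowed release artifacts.
// An empty result means the diff is within release scope.
function outOfScope(changedPaths: string[]): string[] {
  return changedPaths.filter(
    (p) => !ALLOWED_RELEASE_PATHS.some((re) => re.test(p)),
  );
}
```

Feeding it the file list from `git diff --name-only` before publishing would flag any source, docs, or config change that belongs in a separate bead.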
84
+
85
+ ## Parallel sessions
86
+
87
+ Each orchestrator runs this skill in its own session. Specialist commits + tags + pushes atomically. If two sessions try same version, first push wins; second sees remote tag conflict and aborts cleanly. Operator picks next version and retries.
88
+
89
+ ## Don't
90
+
91
+ - Don't call `sp release prepare` / `sp release publish` as the canonical path. They are deprecated aliases in specialists.
92
+ - Don't bypass `xt release publish` for tag/push unless the command itself is broken.
93
+ - Don't broaden release diffs with source/docs/config changes. File a separate bead for non-release work.
94
+ - Don't pre-stage unrelated files. The release scope guard should see a clean tree except allowed release artifacts.
@@ -0,0 +1,18 @@
+ #!/usr/bin/env bun
+
+ import { buildReportBundle, listXtReports } from '../../../../../cli/src/core/xt-reports.ts';
+
+ async function main() {
+   const since = process.argv[2];
+   const to = process.argv[3] ?? 'HEAD';
+   const capArg = process.argv[4];
+   const capBytes = capArg ? Number(capArg) : 50_000;
+
+   if (!since) throw new Error('Usage: xt-reports.ts <since> [to] [capBytes]');
+
+   const reports = listXtReports({ since, to, capBytes });
+   const bundle = buildReportBundle(reports, capBytes);
+   console.log(bundle.output);
+ }
+
+ if (import.meta.main) await main();
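The positional-argument handling in this script can be sketched in isolation as a pure function, which makes the defaults easier to see. `parseReportArgs` is a hypothetical helper, and the numeric guard on `capBytes` is an addition that the script above does not have (it passes `Number(capArg)` through even when that is `NaN`).

```typescript
interface ReportArgs {
  since: string;
  to: string;
  capBytes: number;
}

// `argv` holds only the positional arguments, i.e. process.argv.slice(2).
function parseReportArgs(argv: string[]): ReportArgs {
  const [since, to = "HEAD", capArg] = argv;
  if (!since) throw new Error("Usage: xt-reports.ts <since> [to] [capBytes]");
  const capBytes = capArg === undefined ? 50_000 : Number(capArg);
  // Extra guard, not in the original script: reject non-numeric or non-positive caps.
  if (!Number.isFinite(capBytes) || capBytes <= 0) {
    throw new Error(`capBytes must be a positive number, got: ${capArg}`);
  }
  return { since, to, capBytes };
}
```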
@@ -1,10 +1,10 @@
1
1
  ---
2
2
  name: session-close-report
3
3
  description: |
4
- Generate a structured technical handoff report at session close.
5
- You run `xt report generate` to get the data skeleton, then fill every
6
- <!-- FILL --> section from your own session context. The result is the
7
- definitive handoff contract for the next agent.
4
+ Generate or update the structured technical handoff report at session close.
5
+ Prefer one same-day SSOT report: update the latest report for today when it
6
+ exists, otherwise run `xt report generate`, then fill every `<!-- FILL -->`
7
+ section from orchestrator context.
8
8
  ---
9
9
 
10
10
  # session-close-report
@@ -15,9 +15,43 @@ Invoke this skill at the end of a productive session — after issues are closed
15
15
  code is committed, but before final push. It produces the handoff report that
16
16
  the next agent reads to start cold without losing context.
17
17
 
18
+ ## Report identity rule
19
+
20
+ Prefer a single same-day SSOT handoff report.
21
+
22
+ Before generating anything, check existing reports:
23
+
24
+ ```bash
25
+ xt report list
26
+ ls -t .xtrm/reports/*.md 2>/dev/null | head
27
+ ```
28
+
29
+ Decision:
30
+ - If a report for today already exists, update the latest same-day report.
31
+ - If multiple orchestrators ran today, merge your context into that same report;
32
+ do not create a competing handoff unless the operator explicitly asks for a
33
+ separate report.
34
+ - If no suitable same-day report exists, run `xt report generate` and fill the
35
+ new skeleton.
36
+
37
+ When updating an existing report, preserve prior orchestrator content. Append,
38
+ merge, or revise sections so the file remains one coherent handoff package — do
39
+ not overwrite earlier waves, issue context, problems, or decisions unless they
40
+ are factually superseded.
41
+
18
42
  ## Workflow
19
43
 
20
- ### 1. Generate the skeleton
44
+ ### 1. Select report: update existing or generate new
45
+
46
+ For same-day update:
47
+
48
+ ```bash
49
+ REPORT=$(ls -t .xtrm/reports/$(date +%F)-*.md 2>/dev/null | head -1)
50
+ ```
51
+
52
+ If `$REPORT` is non-empty, read and update it.
53
+
54
+ If no same-day report exists:
21
55
 
22
56
  ```bash
23
57
  xt report generate
@@ -26,28 +60,42 @@ xt report generate
26
60
  This collects data from git log, bd, .specialists/jobs/ and writes a skeleton
27
61
  to `.xtrm/reports/<date>-<hash>.md` with YAML frontmatter and pre-filled tables.
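The same-day selection (`ls -t .xtrm/reports/$(date +%F)-*.md | head -1`) can be expressed as a pure function over a directory listing. The `ReportFile` shape and `latestSameDayReport` name are hypothetical; the sketch assumes report names start with a `YYYY-MM-DD-` prefix, as the skeleton path `<date>-<hash>.md` suggests.

```typescript
interface ReportFile {
  name: string;    // file name only, e.g. "2026-03-30-ab12cd3.md"
  mtimeMs: number; // last-modified time in milliseconds
}

// Keeps files whose name starts with today's date, then returns the most
// recently modified one. Returns null when a new skeleton should be generated.
function latestSameDayReport(files: ReportFile[], today: string): string | null {
  const sameDay = files
    .filter((f) => f.name.startsWith(`${today}-`) && f.name.endsWith(".md"))
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  return sameDay.length > 0 ? sameDay[0].name : null;
}
```

A `null` result corresponds to the "no same-day report exists" branch, where `xt report generate` is run instead.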
28
62
 
29
- ### 2. Read the skeleton
63
+ ### 2. Read the target report
64
+
65
+ Read the chosen report completely enough to understand existing content.
30
66
 
31
- Read the generated file. It has `<!-- FILL -->` markers in every section that
32
- needs your input.
67
+ Skeleton reports have `<!-- FILL -->` markers in every section that needs your
68
+ input. Existing same-day reports may already be partially filled; update those
69
+ sections with the new session context and remove any now-stale placeholders.
33
70
 
34
- ### 3. Fill every section from your context
71
+ ### 3. Fill or update every section from your context
35
72
 
36
73
  You are the orchestrator. You have the full session context. The CLI only
37
74
  collected raw data — you provide the meaning.
38
75
 
76
+ When updating an existing same-day report:
77
+ - Add new waves, issues, commits, problems, and decisions without duplicating
78
+ existing rows.
79
+ - Update summary/frontmatter counts to cover the whole same-day handoff, not
80
+ just your sub-session.
81
+ - Reconcile stale “open issues” entries if you closed them later in the day.
82
+ - Keep one chronological/coherent narrative instead of separate mini-reports.
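Merging without duplicating rows can be sketched as a keyed merge on issue ID. The `IssueRow` shape is hypothetical (the real report tables have more columns), and letting the later row win is an assumption that matches the rule about reconciling stale open entries.

```typescript
interface IssueRow {
  id: string;
  status: "open" | "closed";
  note: string;
}

// Merges incoming rows into existing ones by issue id. A later row replaces
// an earlier one with the same id, while the row keeps its original position,
// so an issue closed later in the day appears once, as closed.
function mergeIssueRows(existing: IssueRow[], incoming: IssueRow[]): IssueRow[] {
  const byId = new Map<string, IssueRow>();
  for (const row of existing) byId.set(row.id, row);
  for (const row of incoming) byId.set(row.id, row);
  return Array.from(byId.values());
}
```

Because replacing a row discards its earlier note, any context worth preserving from the stale entry should be folded into the new row before merging.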
83
+
39
84
  **For each section, here is exactly what to write:**
40
85
 
41
86
  #### Summary
42
87
  One dense paragraph. What was accomplished, key decisions made, discoveries,
43
88
  outcomes. Technical prose — no filler, no "in this session we...". Lead with
44
- the most important result.
89
+ the most important result. For same-day updates, summarize the whole day’s SSOT
90
+ state, including earlier orchestrators and your additions.
45
91
 
46
92
  #### Issues Closed
47
93
  The skeleton has a flat table. Restructure it:
48
94
  - Group by category: bugs discovered, backlog items, cleanup/closures, features
49
95
  - If specialists were used, add Specialist and Wave columns
50
96
  - Expand terse close reasons into useful context
97
+ - When updating an existing report, add newly closed issues and revise stale open
98
+ entries that are now closed
51
99
 
52
100
  #### Issues Filed
53
101
  Add every issue you created this session. The **Why** column is mandatory —
@@ -61,18 +109,22 @@ If specialists were dispatched:
61
109
  - Add a Problems sub-table for any failed/stalled dispatches
62
110
  - Update `specialist_dispatches` and `models_used` in frontmatter
63
111
 
64
- If no specialists were used, delete this section.
112
+ If no specialists were used and the report has no prior specialist dispatches,
113
+ delete this section. If prior dispatches exist, keep and extend them.
65
114
 
66
115
  #### Problems Encountered
67
116
  Every problem hit during the session. Root Cause and Resolution columns are
68
117
  mandatory. Include: bugs discovered, wrong approaches tried, blockers hit,
69
- tooling failures. If no problems, delete this section entirely.
118
+ tooling failures. If no problems exist anywhere in the same-day report, delete
119
+ this section entirely.
70
120
 
71
121
  #### Code Changes
72
122
  The skeleton lists files. Add narrative:
73
123
  - Explain key modifications (not every file — focus on the important ones)
74
124
  - Group logically if many changes (e.g., "CLI commands", "Hook changes")
75
125
  - Note architectural decisions embedded in the changes
126
+ - For same-day updates, include changes from all orchestrators that contributed
127
+ to the final pushed stack
76
128
 
77
129
  #### Documentation Updates
78
130
  List doc changes, skill updates, memory saves, CHANGELOG entries.
@@ -84,6 +136,8 @@ This is the most valuable handoff section. For each open issue:
84
136
  blockers discovered, suggested approach, files to look at, gotchas.
85
137
  - Group into "Ready for next session" and "Backlog" subsections
86
138
  - Put the most actionable items first
139
+ - If an issue listed earlier in the day was closed later, remove it from open
140
+ issues and move it to Issues Closed with closure context
87
141
 
88
142
  #### Memories Saved
89
143
  List all `bd remember` calls made this session. If the skeleton missed any,
@@ -96,36 +150,50 @@ Ordered list of 1-4 items with rationale for each. Based on:
96
150
  - Urgency of discovered issues
97
151
  - Blocked items about to unblock
98
152
 
153
+ For same-day updates, make this the next priority from the final state of the
154
+ whole day, not from an earlier partial state.
155
+
99
156
  ### 4. Update frontmatter
100
157
 
101
- Ensure all frontmatter counts are accurate after filling:
102
- - `issues_filed` — actual count
103
- - `specialist_dispatches` — actual count
104
- - `models_used` — list of models that did work this session
158
+ Ensure all frontmatter counts are accurate after filling/updating:
159
+ - `issues_filed` — actual count represented in the report
160
+ - `specialist_dispatches` — actual count represented in the report
161
+ - `models_used` — list of models that did work represented in the report
162
+ - `issues_closed` — actual closed issue count represented in the report
163
+ - `commits` — commit count represented in the report, if known
105
164
 
106
165
  ### 5. Commit the report
107
166
 
167
+ Reports are versioned handoff artifacts and should be tracked.
168
+
108
169
  ```bash
109
170
  git add .xtrm/reports/
110
171
  git commit -m "session report: <date>"
111
172
  ```
112
173
 
174
+ If you updated an existing same-day report after an earlier report commit, commit
175
+ that update with the same message style or fold it into the current final commit
176
+ before push.
177
+
113
178
  ## Quality bar
114
179
 
115
180
  The reference is `~/projects/specialists/.xtrm/reports/2026-03-30-orchestration-session.md`.
116
181
  Every report must match that level of detail. Specifically:
117
182
 
118
183
  - No empty `<!-- FILL -->` markers left in the final output
184
+ - No duplicate same-day reports unless explicitly requested by the operator
119
185
  - Every closed issue has context, not just an ID
120
186
  - Every open issue has actionable handoff suggestions
121
187
  - Problems section captures root causes, not just symptoms
122
188
  - Summary is a dense technical paragraph, not a list of bullet points
189
+ - Same-day updates preserve earlier orchestrator context while making the final
190
+ file read as one SSOT handoff package
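The first item of the quality bar is easy to check mechanically. A minimal sketch, assuming the placeholder always appears as the exact string `<!-- FILL -->`; `unfilledLines` is a hypothetical helper, not an `xt` command.

```typescript
// Returns the 1-based line numbers that still contain a FILL placeholder.
function unfilledLines(markdown: string): number[] {
  const hits: number[] = [];
  markdown.split("\n").forEach((line, i) => {
    if (line.includes("<!-- FILL -->")) hits.push(i + 1);
  });
  return hits;
}
```

An empty array means the report is ready to commit on this criterion; the other quality-bar items still need a human or orchestrator read.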
123
191
 
124
192
  ## CLI commands
125
193
 
126
194
  | Command | Purpose |
127
195
  |---------|---------|
128
- | `xt report generate` | Collect data, write skeleton |
196
+ | `xt report generate` | Collect data, write skeleton when no suitable report exists |
129
197
  | `xt report show [target]` | Display latest or specified report |
130
198
  | `xt report list` | List all reports with frontmatter summary |
131
199
  | `xt report diff <a> <b>` | Compare two reports |