xtrm-tools 0.7.11 → 0.7.13

Files changed (26)
  1. package/.xtrm/hooks/specialists/specialists-memory-cache-sync.mjs +57 -0
  2. package/.xtrm/registry.json +477 -389
  3. package/.xtrm/skills/default/premortem/SKILL.md +218 -0
  4. package/.xtrm/skills/default/releasing/SKILL.md +90 -0
  5. package/.xtrm/skills/default/sync-docs/SKILL.md +88 -208
  6. package/.xtrm/skills/default/sync-docs/scripts/pre-context.sh +17 -0
  7. package/.xtrm/skills/default/update-specialists/SKILL.md +448 -0
  8. package/.xtrm/skills/default/update-xt/SKILL.md +34 -0
  9. package/.xtrm/skills/default/using-kpi/SKILL.md +150 -0
  10. package/.xtrm/skills/default/using-specialists-v2/SKILL.md +683 -0
  11. package/cli/dist/index.cjs +839 -429
  12. package/cli/dist/index.cjs.map +1 -1
  13. package/cli/package.json +1 -1
  14. package/package.json +2 -2
  15. package/packages/pi-extensions/.serena/project.yml +119 -0
  16. package/packages/pi-extensions/extensions/pi-serena-compact/index.ts +4 -12
  17. package/packages/pi-extensions/extensions/xtrm-loader/index.ts +0 -1
  18. package/packages/pi-extensions/extensions/xtrm-ui/index.ts +201 -36
  19. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark-flattools.json +79 -0
  20. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-dark.json +85 -0
  21. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light-flattools.json +79 -0
  22. package/packages/pi-extensions/extensions/xtrm-ui/themes/pidex-light.json +85 -0
  23. package/packages/pi-extensions/package.json +1 -1
  24. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark-flattools.json +79 -0
  25. package/packages/pi-extensions/themes/xtrm-ui/pidex-dark.json +3 -3
  26. package/packages/pi-extensions/themes/xtrm-ui/pidex-light-flattools.json +79 -0
package/.xtrm/skills/default/premortem/SKILL.md
@@ -0,0 +1,218 @@
1
+ ---
2
+ name: premortem
3
+ description: "Run a premortem on any plan, launch, product, hire, strategy, or decision. Assumes it already failed 6 months from now and works backward to find every reason why. Produces a revised plan with blind spots exposed. MANDATORY TRIGGERS: 'premortem this', 'premortem my', 'run a premortem', 'what could kill this', 'future-proof this', 'stress test this plan', 'what am i missing here', 'find the blind spots'. STRONG TRIGGERS: 'what could go wrong', 'am i missing anything', 'poke holes in this', 'where will this break', 'devil's advocate this'. Do NOT trigger on simple feedback requests, factual questions, or LLM Council requests. DO trigger when someone has a plan or commitment where the cost of being wrong is high."
4
+ ---
5
+
6
+ # Premortem
7
+
8
+ A premortem is the opposite of a postmortem. Instead of figuring out what went wrong after something fails, you imagine it already failed and figure out why before you start.
9
+
10
+ The method comes from psychologist Gary Klein. He published it in Harvard Business Review. Daniel Kahneman (the Nobel Prize-winning psychologist behind "Thinking, Fast and Slow") called it his single most valuable decision-making technique. Google, Goldman Sachs, and Procter & Gamble all use it before major decisions.
11
+
12
+ The core insight: when you ask people "what could go wrong?" they give you cautious, hedged answers. When you say "this already failed, tell me why," their brains switch into narrative mode and generate way more specific, creative, honest reasons. Researchers at Wharton and Cornell called this "prospective hindsight" and found it significantly increases the ability to identify causes of future outcomes.
13
+
14
+ The reason this matters for AI-assisted decisions: Claude defaults to agreeable, optimistic responses. If you ask "is this a good plan?" it will find reasons to say yes. The premortem breaks this pattern by forcing the frame into "this is dead, explain how it died." Claude stops looking for reasons your plan will work and starts explaining how it fell apart.
15
+
16
+ ---
17
+
18
+ ## when to run a premortem
19
+
20
+ Good premortem targets:
21
+ - A product or feature you're about to build
22
+ - A launch plan with money or reputation on the line
23
+ - A pricing change or business model shift
24
+ - A hire you're about to make
25
+ - A strategy or positioning pivot
26
+ - A partnership or deal you're evaluating
27
+ - Any commitment where the cost of being wrong is high
28
+
29
+ Bad premortem targets:
30
+ - Vague ideas with no concrete plan yet (help them plan first, then premortem)
31
+ - Questions with one right answer (just answer them)
32
+ - Requests for creative feedback on a draft (that's editing, not a premortem)
33
+ - Decisions that are already made and irreversible (a premortem is only useful when you can still change course)
34
+
35
+ ---
36
+
37
+ ## context gathering (the minimum bar)
38
+
39
+ A premortem is only as good as the context it runs on. Vague input produces vague failure scenarios that help nobody. Before running the premortem, you need to hit a minimum context threshold.
40
+
41
+ ### step 1: scan for existing context
42
+
43
+ Before asking the user anything, look for context that's already available:
44
+
45
+ **A. The current conversation.** The user may have been discussing a plan, a launch, a product, or a decision earlier in this session. Read back through the conversation and extract whatever's relevant.
46
+
47
+ **B. The workspace.** Quickly scan for files that might contain relevant context:
48
+ - `CLAUDE.md` or `claude.md` (business context, preferences, constraints)
49
+ - Any `memory/` folder (audience profiles, business details, past decisions)
50
+ - Files the user explicitly referenced or attached
51
+ - Any project files, briefs, or plans that relate to the thing being premortemed
52
+
53
+ Use `Glob` and quick `Read` calls. Don't spend more than 30 seconds on this. You're looking for the key files that would ground the failure scenarios in reality.
54
+
55
+ ### step 2: evaluate context sufficiency
56
+
57
+ After scanning, check whether you have enough to run a useful premortem. You need three things:
58
+
59
+ 1. **What is it?** — A clear understanding of the thing being premortemed (a product, a launch, a hire, a pricing change, a strategy). You need to be able to describe it back to the user in one sentence.
60
+
61
+ 2. **Who is it for / who does it affect?** — The audience, the customer, the team, the stakeholders. Failure scenarios depend heavily on who's involved.
62
+
63
+ 3. **What does success look like?** — What outcome is the user hoping for? Failure is defined by inverting success. If you don't know what success means, you can't define what failure means.
64
+
65
+ ### step 3: fill gaps conversationally
66
+
67
+ If you have all three, proceed immediately to the premortem. Don't ask unnecessary questions.
68
+
69
+ If you're missing one or more, ask for the most important missing piece first. One question at a time. Evaluate after each answer whether you now have enough. Keep asking until the threshold is met, but never ask more than you need.
70
+
71
+ Examples of focused context questions:
72
+ - "What specifically are you about to launch/build/decide?" (if you don't know what it is)
73
+ - "Who is this for?" (if you know the plan but not the audience)
74
+ - "What does a win look like for this?" (if you know the plan and audience but not the success criteria)
75
+
76
+ The goal is to reach the minimum bar as fast as possible without making the user feel like they're filling out a form. Conversational, not interrogative. If you can infer an answer from context, do that instead of asking.
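The check-then-ask loop in steps 2 and 3 can be sketched in Python as follows. This is only an illustration: the `PremortemContext` class, its field names, and `next_question` are hypothetical and do not appear in the skill, which describes the check in prose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PremortemContext:
    """Hypothetical container for the three required pieces of context."""
    what: Optional[str] = None     # the thing being premortemed
    who: Optional[str] = None      # audience, customers, stakeholders
    success: Optional[str] = None  # what a win looks like


# Questions in priority order, taken from the examples in step 3.
QUESTIONS = [
    ("what", "What specifically are you about to launch/build/decide?"),
    ("who", "Who is this for?"),
    ("success", "What does a win look like for this?"),
]


def next_question(ctx: PremortemContext) -> Optional[str]:
    """Return the most important missing question, or None when the bar is met."""
    for field_name, question in QUESTIONS:
        value = getattr(ctx, field_name)
        if value is None or not value.strip():
            return question  # ask one thing at a time
    return None
```

Calling `next_question` again after each answer gives the "one question at a time" behavior; a `None` result means the premortem can start.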
77
+
78
+ ---
79
+
80
+ ## how a premortem session works
81
+
82
+ ### step 1: set the frame
83
+
84
+ After gathering sufficient context, set the premortem frame explicitly. Something like:
85
+
86
+ "OK, I have enough context. Let's run the premortem. Here's the premise: it's 6 months from now. [The plan/launch/decision] has failed. It's done. We're looking back and trying to understand what went wrong."
87
+
88
+ This framing matters. It shifts the mode from "evaluate this plan" (which triggers agreeable responses) to "explain why this died" (which triggers honest, specific failure identification).
89
+
90
+ ### step 2: generate failure reasons (raw premortem)
91
+
92
+ Run the raw premortem as a single comprehensive analysis. No prescribed categories, no lenses, no constraints. Just the core Klein method:
93
+
94
+ "This plan has failed 6 months from now. Generate every genuine reason it could have died. Be comprehensive. Be specific. Ground every reason in the actual details of the plan. Don't pad with weak reasons and don't stop early if there are more."
95
+
96
+ The output should be a comprehensive list of failure reasons, each stated in 1-2 sentences. Be honest and thorough. Some plans might have 4 genuine failure modes. Others might have 9. The number should be whatever is real for this specific plan.
97
+
98
+ Each failure reason should be:
99
+ - Specific to this plan (not generic advice that applies to anything)
100
+ - Grounded in actual details the user provided
101
+ - A genuine threat (not a minor inconvenience or an extremely unlikely edge case)
102
+
103
+ ### step 3: deep-dive agents (one per failure reason, all in parallel)
104
+
105
+ Take every failure reason from step 2 and spawn one sub-agent per reason, all in parallel. Each agent takes its assigned failure reason and goes deep on it independently.
106
+
107
+ **Sub-agent prompt template:**
108
+
109
+ ```
110
+ You are an investigator in a premortem analysis. You've been assigned one specific failure reason to analyze in depth.
111
+
112
+ The plan:
113
+ ---
114
+ [full context: what it is, who it's for, what success looks like, plus relevant workspace context]
115
+ ---
116
+
117
+ PREMORTEM FRAME: It is 6 months from now. This plan has failed.
118
+
119
+ YOUR ASSIGNED FAILURE REASON: [the specific failure reason from step 2]
120
+
121
+ Your job is to go deep on this one failure. Write the story of how it actually played out. Be specific. Use details from the plan. Make it feel real, like a case study of something that actually happened.
122
+
123
+ Your output should include:
124
+
125
+ 1. THE FAILURE STORY: A 2-3 paragraph narrative of how this specific failure played out. Use details from the plan. Name specific moments where things went wrong and why.
126
+
127
+ 2. THE UNDERLYING ASSUMPTION: The one thing the user was taking for granted that made this failure possible. State it in one sentence.
128
+
129
+ 3. EARLY WARNING SIGNS: 1-2 concrete, observable signals the user could watch for that would indicate this failure mode is starting to play out. These should be things you can actually see or measure, not vague feelings.
130
+
131
+ Keep the total response under 300 words. Be direct. Don't hedge. Don't sugarcoat.
132
+ ```
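The parallel fan-out in step 3 could look like the sketch below. It assumes an async host; `run_agent` is a hypothetical stand-in for whatever mechanism actually spawns a sub-agent and is stubbed here so the example runs. The prompt is abbreviated from the template above.

```python
import asyncio
from typing import List

# Abbreviated form of the sub-agent prompt template.
PROMPT_TEMPLATE = (
    "The plan:\n---\n{context}\n---\n\n"
    "PREMORTEM FRAME: It is 6 months from now. This plan has failed.\n\n"
    "YOUR ASSIGNED FAILURE REASON: {reason}\n"
)


async def run_agent(prompt: str) -> str:
    """Hypothetical sub-agent call; this stub just echoes the assigned reason."""
    await asyncio.sleep(0)  # yield control, as a real remote call would
    return prompt.rsplit("YOUR ASSIGNED FAILURE REASON: ", 1)[1].strip()


async def deep_dive_all(context: str, reasons: List[str]) -> List[str]:
    """Run one agent per failure reason, all concurrently."""
    # Each agent gets its own prompt and never sees another agent's output.
    prompts = [PROMPT_TEMPLATE.format(context=context, reason=r) for r in reasons]
    # gather() schedules every coroutine at once and returns results in input order.
    return await asyncio.gather(*(run_agent(p) for p in prompts))
```

Because `gather` preserves input order, each deep-dive can be matched back to its failure reason by index when building the synthesis.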
133
+
134
+ ### step 4: synthesis
135
+
136
+ After all agents complete, read every deep-dive and produce the synthesis:
137
+
138
+ **PREMORTEM REPORT**
139
+
140
+ 1. **The Most Likely Failure** — Which failure scenario is most probable given what you know about the plan? Why? This is the one the user should focus on first.
141
+
142
+ 2. **The Most Dangerous Failure** — Which failure scenario would cause the most damage if it happened, even if it's less likely? This is the one worth insuring against.
143
+
144
+ 3. **The Hidden Assumption** — Across all the failure analyses, what's the single biggest assumption the user is making that they probably haven't questioned? This is often where the real value of the premortem lives: the thing that's so obvious to the user that they forgot it was an assumption.
145
+
146
+ 4. **The Revised Plan** — Based on the failure scenarios, what specific changes would make the plan more resilient? Be concrete. Don't say "consider your pricing." Say "test pricing at $X with 20 people before committing to it publicly." Each revision should map directly to a specific failure scenario.
147
+
148
+ 5. **The Pre-Launch Checklist** — 3-5 specific things the user should verify, test, or put in place before executing. Each one should prevent or detect one of the failure modes identified.
149
+
150
+ ### step 5: generate the premortem report
151
+
152
+ Generate a visual HTML report and save it to the user's workspace.
153
+
154
+ **File:** `premortem-report-[timestamp].html`
155
+
156
+ The report should be a single self-contained HTML file with inline CSS. Design principles:
157
+ - Dark background (#0a0e1a or similar), clean typography, easy to scan
158
+ - The synthesis section (most likely failure, most dangerous failure, hidden assumption, revised plan, checklist) should be prominently displayed at the top since that's what most people will read first
159
+ - One visual card per failure reason showing the deep-dive analysis. Each card should display the failure reason as a header, the failure story, the underlying assumption, and the early warning signs. Use distinct accent colors for each card so they're visually scannable.
160
+ - A clear visual indicator of severity/likelihood for each failure mode
161
+ - An overview visual: show the number of agents that ran and their findings as a grid or card layout, so the user can see the full scope of the premortem at a glance
162
+ - Footer with timestamp and what was premortemed
163
+
164
+ Open the HTML file after generating it.
165
+
166
+ ### step 6: save the transcript
167
+
168
+ Save the full premortem transcript as `premortem-transcript-[timestamp].md` in the same location. This includes:
169
+ - The context that was gathered (what, who, success criteria)
170
+ - The raw premortem failure reasons
171
+ - All agent deep-dives
172
+ - The full synthesis
173
+
174
+ ---
175
+
176
+ ## output format
177
+
178
+ Every premortem session produces two files:
179
+
180
+ ```
181
+ premortem-report-[timestamp].html # visual report for scanning
182
+ premortem-transcript-[timestamp].md # full transcript for reference
183
+ ```
184
+
185
+ The user sees the HTML report first. The transcript is there if they want to dig deeper into the reasoning behind each failure scenario.
186
+
187
+ Also provide a concise summary in the chat: the most likely failure, the hidden assumption, and the single most important revision to the plan. Three sentences max. The report has the full details.
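A small sketch of how the two output paths might be built so that both files share one timestamp. The skill only says `[timestamp]`; the `YYYYMMDD-HHMMSS` format and the `output_paths` helper are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def output_paths(workspace: Path, now: Optional[datetime] = None) -> Tuple[Path, Path]:
    """Return (report, transcript) paths that share a single timestamp."""
    now = now or datetime.now()
    # Assumed format: sortable and filesystem-safe.
    stamp = now.strftime("%Y%m%d-%H%M%S")
    report = workspace / f"premortem-report-{stamp}.html"
    transcript = workspace / f"premortem-transcript-{stamp}.md"
    return report, transcript
```

Computing the stamp once keeps the report and transcript paired even if generation spans a second boundary.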
188
+
189
+ ---
190
+
191
+ ## example: premortem of a product launch
192
+
193
+ **User:** "premortem this: I'm about to launch a $297 live workshop on how to use Claude Cowork for marketing teams. 50 seats. Targeting marketing managers at companies with 10-50 employees."
194
+
195
+ **Raw premortem identifies 6 failure reasons:**
196
+ 1. Marketing managers at this company size need approval to spend $297 on professional development, adding friction you haven't accounted for
197
+ 2. "Claude Cowork for marketing" is a tool-specific pitch in a market where most managers are still figuring out whether AI is relevant to them at all
198
+ 3. The audience that actually buys might be solopreneurs, not team managers, creating a mismatch between content and attendees
199
+ 4. Building a workshop for marketing teams requires demo environments with realistic marketing data and multi-seat setups, which takes 5 weeks of prep, not the 2 you budgeted
200
+ 5. If 60% of attendees are solopreneurs, your reviews and case studies won't resonate with the marketing manager audience you need for future cohorts
201
+ 6. At $297 with 50 seats, the max revenue is $14,850, which may not justify the prep time against other revenue opportunities
202
+
203
+ **6 agents go deep on each reason independently, producing failure stories, underlying assumptions, and early warning signs.**
204
+
205
+ **Synthesis:** Most likely failure is purchase friction: you're targeting people who need approval to spend $297, and the plan doesn't account for that step. Most dangerous failure: attracting solopreneurs instead of team managers means your case studies and testimonials won't resonate with the actual target buyer for future cohorts, compounding the problem over time. Hidden assumption: you're assuming "marketing managers at 10-50 person companies" is a reachable audience, but these people don't self-identify that way and don't hang out in the same places. Revised plan: run a $47 pilot session for 20 people first. Use that to identify whether your actual buyers are team managers or solopreneurs, and build the full workshop for whoever actually shows up.
206
+
207
+ ---
208
+
209
+ ## important notes
210
+
211
+ - **Always spawn all failure agents in parallel.** Sequential spawning wastes time and lets earlier responses influence later ones.
212
+ - **Always set the premortem frame explicitly.** "This has already failed" is the psychological mechanism that makes this work. Without it, the analysis defaults to polite risk assessment instead of honest failure identification.
213
+ - **Be comprehensive but not padded.** Find every genuine failure reason. Don't stop at 3 if there are 7. But don't force 7 if there are only 3. The number should be whatever is real for this specific plan.
214
+ - **The synthesis is the product.** Most users will read the synthesis and skim the individual failure cards. Make the synthesis specific and actionable.
215
+ - **Don't sugarcoat.** The whole point of a premortem is to tell the user things they don't want to hear before reality does. If a plan has serious problems, say so directly.
216
+ - **The revised plan must be concrete.** Don't say "consider testing your pricing." Say "run a $47 pilot with 20 people before committing to the full $297 workshop." Every revision should be something the user can actually do this week.
217
+ - **Respect the minimum context threshold.** Running a premortem on insufficient context produces generic failures that waste the user's time. It's better to ask one more question than to produce a bad premortem.
218
+ - **This is not the LLM Council.** The council gives multiple perspectives on a decision right now. The premortem sends Claude into the future where the decision already failed and works backward to explain why. Different psychological mechanism, different output. If the user seems to want multiple perspectives rather than failure analysis, suggest the council instead.
package/.xtrm/skills/default/releasing/SKILL.md
@@ -0,0 +1,90 @@
1
+ ---
2
+ name: releasing
3
+ description: >-
4
+ Cut a release end-to-end via the changelog-keeper specialist. Use when the
5
+ operator wants to publish a new tag (vX.Y.Z) — drafts CHANGELOG section
6
+ from xt reports, bumps package.json, rebuilds dist, commits, tags, pushes,
7
+ optional GH release. Strict scope: only CHANGELOG.md + package.json + dist/.
8
+ version: 1.0.0
9
+ ---
10
+
11
+ # releasing
12
+
13
+ One-step release publication via specialist delegation.
14
+
15
+ ## When to use
16
+
17
+ The operator wants to cut a release. They say "release it", "ship vX.Y.Z", "cut a tag", or just "release".
18
+
19
+ ## How
20
+
21
+ 1. Determine the target version. Default is patch bump from the most recent semver tag. Operator may specify `--minor`, `--major`, or an explicit version.
22
+
23
+ 2. Determine the tag range. Default is `<latest-tag>..HEAD`. For backfills, operator names `--from` / `--to` explicitly.
24
+
25
+ 3. Create a release bead. Template:
26
+
27
+ ```
28
+ PROBLEM: Cut release vX.Y.Z covering <prev-tag>..HEAD.
29
+ SUCCESS: CHANGELOG.md updated with new section above prior release; package.json bumped; dist rebuilt; commit `release: vX.Y.Z` pushed with tag.
30
+ SCOPE: CHANGELOG.md, package.json, dist/. Synthesis input: xt reports under .xtrm/reports/ dated within <prev-tag-date>..HEAD.
31
+ NON_GOALS: No source/docs/config edits. No retroactive changes to prior release sections.
32
+ CONSTRAINTS: Keep-a-Changelog v1.0.0 format. One-line bullets. Default bucket Changed. Deprecated only for explicit sunsets.
33
+ VALIDATION: git diff --stat HEAD~1 HEAD shows only CHANGELOG.md, package.json, dist/.
34
+ OUTPUT: Final report with VERSION, COMMIT, TAG, PUSHED status.
35
+ GH_RELEASE: <true|false> # whether to also `gh release create`
36
+ ```
37
+
38
+ 4. Dispatch the specialist:
39
+
40
+ ```bash
41
+ sp run changelog-keeper --bead <bead-id> --background
42
+ ```
43
+
44
+ No worktree (release work is on the active branch). No reviewer chain — the verification is the diff check below.
45
+
46
+ 5. **Verify the diff after the specialist completes.** This is the critical operator gate.
47
+
48
+ ```bash
49
+ git diff --stat HEAD~1 HEAD
50
+ ```
51
+
52
+ The output MUST show ONLY:
53
+ - `CHANGELOG.md`
54
+ - `package.json`
55
+ - `dist/index.js`, `dist/lib.js`, `dist/types/**`
56
+
57
+ If ANY other file appears (`src/**`, `docs/**` other than CHANGELOG, `config/**`, `tests/**`, `README.md`, etc.), the specialist violated scope. Action:
58
+
59
+ ```bash
60
+ git push --delete origin vX.Y.Z # delete remote tag
61
+ git tag -d vX.Y.Z # delete local tag
62
+ git reset --hard HEAD~1 # discard the release commit
63
+ git push --force-with-lease # only if push already happened
64
+ ```
65
+
66
+ Then file a bug bead naming the offending paths and revisit the specialist's mandatory rule.
67
+
68
+ 6. If the diff check passes, the release is shipped. Confirm:
69
+
70
+ ```bash
71
+ git tag --list 'v*' --sort=v:refname | tail -3  # new tag present (version-sorted, so v0.10.0 follows v0.9.0)
72
+ git log --oneline -1 # message starts with "release: vX.Y.Z"
73
+ ```
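Steps 1 and 2 leave the version arithmetic implicit. A minimal sketch, assuming tags are plain `vMAJOR.MINOR.PATCH` with no pre-release suffix; `parse`, `latest_tag`, and `next_version` are hypothetical helpers, not part of `sp` or `xt`.

```python
import re
from typing import Iterable, Optional, Tuple

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'vX.Y.Z' into integers, or return None for non-semver tags."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def latest_tag(tags: Iterable[str]) -> Optional[str]:
    """Pick the highest semver tag numerically, not lexicographically."""
    valid = [t for t in tags if parse(t) is not None]
    return max(valid, key=parse) if valid else None


def next_version(latest: str, bump: str = "patch") -> str:
    """Compute the target version; the default is a patch bump."""
    parsed = parse(latest)
    if parsed is None:
        raise ValueError(f"not a semver tag: {latest!r}")
    major, minor, patch = parsed
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump!r}")
```

Comparing parsed integer tuples avoids the string-ordering trap where `v0.7.9` would sort after `v0.7.13`.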
74
+
75
+ ## Why this design
76
+
77
+ - Specialist does the work itself (Read xt reports, Edit files, Bash for build/commit/tag/push). No CLI plumbing, no template substitution, no JSON output schema, no two-phase prepare/publish gate.
78
+ - Mandatory rule `changelog-keeper-scope` enforces the edit whitelist at the specialist level.
79
+ - Operator gate is the single `git diff --stat HEAD~1 HEAD` check after the specialist finishes. If it shows only whitelisted paths, the release is correct.
80
+ - xt reports are the synthesis input, not git log + bd query. Reports are pre-curated, signal-rich, written in user-facing language.
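The operator gate can also be checked mechanically. The sketch below assumes the whitelist from step 5 and takes a list of changed paths, such as the lines printed by `git diff --name-only HEAD~1 HEAD`; `scope_violations` is a hypothetical helper and not part of the specialist.

```python
from fnmatch import fnmatchcase
from typing import List

# Whitelist from step 5. In fnmatch patterns, '*' also matches '/',
# so 'dist/types/*' covers nested paths like dist/types/a/b.d.ts.
ALLOWED = [
    "CHANGELOG.md",
    "package.json",
    "dist/index.js",
    "dist/lib.js",
    "dist/types/*",
]


def scope_violations(changed: List[str]) -> List[str]:
    """Return every changed path that falls outside the release whitelist."""
    return [
        path
        for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in ALLOWED)
    ]
```

An empty result means the release commit stayed in scope; any returned paths are the ones to name in the bug bead.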
81
+
82
+ ## Parallel sessions
83
+
84
+ Each orchestrator runs this skill in its own session. The specialist commits + tags + pushes atomically. If two sessions try to release the same version, whichever pushes first wins; the other sees a remote tag conflict on push and aborts with a clean error. Operator picks the next version and retries.
85
+
86
+ ## Don't
87
+
88
+ - Don't manually run `sp release prepare`/`publish`; those CLIs are removed in v3.X.Y (TBD).
89
+ - Don't edit CHANGELOG.md outside the specialist run — manual edits leak into the next release's diff and break scope verification.
90
+ - Don't pre-stage files. The specialist stages exactly what it commits.