npm - xtrm-tools - Versions diffs - 0.7.14 → 0.7.16 - Mend

xtrm-tools 0.7.14 → 0.7.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/.xtrm/registry.json +410 -418
package/.xtrm/skills/default/specialists-creator/SKILL.md +16 -0
package/.xtrm/skills/default/using-kpi/SKILL.md +44 -0
package/.xtrm/skills/default/using-xtrm/SKILL.md +14 -0
package/CHANGELOG.md +10 -0
package/cli/dist/index.cjs +8 -0
package/cli/dist/index.cjs.map +1 -1
package/cli/package.json +1 -1
package/package.json +1 -1
package/packages/pi-extensions/package.json +1 -1
package/.xtrm/skills/default/using-specialists-v3/SKILL.md +0 -284
package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +0 -89

package/.xtrm/skills/default/specialists-creator/SKILL.md CHANGED

@@ -13,6 +13,22 @@ synced_at: 236ca5e6
 > Source of truth: `src/specialist/schema.ts` | Runtime: `src/specialist/runner.ts`
+## Canonical References
+When a custom specialist needs a standard rule or skill, reference the canonical asset by name instead of copying its file into the repo. Runtime/package fallback resolves canonical mandatory rules and skills when no project-local override exists.
+Example:
+```json
+{
+  "mandatory_rules": { "template_sets": ["serena-cheatsheet"] },
+  "skills": { "paths": ["releasing"] }
+}
+```
+Only create project-local copies when intentionally changing canonical behavior. After setting references, run `sp config show <name> --resolved` to verify the resolved runtime surface.
 ---
 ## ACTION REQUIRED BEFORE ANYTHING ELSE
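
The fallback rule described under "Canonical References" in this hunk (a project-local override wins, otherwise the packaged canonical asset is used) can be sketched roughly as follows. This is only an illustration: the function name `resolve_asset`, the `<dir>/<name>.md` layout, and both directory arguments are hypothetical and are not taken from `src/specialist/runner.ts`.

```python
from pathlib import Path
from typing import Optional


def resolve_asset(name: str, project_dir: Path, package_dir: Path) -> Optional[Path]:
    """Return the project-local override if it exists, else the packaged copy.

    Assumption: assets live at <dir>/<name>.md. The real layout may differ.
    """
    # Check the project directory first so an intentional local copy wins.
    for base in (project_dir, package_dir):
        candidate = base / f"{name}.md"
        if candidate.is_file():
            return candidate
    # Neither location has the asset.
    return None
```

Under this sketch, referencing `serena-cheatsheet` by name resolves to the packaged file until a project-local copy is created on purpose.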

package/.xtrm/skills/default/using-kpi/SKILL.md CHANGED

@@ -143,6 +143,50 @@ sp db stats --with-payload --format json \
         end'
 ```
+## Recipe 7 — payload component breakdown per specialist
+**Truth source first.** The actual prompt size billed by the API is the first turn's `input_tokens` from `token_trajectory_json[0]`. Use it as the ground truth — `payload_breakdown` events undercount (tool definitions and harness framing are not captured) and historical rows before the rule N× fix overcount mandatory_rule by attached-rule count.
+```bash
+DB=.specialists/db/observability.db
+sqlite3 "$DB" "SELECT specialist, model, AVG(json_extract(token_trajectory_json, '\$[0].token_usage.input_tokens')) AS avg_first_in, COUNT(*) AS n FROM specialist_job_metrics WHERE token_trajectory_json IS NOT NULL AND status='done' GROUP BY specialist, model ORDER BY avg_first_in DESC"
+```
+Use this number for cost decisions. Use `payload_breakdown` only for *relative* component analysis (which knob to tune), not absolute sizing.
+`sp db stats --with-payload` only surfaces total `payload_kb` / `payload_tokens`. To audit *what* fills the prompt (system_prompt vs mandatory rules vs skills vs bead_context vs memory), query `payload_breakdown` events directly. Use this for eager-load bloat investigations, prompt/rule consolidation planning, or duplication hunts — but cross-check against the truth source above.
+```bash
+DB=.specialists/db/observability.db
+sqlite3 "$DB" "SELECT specialist, event_json FROM specialist_events WHERE type='payload_breakdown' GROUP BY specialist ORDER BY t DESC" \
+  | python3 -c '
+import json, sys
+rows = []
+for line in sys.stdin:
+    if "|" not in line: continue
+    spec, js = line.split("|", 1)
+    d = json.loads(js)
+    agg = {}
+    for c in d["payload_breakdown"]["components"]:
+        a = agg.setdefault(c["kind"], {"tokens":0,"n":0})
+        a["tokens"] += c["tokens"]; a["n"] += 1
+    rows.append((spec, d["payload_breakdown"]["totals"]["tokens"], agg))
+rows.sort(key=lambda r: -r[1])
+print(f"{\"specialist\":<22}{\"total\":>8}{\"rules\":>8}{\"rules_n\":>8}{\"sys\":>8}{\"skills\":>8}{\"bead\":>8}{\"mem\":>8}")
+for s, t, a in rows:
+    g = lambda k: a.get(k, {"tokens":0,"n":0})
+    print(f"{s:<22}{t:>8}{g(\"mandatory_rule\")[\"tokens\"]:>8}{g(\"mandatory_rule\")[\"n\"]:>8}{g(\"system_prompt\")[\"tokens\"]:>8}{g(\"skill\")[\"tokens\"]:>8}{g(\"bead_context\")[\"tokens\"]:>8}{g(\"memory\")[\"tokens\"]:>8}")
+'
+```
+Component kinds: `system_prompt`, `mandatory_rule` (one event entry per attached rule), `skill` (path reference, ~10 tokens — bodies are loaded on demand, not eagerly), `task_template`, `bead_context`, `memory`.
+Optimization signals:
+- `mandatory_rule` total dominates: audit wrapper inflation by comparing `bytes` per rule in the event vs `wc -c config/mandatory-rules/<id>.md`. Mismatch >5x means a wrapper or richer source is adding hidden cost — investigate `formatMandatoryRulesBlock` and `parseMandatoryRulesFrontmatter`.
+- `skill` total small (always): skills are reference-only at startup; inlining skill bodies into rules saves nothing.
+- `bead_context` huge: the bead description is bloated — orchestrator should write more concise contracts.
+- `memory` huge: stale or noisy memories — run `bd memories` cleanup or consolidation.
 ## References
 - `docs/observability-metrics.md`
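
The per-kind aggregation in Recipe 7 of this hunk can also be written as a standalone Python function, which sidesteps shell quoting inside `python3 -c`. The sketch below assumes the event shape implied by the recipe (`payload_breakdown.components[].kind`, `.tokens`, and `payload_breakdown.totals.tokens`). The function names `summarize_breakdown` and `format_row` are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Tuple


def summarize_breakdown(event_json: str) -> Tuple[int, Dict[str, Dict[str, int]]]:
    """Sum tokens and count entries per component kind for one event."""
    breakdown = json.loads(event_json)["payload_breakdown"]
    per_kind = defaultdict(lambda: {"tokens": 0, "n": 0})
    for component in breakdown["components"]:
        entry = per_kind[component["kind"]]
        entry["tokens"] += component["tokens"]
        entry["n"] += 1
    return breakdown["totals"]["tokens"], dict(per_kind)


def format_row(specialist: str, total: int, per_kind: Dict[str, Dict[str, int]]) -> str:
    """Render one fixed-width row in the same column order as the recipe."""
    def tokens(kind: str) -> int:
        return per_kind.get(kind, {"tokens": 0})["tokens"]

    rules_n = per_kind.get("mandatory_rule", {"n": 0})["n"]
    return (
        f"{specialist:<22}{total:>8}{tokens('mandatory_rule'):>8}{rules_n:>8}"
        f"{tokens('system_prompt'):>8}{tokens('skill'):>8}"
        f"{tokens('bead_context'):>8}{tokens('memory'):>8}"
    )
```

Each `event_json` value returned by the `sqlite3` query can be passed to `summarize_breakdown`, and the result to `format_row` together with the specialist name.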

package/.xtrm/skills/default/using-xtrm/SKILL.md CHANGED

@@ -25,6 +25,20 @@ bd update <id> --claim            # claim before any edit
 > Use `bv --robot-next` for the single top pick. Use `bv --robot-triage --format toon` to save context tokens. **Never run bare `bv` — it launches an interactive TUI.**
+---
+## Current xt Command Surfaces
+Use these command surfaces when the task is operational rather than code-editing:
+| Need | Command | Notes |
+|------|---------|-------|
+| Refresh xtrm-managed skills/hooks/reports in one repo | `xt update --apply` | Default `xt update` is dry-run; `--apply` writes. |
+| Refresh many repos | `xt update --apply --root <dir>` | Discovers repos with `.xtrm/registry.json`; failures are reported per repo. |
+| Cut a release | `xt release prepare --patch` then `xt release publish` | `prepare` drafts from xt reports; `publish` tags/pushes. If `prepare` fails on changelog script compatibility, check specialists `unitAI-dnmcg` state and use the manual fallback in `/releasing`. |
+| Close a session report | update latest same-day `.xtrm/reports/<date>-*.md` | `session-close-report` prefers one same-day SSOT handoff; do not create duplicate reports unless asked. |
 ---
 ## Trigger Patterns

package/CHANGELOG.md CHANGED

@@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.7.16] - 2026-05-05
+### Fixed
+- `xt update` and `xt install` now repair a broken `.xtrm/skills/default` symlink before running the registry install. Previously only `xt init` repaired stale dev-mode symlinks, so updates failed on machines where the legacy symlink target no longer existed. The npm package root is always the source.
+## [0.7.15] - 2026-05-05
+### Changed
+- Updated `using-xtrm` and `docs/XTRM-GUIDE.md` to document `xt update`, `xt release prepare/publish`, and same-day SSOT session report behavior.
 ## [0.7.14] - 2026-05-05
 ### Added
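
The 0.7.16 entry in this hunk describes repairing a broken `.xtrm/skills/default` symlink from the npm package root before the registry install runs. A minimal Python sketch of that idea follows. It is not the code of `scaffoldSkillsDefaultFromPackage`: the function name `repair_skills_default`, the `"skip"` return value, and the dry-run behaviour are assumptions. Only the `"copy"` result mirrors the value checked in `runInstall`.

```python
import shutil
from pathlib import Path


def repair_skills_default(package_skills: Path, user_skills: Path, dry_run: bool = False) -> str:
    """Replace a dangling symlink at user_skills with a copy of package_skills.

    Returns "copy" when a repair was made (or would be made in a dry run),
    and "skip" (hypothetical value) when nothing needs repairing.
    """
    # A dangling symlink is still a symlink, but exists() follows the link
    # and reports False because the target is gone.
    is_broken_link = user_skills.is_symlink() and not user_skills.exists()
    if not is_broken_link:
        return "skip"
    if not dry_run:
        user_skills.unlink()  # remove the stale link itself
        shutil.copytree(package_skills, user_skills)  # package root is the source
    return "copy"
```

Running the check before the install step means a stale dev-mode link does not make the later registry install fail.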

package/cli/dist/index.cjs CHANGED

@@ -55493,6 +55493,14 @@ async function runInstall(opts = {}) {
   console.log(kleur_default.bold("\n  \u2699  xtrm install (.xtrm registry scaffold)"));
   console.log(kleur_default.dim(`  \u2022 registry: ${registryPath}`));
   console.log(kleur_default.dim(`  \u2022 target: ${userXtrmDir}`));
+  const scaffoldResult = await scaffoldSkillsDefaultFromPackage({
+    packageRoot,
+    userXtrmDir,
+    dryRun
+  });
+  if (scaffoldResult === "copy") {
+    console.log(kleur_default.dim("  \u2022 Repaired .xtrm/skills/default from package payload"));
+  }
   const stats = await installFromRegistry({
     packageRoot,
     registry: registry2,