@jaggerxtrm/specialists 3.12.0 → 3.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/config/hooks/specialists-session-start.mjs +1 -1
  2. package/config/mandatory-rules/bead-id-verbatim.md +14 -0
  3. package/config/mandatory-rules/per-turn-handoff-schema.md +16 -0
  4. package/config/skills/specialists-creator/SKILL.md +16 -0
  5. package/config/skills/update-specialists/SKILL.md +183 -350
  6. package/config/skills/using-kpi/SKILL.md +86 -0
  7. package/config/skills/using-script-specialists/SKILL.md +7 -5
  8. package/config/skills/using-specialists-v2/SKILL.md +1 -1
  9. package/config/skills/using-specialists-v3/SKILL.md +530 -112
  10. package/config/specialists/changelog-keeper.specialist.json +2 -1
  11. package/config/specialists/code-sanity.specialist.json +3 -1
  12. package/config/specialists/debugger.specialist.json +3 -1
  13. package/config/specialists/executor.specialist.json +3 -1
  14. package/config/specialists/explorer.specialist.json +2 -1
  15. package/config/specialists/overthinker.specialist.json +2 -1
  16. package/config/specialists/planner.specialist.json +3 -1
  17. package/config/specialists/researcher.specialist.json +2 -1
  18. package/config/specialists/reviewer.specialist.json +3 -1
  19. package/config/specialists/security-auditor.specialist.json +53 -10
  20. package/config/specialists/specialists-creator.specialist.json +2 -2
  21. package/config/specialists/sync-docs.specialist.json +3 -1
  22. package/config/specialists/test-runner.specialist.json +2 -1
  23. package/dist/index.js +634 -498
  24. package/dist/lib.js +183 -62
  25. package/dist/types/cli/help.d.ts.map +1 -1
  26. package/dist/types/cli/run.d.ts.map +1 -1
  27. package/dist/types/cli/script.d.ts.map +1 -1
  28. package/dist/types/cli/serve.d.ts +11 -2
  29. package/dist/types/cli/serve.d.ts.map +1 -1
  30. package/dist/types/cli/version-check.d.ts +3 -0
  31. package/dist/types/cli/version-check.d.ts.map +1 -1
  32. package/dist/types/index.d.ts +1 -1
  33. package/dist/types/specialist/mandatory-rules.d.ts +5 -0
  34. package/dist/types/specialist/mandatory-rules.d.ts.map +1 -1
  35. package/dist/types/specialist/observability-sqlite.d.ts +1 -0
  36. package/dist/types/specialist/observability-sqlite.d.ts.map +1 -1
  37. package/dist/types/specialist/schema.d.ts +27 -0
  38. package/dist/types/specialist/schema.d.ts.map +1 -1
  39. package/dist/types/specialist/script-runner.d.ts +5 -1
  40. package/dist/types/specialist/script-runner.d.ts.map +1 -1
  41. package/package.json +4 -4
  42. package/config/specialists/.serena/project.yml +0 -151
@@ -143,6 +143,92 @@ sp db stats --with-payload --format json \
  end'
  ```
 
+ ## Recipe 7 — payload component breakdown per specialist
+
+ **Truth source first.** The prompt size actually billed by the API is the first turn's `input_tokens`, read from `token_trajectory_json[0]`. Treat it as ground truth: `payload_breakdown` events undercount (tool definitions and harness framing are not captured), and historical rows recorded before the rule N× fix overcount `mandatory_rule` tokens by a factor of the attached-rule count.
+
+ ```bash
+ DB=.specialists/db/observability.db
+ sqlite3 "$DB" "SELECT specialist, model, AVG(json_extract(token_trajectory_json, '\$[0].token_usage.input_tokens')) AS avg_first_in, COUNT(*) AS n FROM specialist_job_metrics WHERE token_trajectory_json IS NOT NULL AND status='done' GROUP BY specialist, model ORDER BY avg_first_in DESC"
+ ```
+
+ Use this number for cost decisions. Use `payload_breakdown` only for *relative* component analysis (which knob to tune), not for absolute sizing.
+
+ `sp db stats --with-payload` surfaces only the total `payload_kb` / `payload_tokens`. To audit *what* fills the prompt (system_prompt vs mandatory rules vs skills vs bead_context vs memory), query `payload_breakdown` events directly. Use this for eager-load bloat investigations, prompt/rule consolidation planning, or duplication hunts, and cross-check the results against the truth source above.
+
+ ```bash
+ DB=.specialists/db/observability.db
+ sqlite3 "$DB" "SELECT e.specialist, e.event_json FROM specialist_events e WHERE e.type='payload_breakdown' AND e.t = (SELECT MAX(t) FROM specialist_events WHERE type='payload_breakdown' AND specialist = e.specialist)" \
+ | python3 -c '
+ import json, sys
+ rows = []
+ for line in sys.stdin:
+     if "|" not in line: continue
+     spec, js = line.split("|", 1)
+     d = json.loads(js)
+     agg = {}
+     for c in d["payload_breakdown"]["components"]:
+         a = agg.setdefault(c["kind"], {"tokens": 0, "n": 0})
+         a["tokens"] += c["tokens"]; a["n"] += 1
+     rows.append((spec, d["payload_breakdown"]["totals"]["tokens"], agg))
+ rows.sort(key=lambda r: -r[1])
+ print(("%-22s" + "%8s" * 7) % ("specialist", "total", "rules", "rules_n", "sys", "skills", "bead", "mem"))
+ for s, t, a in rows:
+     g = lambda k: a.get(k, {"tokens": 0, "n": 0})
+     print(("%-22s" + "%8s" * 7) % (s, t, g("mandatory_rule")["tokens"], g("mandatory_rule")["n"], g("system_prompt")["tokens"], g("skill")["tokens"], g("bead_context")["tokens"], g("memory")["tokens"]))
+ '
+ ```
+
+ Component kinds: `system_prompt`, `mandatory_rule` (one event entry per attached rule), `skill` (path/description label only; full bodies are eagerly injected at runtime but are NOT counted here), `task_template`, `bead_context`, `memory`.
+
+ **Important:** `skill` entries in `payload_breakdown` show only the path/description label (~10-40 tokens). The full skill body is eagerly injected via `skills.paths` on every run and IS billed as input tokens. To measure the real eager-skill cost, see Recipe 8.
+
+ Optimization signals (from breakdown alone):
+ - `mandatory_rule` total dominates: audit wrapper inflation by comparing the `bytes` recorded per rule in the event with `wc -c config/mandatory-rules/<id>.md`. A mismatch above 5× means a wrapper or a richer source is adding hidden cost.
+ - `bead_context` huge: the bead description is bloated; the orchestrator should write more concise contracts.
+ - `memory` huge: stale or noisy memories; run `bd memories` cleanup or consolidation.
+
+ ## Recipe 8 — eager skill-body cost per specialist
+
+ `skills.paths` are eagerly injected on every run; the bodies appear in the API-billed prompt but the `payload_breakdown` event records only the path label. To derive the real eager-skill cost:
+
+ ```
+ eager_skill_cost ≈ first_turn_input_tokens − sum(payload_breakdown non-skill components)
+                    − constant per-specialist framing/tool-defs overhead
+ ```
+
+ Two-step audit:
+
+ ```bash
+ # Step 1: real first-turn input tokens per specialist (truth)
+ DB=.specialists/db/observability.db
+ sqlite3 "$DB" "
+ SELECT specialist, AVG(json_extract(token_trajectory_json, '\$[0].token_usage.input_tokens')) AS avg_first_in, COUNT(*) AS n
+ FROM specialist_job_metrics
+ WHERE token_trajectory_json IS NOT NULL AND status='done'
+ GROUP BY specialist ORDER BY avg_first_in DESC"
+
+ # Step 2: per-specialist measured non-skill components (post-kdl4n)
+ sqlite3 "$DB" "SELECT e.specialist, e.event_json FROM specialist_events e WHERE e.type='payload_breakdown' AND e.t = (SELECT MAX(t) FROM specialist_events WHERE type='payload_breakdown' AND specialist = e.specialist)" \
+ | python3 -c '
+ import json, sys
+ for line in sys.stdin:
+     if "|" not in line: continue
+     spec, js = line.split("|", 1)
+     d = json.loads(js)
+     non_skill = sum(c["tokens"] for c in d["payload_breakdown"]["components"] if c["kind"] != "skill")
+     print(f"{spec:<22}{non_skill:>10}")
+ '
+ ```
+
+ Then compute `delta = first_in − non_skill_total` for each specialist. The framing/tool-defs constant is roughly the same across specialists that use the same model; estimate it by running a specialist with NO `skills.paths` attached as a baseline.
+
+ Per-skill body weight: `wc -c <skill-path>/SKILL.md` divided by 4 (cl100k_base approximation of roughly 4 bytes per token). High-frequency, large-body skills are the inlining candidates; low-frequency or small ones stay attached.
+
+ Optimization signals (skills):
+ - `delta` >> sum of attached skill body bytes/4: framing and tool definitions are the bulk; leave skills alone.
+ - `delta` ≈ sum of skill body weights: skills dominate eager cost. Inline frequently used hot guidance into `system_prompt`, keep rare deep references as skills, and consider splitting big mixed skills.
+
  ## References
 
  - `docs/observability-metrics.md`
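The arithmetic behind Recipes 7 and 8 can be checked offline once a parsed `payload_breakdown` event and a first-turn token count are in hand. The sketch below is a minimal Python illustration, not part of the package: every helper name (`tokens_by_kind`, `eager_skill_delta`, and so on) is hypothetical, and it assumes the event shape the recipe scripts parse (`payload_breakdown.components[*].kind` and `.tokens`) plus the 4-bytes-per-token approximation from Recipe 8.

```python
"""Hypothetical helpers mirroring the arithmetic in Recipes 7 and 8.

Assumes a parsed payload_breakdown event of the shape the recipe scripts read:
{"payload_breakdown": {"components": [{"kind": str, "tokens": int}, ...],
                       "totals": {"tokens": int}}}
"""
from collections import defaultdict

# Rough bytes-per-token ratio used in Recipe 8 (cl100k_base-style approximation).
BYTES_PER_TOKEN = 4


def tokens_by_kind(event: dict) -> dict:
    """Sum component tokens per kind (system_prompt, mandatory_rule, skill, ...)."""
    totals = defaultdict(int)
    for comp in event["payload_breakdown"]["components"]:
        totals[comp["kind"]] += comp["tokens"]
    return dict(totals)


def non_skill_tokens(event: dict) -> int:
    """Tokens the event really measures; `skill` entries are only path labels."""
    return sum(t for kind, t in tokens_by_kind(event).items() if kind != "skill")


def eager_skill_delta(first_turn_input_tokens: int, event: dict) -> int:
    """delta = first_in - non_skill_total: eager skill bodies plus framing/tool defs."""
    return first_turn_input_tokens - non_skill_tokens(event)


def skill_body_tokens(skill_body_bytes: list) -> int:
    """Estimate eager skill-body tokens from `wc -c` byte counts."""
    return sum(b // BYTES_PER_TOKEN for b in skill_body_bytes)


def skill_share(delta: int, skill_body_bytes: list) -> float:
    """Fraction of delta explained by skill bodies: near 1.0 means skills dominate,
    near 0.0 means framing/tool definitions dominate."""
    return skill_body_tokens(skill_body_bytes) / delta if delta > 0 else 0.0


def rule_inflation(recorded_bytes: int, source_bytes: int) -> float:
    """Recorded rule bytes vs. the rule file size; Recipe 7 flags ratios above 5."""
    return recorded_bytes / source_bytes
```

With illustrative inputs (not measured data): a specialist whose first turn bills 9,150 input tokens, whose event records 2,150 non-skill tokens, and which attaches skill bodies of 16,000 and 8,000 bytes gets `delta = 7000` and about 6,000 estimated skill tokens. Skills would then explain roughly 86% of the delta, which points toward the "skills dominate" signal in Recipe 8.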
@@ -9,7 +9,9 @@ description: >
  output immediately, or when the work is a single LLM call with structured
  input/output. Do NOT use for tracked agent work — that belongs to
  `using-specialists-v2`.
- version: 1.0
+ version: 1.1.0
+ updated: 2026-05-06
+ synced_at: a0e54d0c
  ---
 
  # Script-Class Specialists
@@ -54,7 +56,7 @@ A spec is rejected at request time (`specialist_load_error`) if any of:
  - `execution.interactive` is `true`
  - `execution.requires_worktree` is `true`
  - `execution.permission_required` is anything other than `READ_ONLY`
- - `skills.scripts` is non-empty
+ - `skills.scripts` is non-empty (always rejected; no `--allow-local-scripts` bypass)
  - `prompt.task_template` is missing
  - a referenced `$var` in the chosen template is not supplied (`template_variable_missing`)
 
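The rejection conditions visible in this hunk can be expressed as a pre-flight check. The Python sketch below is hypothetical (`script_spec_errors` and `template_vars` are illustrative names, not the package's loader) and covers only the rules shown above. It checks `prompt.task_template` but not named templates, assumes `$name` / `${name}` placeholder syntax, and treats a missing `permission_required` as a rejection, which may be stricter than the real loader.

```python
import re
from typing import Dict, List, Set

# $name or ${name} placeholders (assumed syntax, modelled on string.Template).
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def template_vars(template: str) -> Set[str]:
    """Return the placeholder names referenced by a task template."""
    return {braced or bare for braced, bare in _VAR.findall(template)}


def script_spec_errors(spec: dict, supplied_vars: Dict[str, str]) -> List[str]:
    """List the script-class rejection reasons that apply (empty list = accepted)."""
    errors = []
    execution = spec.get("execution", {})
    if execution.get("interactive") is True:
        errors.append("execution.interactive is true")
    if execution.get("requires_worktree") is True:
        errors.append("execution.requires_worktree is true")
    if execution.get("permission_required") != "READ_ONLY":
        errors.append("execution.permission_required is not READ_ONLY")
    if spec.get("skills", {}).get("scripts"):
        # No bypass flag: a non-empty skills.scripts is always rejected.
        errors.append("skills.scripts is non-empty")
    template = spec.get("prompt", {}).get("task_template")
    if not template:
        errors.append("prompt.task_template is missing")
    else:
        # Every referenced placeholder must be supplied via --vars.
        for name in sorted(template_vars(template) - set(supplied_vars)):
            errors.append(f"template_variable_missing: ${name}")
    return errors
```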
@@ -101,9 +103,9 @@ sp script <specialist-name> \
  Behaviour:
 
  - Loads the spec via `SpecialistLoader` (same loader as `sp run`).
- - Renders `prompt.task_template` (or named template) with `--vars`.
- - Spawns `pi --mode json --no-session --no-extensions --no-tools` with the
-   resolved model.
+ - Renders `prompt.task_template` (or named template) with `--vars`, then feeds the rendered prompt via stdin.
+ - `--db-path /path/to/observability.db` is an exact SQLite file path; omit it to use the project default `.specialists/db/observability.db`.
+ - Spawns `pi` in JSON mode, offline, with no session, no extensions, and no tools; forwards the resolved model, optional `--thinking`, and `--system-prompt` when `prompt.system` is set (a full override, not an append).
  - Returns the final assistant text on stdout. With `--json`, returns the full
  `ScriptGenerateResult` envelope.
  - Writes one row to `.specialists/db/observability.db` (same writer as `sp run`).
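The render-then-stdin flow described above can be sketched as a pure function that returns the child argv and the prompt text separately. This is a hedged Python illustration, not the package's script runner: `build_pi_invocation` is hypothetical, `string.Template` stands in for the real `$var` substitution, the `--model` flag name is an assumption, and the offline switch is omitted because its exact flag is not shown here. The fixed flags are the ones listed in the removed 3.12.0 line.

```python
from string import Template
from typing import Dict, List, Optional, Tuple


def build_pi_invocation(
    task_template: str,
    variables: Dict[str, str],
    model: str,
    thinking: Optional[str] = None,
    system_prompt: Optional[str] = None,
) -> Tuple[List[str], str]:
    """Return (argv, stdin_text) for a one-shot, tool-less pi call."""
    # Raises KeyError when the template references a variable that was not supplied.
    prompt = Template(task_template).substitute(variables)
    argv = ["pi", "--mode", "json", "--no-session", "--no-extensions", "--no-tools",
            "--model", model]  # "--model" is an assumed flag name
    if thinking is not None:
        argv += ["--thinking", thinking]
    if system_prompt is not None:
        # Full override of the default system prompt, not an append.
        argv += ["--system-prompt", system_prompt]
    return argv, prompt
```

A caller would then run something like `subprocess.run(argv, input=prompt, capture_output=True, text=True)`. Passing the rendered prompt on stdin rather than as an argument keeps it out of the process listing and avoids argv length limits, which is one plausible motivation for the stdin change.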
@@ -272,7 +272,7 @@ sp epic abandon <epic-id> --reason "..."
  sp end
  ```
 
- `sp result <job-id>` returns the most recent completed turn for `waiting` jobs with a `Session is waiting for your input` footer — use it to inspect a keep-alive job before deciding whether to resume. For `running` jobs, `sp feed <job-id>` is the right tool; `sp poll` is deprecated. Avoid `specialists status --job` for normal monitoring; prefer `sp ps <job-id>`.
+ `sp result <job-id>` returns the most recent completed turn for `waiting` jobs with a `Session is waiting for your input` footer — use it to inspect a keep-alive job before deciding whether to resume. For `running` jobs, `sp feed <job-id>` is the right tool. Avoid `specialists status --job` for normal monitoring; prefer `sp ps <job-id>`.
 
  ## Flag Semantics
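The monitoring guidance in this hunk reduces to a choice by job state. The helper below is a hypothetical Python sketch (`monitoring_command` is an illustrative name), and routing every state other than `waiting` and `running` to `sp ps` is an assumption drawn from the "prefer `sp ps <job-id>`" advice.

```python
def monitoring_command(job_id: str, status: str) -> str:
    """Pick the sp command for inspecting a job, following the guidance above."""
    if status == "waiting":
        # Keep-alive job: read the last completed turn before deciding to resume.
        return f"sp result {job_id}"
    if status == "running":
        # Live job: follow its progress as it runs.
        return f"sp feed {job_id}"
    # Anything else: general status via sp ps rather than `specialists status --job`.
    return f"sp ps {job_id}"
```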