xtrm-tools 0.7.14 → 0.7.15
- package/.xtrm/registry.json +409 -417
- package/.xtrm/skills/default/specialists-creator/SKILL.md +16 -0
- package/.xtrm/skills/default/using-xtrm/SKILL.md +14 -0
- package/CHANGELOG.md +5 -0
- package/cli/package.json +1 -1
- package/package.json +1 -1
- package/packages/pi-extensions/package.json +1 -1
- package/.xtrm/skills/default/using-specialists-v3/SKILL.md +0 -284
- package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +0 -89
package/.xtrm/skills/default/specialists-creator/SKILL.md
CHANGED
@@ -13,6 +13,22 @@ synced_at: 236ca5e6
 
 > Source of truth: `src/specialist/schema.ts` | Runtime: `src/specialist/runner.ts`
 
+
+## Canonical References
+
+When a custom specialist needs a standard rule or skill, reference the canonical asset by name instead of copying its file into the repo. Runtime/package fallback resolves canonical mandatory rules and skills when no project-local override exists.
+
+Example:
+
+```json
+{
+  "mandatory_rules": { "template_sets": ["serena-cheatsheet"] },
+  "skills": { "paths": ["releasing"] }
+}
+```
+
+Only create project-local copies when intentionally changing canonical behavior. After setting references, run `sp config show <name> --resolved` to verify the resolved runtime surface.
+
 ---
 
 ## ACTION REQUIRED BEFORE ANYTHING ELSE
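Reviewer note: the fallback described in the added "Canonical References" section can be pictured as a two-step lookup, project-local override first and packaged canonical asset second. The sketch below is illustrative only. `resolve_asset`, the two directory arguments, and the `<name>.md` layout are assumptions made for the example, not the actual logic in `src/specialist/runner.ts`.

```python
from pathlib import Path
from typing import Optional


def resolve_asset(name: str, project_dir: Path, package_dir: Path) -> Optional[Path]:
    """Return the project-local override if present, else the packaged copy.

    Hypothetical layout: each asset lives at <dir>/<name>.md. The real
    resolver may organize rules and skills differently.
    """
    for base in (project_dir, package_dir):  # project-local wins over package
        candidate = base / f"{name}.md"
        if candidate.is_file():
            return candidate
    return None  # neither location provides the asset
```

Under these assumptions, referencing `releasing` by name resolves to the packaged copy until a project-local file with the same name is created on purpose.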
package/.xtrm/skills/default/using-xtrm/SKILL.md
CHANGED
@@ -25,6 +25,20 @@ bd update <id> --claim # claim before any edit
 
 > Use `bv --robot-next` for the single top pick. Use `bv --robot-triage --format toon` to save context tokens. **Never run bare `bv` — it launches an interactive TUI.**
 
+---
+
+## Current xt Command Surfaces
+
+Use these command surfaces when the task is operational rather than code-editing:
+
+| Need | Command | Notes |
+|------|---------|-------|
+| Refresh xtrm-managed skills/hooks/reports in one repo | `xt update --apply` | Default `xt update` is dry-run; `--apply` writes. |
+| Refresh many repos | `xt update --apply --root <dir>` | Discovers repos with `.xtrm/registry.json`; failures are reported per repo. |
+| Cut a release | `xt release prepare --patch` then `xt release publish` | `prepare` drafts from xt reports; `publish` tags/pushes. If `prepare` fails on changelog script compatibility, check specialists `unitAI-dnmcg` state and use the manual fallback in `/releasing`. |
+| Close a session report | update latest same-day `.xtrm/reports/<date>-*.md` | `session-close-report` prefers one same-day SSOT handoff; do not create duplicate reports unless asked. |
+
+
 ---
 
 ## Trigger Patterns
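Reviewer note: the "Refresh many repos" row says repos are discovered by the presence of `.xtrm/registry.json` and that failures are reported per repo. A minimal sketch of that behavior follows. The function names and the `update` callback are hypothetical, and nothing here is taken from the `xt` implementation.

```python
import os
from pathlib import Path
from typing import Callable, Dict, List


def discover_repos(root: Path) -> List[Path]:
    """Collect every directory under root that contains .xtrm/registry.json."""
    repos = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if os.path.isfile(os.path.join(dirpath, ".xtrm", "registry.json")):
            repos.append(Path(dirpath))
    return sorted(repos)


def update_all(root: Path, update: Callable[[Path], None]) -> Dict[Path, str]:
    """Run update on each repo; one failure does not stop the others."""
    failures: Dict[Path, str] = {}
    for repo in discover_repos(root):
        try:
            update(repo)  # hypothetical stand-in for the per-repo refresh
        except Exception as exc:
            failures[repo] = str(exc)  # recorded per repo, reported at the end
    return failures
```

Collecting failures in a mapping rather than raising on the first error is one way to get the "reported per repo" behavior the table describes.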
package/CHANGELOG.md
CHANGED
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
+## [0.7.15] - 2026-05-05
+
+### Changed
+- Updated `using-xtrm` and `docs/XTRM-GUIDE.md` to document `xt update`, `xt release prepare/publish`, and same-day SSOT session report behavior.
+
 ## [0.7.14] - 2026-05-05
 
 ### Added
package/cli/package.json
CHANGED
package/package.json
CHANGED
package/.xtrm/skills/default/using-specialists-v3/SKILL.md
@@ -1,284 +0,0 @@
----
-name: using-specialists-v3
-description: >
-  Canonical specialist orchestration skill. Use proactively for substantial work
-  that should be delegated, tracked, reviewed, fixed, tested, or merged through
-  specialists: code review, debugging, implementation, planning, doc sync,
-  security checks, multi-step chains, and questions about specialist workflow.
-version: 3.1
----
-
-# Using Specialists v3
-
-You are the orchestrator. Your job is to turn user intent into a clear bead contract, choose the right specialist from the live registry, launch the chain, monitor it, consume results, drive fixes, and publish through the specialist merge path.
-
-Keep this skill practical. It should contain the core behavior needed to orchestrate well; use live commands for volatile details instead of embedding a static catalog.
-
-## When To Delegate
-
-Use specialists for substantial work: codebase exploration, debugging, implementation, review, test execution, planning, documentation sync, security/config audit, release publication, and multi-chain epics.
-
-Do small deterministic edits directly when the scope is already obvious and delegation would add ceremony. Do not self-investigate or self-implement a substantial task just because you can read files faster; the audit trail and specialist review are part of the workflow.
-
-## Non-Negotiable Rules
-
-1. `--bead` is the prompt for tracked work.
-2. Do not dispatch until the bead is a usable task contract.
-3. Never use `--prompt` to supplement tracked work. Update the bead instead.
-4. Choose by task shape, not by habit. Check `specialists list --full` when roles may have changed.
-5. Explorer/debugger answer uncertainty before executor writes code.
-6. Executor starts only when scope, constraints, and validation are clear.
-7. Reviewer uses its own bead and the executor workspace via `--job <exec-job>`.
-8. Keep executor/debugger jobs alive through review so they can be resumed.
-9. Merge specialist-owned work with `sp merge` or `sp epic merge`, not manual `git merge`.
-10. Specialists must not perform destructive or irreversible operations.
-11. Treat tests as evidence: classify failures as in-scope, pre-existing, or infrastructure before starting a fix loop.
-12. Drive routine stages autonomously once the task is clear. Escalate only for human judgment, destructive actions, repeated crashes, or reviewer `FAIL`.
-
-## Live Registry And Help
-
-Use the live registry for role details, permissions, current models, and skills:
-
-```bash
-specialists list --full
-```
-
-Use help for command flags and subcommands:
-
-```bash
-sp help
-sp run --help
-sp ps --help
-sp feed --help
-sp result --help
-sp resume --help
-sp merge --help
-sp epic --help
-```
-
-Do not rely on stale remembered flags when help is available.
-
-## Role Selection
-
-Common routing:
-
-| Need | Specialist |
-| --- | --- |
-| Unknown architecture, call flow, dependencies, implementation options | `explorer` |
-| Symptom, stack trace, regression, flaky/failing test, root cause | `debugger` |
-| Broad feature decomposition, bead board, dependencies, sequencing | `planner` |
-| Risky design choice, tradeoff, premortem, critique | `overthinker` |
-| Clear implementation or scoped doc edit | `executor` |
-| Cheap implementation-quality smell pass before final review | `code-sanity` |
-| Security/config/dependency audit with recommendations only | `security-auditor` |
-| Final compliance verdict on executor/debugger diff | `reviewer` |
-| Run checks and interpret failures without fixing | `test-runner` |
-| Exactly one doc needs drift-aware sync | `sync-docs` |
-| Current external docs/API/ecosystem research | `researcher` |
-| Create or fix specialist config/schema | `specialists-creator` |
-| Release changelog/package/dist/tag publication | `changelog-keeper` through the `releasing` skill |
-
-Selection rules:
-
-- Use `explorer` when you need evidence before deciding what to change.
-- Use `debugger` instead of explorer when there is a failure symptom.
-- Use `executor` only after the task can name target files/symbols or a bounded discovery result.
-- Use `reviewer` as the merge gate; code-sanity and security-auditor are advisory.
-- Use `test-runner` for running/classifying tests; it does not implement fixes.
-- Use `specialists-creator` before changing specialist definitions.
-
-## Bead Contract
-
-Every specialist-bound bead must be a usable prompt. Title-only beads are not acceptable.
-
-Required structure:
-
-```text
-PROBLEM: What is wrong or needed.
-SUCCESS: Observable completion criteria.
-SCOPE: Files, symbols, commands, docs, or discovery area.
-NON_GOALS: Explicitly out of scope.
-CONSTRAINTS: Safety, compatibility, style, permissions, sequencing.
-VALIDATION: Checks/tests/review expected before closure.
-OUTPUT: Expected handoff format.
-```
-
-If the existing issue is vague, update it before dispatch:
-
-```bash
-bd update <id> --notes "CONTRACT: ..."
-```
-
-Contract tuning by role:
-
-- Explorer: ask specific questions; require citations to files/symbols/flows; forbid implementation.
-- Debugger: include symptom, reproduction, expected/actual behavior, logs/tests; ask for root cause and minimal fix path.
-- Executor: name target files/symbols and do-not-touch boundaries; require verification evidence.
-- Reviewer: reference the executor job, diff, acceptance criteria, constraints, and required verdict format.
-- Test-runner: name exact commands/suites and expected classification of failures.
-- Sync-docs: exactly one doc in scope.
-
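Reviewer note: the removed "Bead Contract" section lists seven required fields and rejects title-only beads. A pre-dispatch check along those lines might look like the sketch below. It assumes one `LABEL: text` pair per line, which is an illustration choice and not something `bd` or `sp` is known to enforce, and `missing_sections` is a hypothetical helper name.

```python
from typing import List

# Field names copied from the "Required structure" block in the removed skill.
REQUIRED_SECTIONS = (
    "PROBLEM", "SUCCESS", "SCOPE", "NON_GOALS",
    "CONSTRAINTS", "VALIDATION", "OUTPUT",
)


def missing_sections(description: str) -> List[str]:
    """Return required fields that are absent or have an empty body."""
    present = set()
    for line in description.splitlines():
        label, sep, body = line.partition(":")
        if sep and body.strip():  # a label with no text does not count
            present.add(label.strip())
    return [name for name in REQUIRED_SECTIONS if name not in present]
```

An empty result would mean the bead is at least structurally complete; whether the text is actually usable as a prompt still needs human or reviewer judgment.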
-## Canonical Single-Chain Flow
-
-Use this for one implementation branch.
-
-```bash
-# 1. Create or claim root task bead with complete contract
-bd create --title "..." --type task --priority 2 --description "PROBLEM: ..."
-bd update <task> --claim
-
-# 2. Optional discovery when path is unknown
-bd create --title "Explore ..." --type task --priority 2 --description "PROBLEM: ... OUTPUT: evidence-backed plan."
-bd dep add <explore> <task>
-specialists run explorer --bead <explore> --context-depth 3
-specialists result <explore-job>
-
-# 3. Implementation
-bd create --title "Implement ..." --type task --priority 2 --description "PROBLEM: ... VALIDATION: ..."
-bd dep add <impl> <explore-or-task>
-specialists run executor --bead <impl> --context-depth 3
-specialists result <exec-job>
-
-# 4. Optional advisory passes
-specialists run code-sanity --bead <sanity-bead> --job <exec-job> --context-depth 3
-specialists run security-auditor --bead <security-bead> --job <exec-job> --context-depth 3
-
-# 5. Final review
-bd create --title "Review ..." --type task --priority 2 --description "PROBLEM: Verify executor output ... OUTPUT: PASS/PARTIAL/FAIL."
-bd dep add <review> <impl>
-specialists run reviewer --bead <review> --job <exec-job> --context-depth 3
-specialists result <review-job>
-
-# 6. Publish after reviewer PASS
-sp merge <impl>
-bd close <task> --reason "Reviewer PASS; merged."
-```
-
-Edit-capable specialists with `--bead` auto-provision a worktree. `--worktree` is accepted for clarity but is usually unnecessary. Use `--job <exec-job>` for reviewer/fix passes that must enter the existing executor workspace.
-
-## Review And Fix Loop
-
-A chain stays alive until it is merged or abandoned.
-
-```text
-executor/debugger -> waiting
-optional code-sanity/security-auditor -> advisory findings
-reviewer -> PASS | PARTIAL | FAIL
-```
-
-- `PASS`: verify expected commit/diff, then publish.
-- `PARTIAL`: resume the same executor/debugger with exact findings, then re-review.
-- `FAIL`: stop and decide whether to replace the chain, re-scope the bead, or ask the operator if judgment is required.
-
-Prefer resume over spawning a new fix executor when the original job is waiting and context is healthy:
-
-```bash
-sp resume <exec-job> "Reviewer PARTIAL. Fix only these findings: ..."
-```
-
-Do not treat job completion, code-sanity OK, or security no-findings as equivalent to reviewer PASS.
-
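Reviewer note: the removed "Review And Fix Loop" section makes the reviewer verdict the only merge gate, with job completion and advisory passes explicitly not counting. The sketch below restates that rule as code. `Verdict`, `NEXT_ACTION`, and `may_publish` are hypothetical names, and the action strings paraphrase the three bullets rather than quote any tool output.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    PARTIAL = "PARTIAL"
    FAIL = "FAIL"


# Paraphrase of the PASS / PARTIAL / FAIL bullets in the removed skill.
NEXT_ACTION = {
    Verdict.PASS: "verify expected commit/diff, then publish",
    Verdict.PARTIAL: "resume the same job with exact findings, then re-review",
    Verdict.FAIL: "stop; replace the chain, re-scope the bead, or ask the operator",
}


def may_publish(job_done: bool, sanity_ok: bool, security_clean: bool,
                verdict: Optional[Verdict]) -> bool:
    """Only a reviewer PASS opens the gate.

    The first three arguments are accepted and deliberately ignored to show
    that they are advisory signals, not substitutes for the verdict.
    """
    return verdict is Verdict.PASS
```

The sketch leaves out the operator-accepted draft exception mentioned under the merge rules; adding it would take one more explicit argument.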
-## Monitoring And Steering
-
-Use `sp ps` for state and `sp result` for completed turns.
-
-```bash
-sp ps
-sp ps <job-id>
-sp ps --bead <bead-id>
-sp feed <job-id> # live/running output
-sp result <job-id> # done/error/waiting result
-```
-
-If a job is running, use `sp feed`. If it is waiting, use `sp result` and decide whether to resume, review, merge, or stop. Avoid tight polling; sleep based on task size, then check once.
-
-Use `steer` for running jobs and `resume` for waiting jobs:
-
-```bash
-sp steer <job-id> "Stop broad audit. Answer only the three bead questions."
-sp resume <job-id> "Continue with the next scoped fix. Do not refactor."
-```
-
-Context usage is an action signal when available:
-
-- 0-40%: healthy.
-- 40-65%: monitor.
-- 65-80%: steer toward conclusion.
-- Above 80%: finish, summarize, or replace the job.
-
-Raw token totals are not context percentages.
-
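Reviewer note: the context-usage bands in the removed section overlap at their edges (40%, 65%, and 80% each appear in two bands), and the closing sentence warns that raw token totals are not percentages. The sketch below makes both points concrete. Treating each upper bound as inclusive is an assumption, chosen because the last band reads "Above 80%", and both function names are hypothetical.

```python
def context_percent(used_tokens: int, window_tokens: int) -> float:
    """Convert a raw token total into a percentage of the context window."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return 100.0 * used_tokens / window_tokens


def context_action(percent: float) -> str:
    """Map a context percentage to the action band from the removed skill.

    Assumption: upper bounds are inclusive (exactly 80% still means "steer").
    """
    if percent <= 40:
        return "healthy"
    if percent <= 65:
        return "monitor"
    if percent <= 80:
        return "steer toward conclusion"
    return "finish, summarize, or replace the job"
```

For instance, 50,000 tokens used against a 200,000-token window is 25%, which falls in the healthy band; the same 50,000 tokens against a smaller window could land in a different band, which is why the raw total alone is not an action signal.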
-## Merge And Publication
-
-Standalone chain:
-
-```bash
-sp merge <chain-root-bead>
-```
-
-Epic-owned chains:
-
-```bash
-sp epic status <epic-id>
-sp epic merge <epic-id>
-```
-
-Rules:
-
-- Merge only after reviewer PASS unless the operator explicitly accepts a draft for follow-up work.
-- Use `sp epic merge` for unresolved epic chains; `sp merge` refuses those by design.
-- Do not manually `git merge` specialist branches.
-- If merge refuses because a chain job is still `waiting`, consume the result and either resume/stop/finalize that job deliberately.
-- If merge reports a dirty worktree, inspect that worktree. Revert generated noise only when it is clearly unrelated; otherwise ask or re-dispatch.
-- Run or confirm required gates before closing the root bead or epic.
-
-## Multi-Chain Epic Flow
-
-Use an epic when multiple implementation chains publish together.
-
-1. Create an epic bead with complete contract.
-2. Use planner/explorer for shared prep if needed.
-3. Create independent implementation beads with disjoint file scopes.
-4. Dispatch executors in parallel only when scopes are provably disjoint.
-5. Review each chain with its own review bead and `--job`.
-6. After every chain has reviewer PASS, publish with `sp epic merge <epic-id>`.
-
-Use `--epic <id>` when a job belongs to an epic but its bead is not a direct child. Avoid parallel executors on the same file; sequence them or consolidate the work.
-
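Reviewer note: steps 3 and 4 of the removed epic flow require file scopes to be "provably disjoint" before executors run in parallel. One simple way to check that, assuming each bead's scope can be listed as a set of file paths, is a pairwise intersection. `overlapping_scopes` and the bead ids in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlapping_scopes(scopes: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (bead_a, bead_b, shared_files) for every pair that overlaps."""
    conflicts = []
    for bead_a, bead_b in combinations(sorted(scopes), 2):
        shared = scopes[bead_a] & scopes[bead_b]
        if shared:
            conflicts.append((bead_a, bead_b, shared))
    return conflicts
```

An empty list would justify parallel dispatch under this model. Any reported pair, for example two beads that both list `src/specialist/loader.ts`, would need to be sequenced or consolidated as the removed text advises. Exact path matching does not catch overlap through directories or globs, so the check is a lower bound.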
-## Failure Recovery
-
-When something fails:
-
-```bash
-sp ps <job-id>
-sp feed <job-id>
-sp result <job-id>
-sp doctor
-```
-
-Then choose one action:
-
-- Steer a running job back to scope.
-- Resume a waiting job with exact next instructions.
-- Stop a dead or obsolete job.
-- Rerun with a better bead contract.
-- Switch specialist if the selected role was wrong.
-- Report blocker if destructive/high-risk/manual action is required.
-
-Common recovery commands:
-
-```bash
-sp stop <job-id>
-sp clean --processes --dry-run
-sp epic status <epic-id>
-sp epic sync <epic-id> --apply
-sp epic abandon <epic-id> --reason "..."
-specialists doctor --check-drift
-sp prune-stale-defaults --dry-run
-```
-
-Do not silently take over substantial specialist work yourself unless the operator agrees or the remaining change is genuinely small and deterministic.
-
-## What Stays Out Of This Skill
-
-Do not embed the full specialist catalog, all CLI help, release mechanics, stale incident reports, or historical gotchas. Keep volatile detail in `specialists list --full`, `sp help`, bead notes, and focused skills such as `releasing`, `using-nodes`, or `specialists-creator`.
package/.xtrm/skills/default/using-specialists-v3/evals/evals.json
@@ -1,89 +0,0 @@
-{
-  "skill_name": "using-specialists-v3",
-  "evals": [
-    {
-      "id": 1,
-      "eval_name": "role-selection-implementation",
-      "prompt": "Need add one small feature in src/specialist/loader.ts, but I do not know exact path yet. Which specialist should handle discovery and implementation?",
-      "expected_output": "Agent checks live registry if needed, creates/updates complete bead contracts, selects explorer for discovery then executor for implementation, and does not self-investigate substantial work.",
-      "assertions": [
-        {
-          "name": "selects_specialist_role",
-          "description": "Agent names a specialist role appropriate for unknown implementation work"
-        },
-        {
-          "name": "uses_live_registry",
-          "description": "Agent references specialists list --full instead of a static catalog"
-        },
-        {
-          "name": "does_not_self_investigate",
-          "description": "Agent does not read source files and solve it directly"
-        }
-      ],
-      "files": []
-    },
-    {
-      "id": 2,
-      "eval_name": "role-selection-debugging",
-      "prompt": "A specialist chain started failing with a stack trace and inconsistent result state. Who should inspect it, and what command surface should I use to check available flags?",
-      "expected_output": "Agent selects debugger for root-cause analysis, may use test-runner for check execution, and points to sp help/subcommand help before relying on flags.",
-      "assertions": [
-        {
-          "name": "selects_debugging_role",
-          "description": "Agent chooses debugger or test-runner for failure analysis"
-        },
-        {
-          "name": "uses_help_surface",
-          "description": "Agent references sp help or subcommand help for command details"
-        },
-        {
-          "name": "does_not_guess_flags",
-          "description": "Agent does not invent CLI flags from memory"
-        }
-      ],
-      "files": []
-    },
-    {
-      "id": 3,
-      "eval_name": "role-selection-review",
-      "prompt": "Executor finished a change and I need final verification before merge. Which specialist next, and what should it check?",
-      "expected_output": "Agent selects reviewer with its own bead and --job <exec-job>, checks bead contract plus diff, and treats PASS as merge gate.",
-      "assertions": [
-        {
-          "name": "selects_reviewer_role",
-          "description": "Agent chooses reviewer for post-implementation verification"
-        },
-        {
-          "name": "checks_contract_and_diff",
-          "description": "Agent states reviewer checks bead contract and diff"
-        },
-        {
-          "name": "does_not_replace_reviewer_with_self",
-          "description": "Agent does not perform the review directly"
-        }
-      ],
-      "files": []
-    },
-    {
-      "id": 4,
-      "eval_name": "merge-publication-flow",
-      "prompt": "Reviewer passed an executor chain. What should the orchestrator do next to publish the specialist work?",
-      "expected_output": "Agent uses sp merge <chain-root-bead> for standalone chains or sp epic merge <epic-id> for epic-owned work, avoids manual git merge, and closes the bead only after required gates are confirmed.",
-      "assertions": [
-        {
-          "name": "uses_specialist_merge",
-          "description": "Agent names sp merge or sp epic merge as the publication path"
-        },
-        {
-          "name": "avoids_manual_git_merge",
-          "description": "Agent explicitly avoids manual git merge for specialist-owned work"
-        },
-        {
-          "name": "honors_reviewer_gate",
-          "description": "Agent publishes only after reviewer PASS or explicit operator acceptance"
-        }
-      ],
-      "files": []
-    }
-  ]
-}