opencode-multiagent 0.2.0 → 0.3.0-next.1
- package/AGENTS.md +62 -0
- package/CHANGELOG.md +18 -0
- package/CONTRIBUTING.md +36 -0
- package/README.md +41 -165
- package/README.tr.md +84 -0
- package/RELEASE.md +68 -0
- package/agents/advisor.md +9 -6
- package/agents/auditor.md +8 -6
- package/agents/critic.md +19 -10
- package/agents/deep-worker.md +11 -7
- package/agents/devil.md +3 -1
- package/agents/executor.md +20 -19
- package/agents/heavy-worker.md +11 -7
- package/agents/lead.md +22 -30
- package/agents/librarian.md +6 -2
- package/agents/planner.md +18 -10
- package/agents/qa.md +9 -6
- package/agents/quick.md +12 -7
- package/agents/reviewer.md +9 -6
- package/agents/scout.md +9 -5
- package/agents/scribe.md +33 -28
- package/agents/strategist.md +10 -7
- package/agents/ui-heavy-worker.md +11 -7
- package/agents/ui-worker.md +12 -7
- package/agents/validator.md +8 -5
- package/agents/worker.md +12 -7
- package/commands/execute.md +1 -0
- package/commands/init-deep.md +1 -0
- package/commands/init.md +1 -0
- package/commands/inspect.md +1 -0
- package/commands/plan.md +1 -0
- package/commands/quality.md +1 -0
- package/commands/review.md +1 -0
- package/commands/status.md +1 -0
- package/defaults/opencode-multiagent.json +223 -0
- package/defaults/opencode-multiagent.schema.json +249 -0
- package/dist/control-plane.d.ts +4 -0
- package/dist/control-plane.d.ts.map +1 -0
- package/dist/index.d.ts +5 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +1583 -0
- package/dist/opencode-multiagent/compiler.d.ts +19 -0
- package/dist/opencode-multiagent/compiler.d.ts.map +1 -0
- package/dist/opencode-multiagent/constants.d.ts +116 -0
- package/dist/opencode-multiagent/constants.d.ts.map +1 -0
- package/dist/opencode-multiagent/defaults.d.ts +10 -0
- package/dist/opencode-multiagent/defaults.d.ts.map +1 -0
- package/dist/opencode-multiagent/file-lock.d.ts +15 -0
- package/dist/opencode-multiagent/file-lock.d.ts.map +1 -0
- package/dist/opencode-multiagent/hooks.d.ts +62 -0
- package/dist/opencode-multiagent/hooks.d.ts.map +1 -0
- package/dist/opencode-multiagent/log.d.ts +2 -0
- package/dist/opencode-multiagent/log.d.ts.map +1 -0
- package/dist/opencode-multiagent/markdown.d.ts +8 -0
- package/dist/opencode-multiagent/markdown.d.ts.map +1 -0
- package/dist/opencode-multiagent/mcp.d.ts +3 -0
- package/dist/opencode-multiagent/mcp.d.ts.map +1 -0
- package/dist/opencode-multiagent/policy.d.ts +5 -0
- package/dist/opencode-multiagent/policy.d.ts.map +1 -0
- package/dist/opencode-multiagent/quality.d.ts +14 -0
- package/dist/opencode-multiagent/quality.d.ts.map +1 -0
- package/dist/opencode-multiagent/runtime.d.ts +7 -0
- package/dist/opencode-multiagent/runtime.d.ts.map +1 -0
- package/dist/opencode-multiagent/session-tracker.d.ts +32 -0
- package/dist/opencode-multiagent/session-tracker.d.ts.map +1 -0
- package/dist/opencode-multiagent/skills.d.ts +17 -0
- package/dist/opencode-multiagent/skills.d.ts.map +1 -0
- package/dist/opencode-multiagent/supervision.d.ts +12 -0
- package/dist/opencode-multiagent/supervision.d.ts.map +1 -0
- package/dist/opencode-multiagent/task-manager.d.ts +48 -0
- package/dist/opencode-multiagent/task-manager.d.ts.map +1 -0
- package/dist/opencode-multiagent/telemetry.d.ts +26 -0
- package/dist/opencode-multiagent/telemetry.d.ts.map +1 -0
- package/dist/opencode-multiagent/tools.d.ts +56 -0
- package/dist/opencode-multiagent/tools.d.ts.map +1 -0
- package/dist/opencode-multiagent/types.d.ts +36 -0
- package/dist/opencode-multiagent/types.d.ts.map +1 -0
- package/dist/opencode-multiagent/utils.d.ts +9 -0
- package/dist/opencode-multiagent/utils.d.ts.map +1 -0
- package/docs/agents.md +260 -0
- package/docs/agents.tr.md +260 -0
- package/docs/configuration.md +255 -0
- package/docs/configuration.tr.md +255 -0
- package/docs/usage-guide.md +226 -0
- package/docs/usage-guide.tr.md +227 -0
- package/examples/opencode.with-overrides.json +1 -5
- package/package.json +23 -13
- package/skills/advanced-evaluation/SKILL.md +37 -21
- package/skills/advanced-evaluation/manifest.json +2 -13
- package/skills/cek-context-engineering/SKILL.md +159 -87
- package/skills/cek-context-engineering/manifest.json +1 -3
- package/skills/cek-prompt-engineering/SKILL.md +13 -10
- package/skills/cek-prompt-engineering/manifest.json +1 -3
- package/skills/cek-test-prompt/SKILL.md +38 -28
- package/skills/cek-test-prompt/manifest.json +1 -3
- package/skills/cek-thought-based-reasoning/SKILL.md +75 -21
- package/skills/cek-thought-based-reasoning/manifest.json +1 -3
- package/skills/context-degradation/SKILL.md +14 -13
- package/skills/context-degradation/manifest.json +1 -3
- package/skills/debate/SKILL.md +23 -78
- package/skills/debate/manifest.json +2 -12
- package/skills/design-first/manifest.json +2 -13
- package/skills/dispatching-parallel-agents/SKILL.md +14 -3
- package/skills/dispatching-parallel-agents/manifest.json +1 -4
- package/skills/drift-analysis/SKILL.md +50 -29
- package/skills/drift-analysis/manifest.json +2 -12
- package/skills/evaluation/manifest.json +2 -12
- package/skills/executing-plans/SKILL.md +15 -8
- package/skills/executing-plans/manifest.json +1 -3
- package/skills/handoff-protocols/manifest.json +2 -12
- package/skills/parallel-investigation/SKILL.md +25 -12
- package/skills/parallel-investigation/manifest.json +1 -4
- package/skills/reflexion-critique/SKILL.md +21 -10
- package/skills/reflexion-critique/manifest.json +1 -3
- package/skills/reflexion-reflect/SKILL.md +36 -34
- package/skills/reflexion-reflect/manifest.json +2 -10
- package/skills/root-cause-analysis/manifest.json +2 -13
- package/skills/sadd-judge-with-debate/SKILL.md +50 -26
- package/skills/sadd-judge-with-debate/manifest.json +1 -3
- package/skills/structured-code-review/manifest.json +2 -11
- package/skills/task-decomposition/manifest.json +2 -13
- package/skills/verification-before-completion/manifest.json +2 -15
- package/skills/verification-gates/SKILL.md +27 -19
- package/skills/verification-gates/manifest.json +2 -12
- package/defaults/agent-settings.json +0 -102
- package/defaults/agent-settings.schema.json +0 -25
- package/defaults/flags.json +0 -35
- package/defaults/flags.schema.json +0 -119
- package/defaults/mcp-defaults.json +0 -47
- package/defaults/mcp-defaults.schema.json +0 -38
- package/defaults/profiles.json +0 -53
- package/defaults/profiles.schema.json +0 -60
- package/defaults/team-profiles.json +0 -83
- package/src/control-plane.ts +0 -21
- package/src/index.ts +0 -8
- package/src/opencode-multiagent/compiler.ts +0 -168
- package/src/opencode-multiagent/constants.ts +0 -178
- package/src/opencode-multiagent/file-lock.ts +0 -90
- package/src/opencode-multiagent/hooks.ts +0 -599
- package/src/opencode-multiagent/log.ts +0 -12
- package/src/opencode-multiagent/mailbox.ts +0 -287
- package/src/opencode-multiagent/markdown.ts +0 -99
- package/src/opencode-multiagent/mcp.ts +0 -35
- package/src/opencode-multiagent/policy.ts +0 -67
- package/src/opencode-multiagent/quality.ts +0 -140
- package/src/opencode-multiagent/runtime.ts +0 -55
- package/src/opencode-multiagent/skills.ts +0 -144
- package/src/opencode-multiagent/supervision.ts +0 -156
- package/src/opencode-multiagent/task-manager.ts +0 -148
- package/src/opencode-multiagent/team-manager.ts +0 -219
- package/src/opencode-multiagent/team-tools.ts +0 -359
- package/src/opencode-multiagent/telemetry.ts +0 -124
- package/src/opencode-multiagent/utils.ts +0 -54
package/skills/debate/SKILL.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
name: debate
|
|
3
3
|
description: "Structured AI debate templates and synthesis. Use when orchestrating multi-round debates between AI tools, 'debate topic', 'argue about', 'stress test idea', 'devil advocate'."
|
|
4
4
|
version: 5.1.0
|
|
5
|
-
argument-hint:
|
|
5
|
+
argument-hint: '[topic] [--proposer=tool] [--challenger=tool] [--rounds=N] [--effort=level]'
|
|
6
6
|
---
|
|
7
7
|
|
|
8
8
|
# debate
|
|
@@ -12,6 +12,7 @@ Prompt templates, context assembly rules, and synthesis format for structured mu
|
|
|
12
12
|
## Arguments
|
|
13
13
|
|
|
14
14
|
Parse from `$ARGUMENTS`:
|
|
15
|
+
|
|
15
16
|
- **topic**: The debate question/topic (required)
|
|
16
17
|
- **--proposer**: Tool for the proposer role (claude, gemini, codex, opencode, copilot)
|
|
17
18
|
- **--challenger**: Tool for the challenger role (must differ from proposer)
|
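As a rough illustration of the argument list above, the following sketch parses a `$ARGUMENTS`-style string into a topic plus `--key=value` flags. The `parseDebateArgs` name, the `DebateArgs` shape, and the whitespace tokenization are assumptions made for this example and are not taken from the package. The tool names come from the `--proposer` entry, and the sketch assumes the challenger accepts the same set.

```typescript
// Hypothetical helper: not part of opencode-multiagent.
const KNOWN_TOOLS = ["claude", "gemini", "codex", "opencode", "copilot"];

interface DebateArgs {
  topic: string;
  proposer: string | undefined;
  challenger: string | undefined;
  rounds: number | undefined;
  effort: string | undefined;
}

function parseDebateArgs(raw: string): DebateArgs {
  const topicWords: string[] = [];
  const flags = new Map<string, string>();

  // Assumption: tokens are whitespace-separated and flags look like --key=value.
  for (const token of raw.trim().split(/\s+/)) {
    const match = /^--([a-z]+)=(.+)$/.exec(token);
    if (match) {
      flags.set(match[1], match[2]);
    } else if (token.length > 0) {
      topicWords.push(token);
    }
  }

  const topic = topicWords.join(" ");
  if (topic === "") throw new Error("topic is required");

  const proposer = flags.get("proposer");
  const challenger = flags.get("challenger");
  for (const tool of [proposer, challenger]) {
    if (tool !== undefined && KNOWN_TOOLS.indexOf(tool) === -1) {
      throw new Error(`unknown tool: ${tool}`);
    }
  }
  if (proposer !== undefined && proposer === challenger) {
    throw new Error("challenger must differ from proposer");
  }

  const roundsText = flags.get("rounds");
  const rounds = roundsText === undefined ? undefined : parseInt(roundsText, 10);
  if (rounds !== undefined && (isNaN(rounds) || rounds < 1)) {
    throw new Error("rounds must be a positive integer");
  }

  return { topic, proposer, challenger, rounds, effort: flags.get("effort") };
}
```

Everything that is not a recognized flag is treated as part of the topic, so a topic does not need quoting in this sketch.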
|
@@ -130,6 +131,7 @@ Provide your follow-up:
|
|
|
130
131
|
Include the full text of all prior exchanges in the prompt. Context is small enough (typically under 5000 tokens total).
|
|
131
132
|
|
|
132
133
|
Format for context block:
|
|
134
|
+
|
|
133
135
|
```
|
|
134
136
|
Previous exchanges:
|
|
135
137
|
|
|
@@ -145,6 +147,7 @@ Round 1 - Challenger ({challenger_tool}):
|
|
|
145
147
|
For rounds 3 and beyond, replace full exchange text from rounds 1 through N-2 with a summary. Only include the most recent round's responses in full.
|
|
146
148
|
|
|
147
149
|
Format:
|
|
150
|
+
|
|
148
151
|
```
|
|
149
152
|
Summary of rounds 1-{N-2}:
|
|
150
153
|
{summary of key positions, agreements, and open disagreements}
|
|
@@ -157,6 +160,7 @@ Round {N-1} - Challenger ({challenger_tool}):
|
|
|
157
160
|
```
|
|
158
161
|
|
|
159
162
|
The orchestrator agent (opus) generates the summary. Target: 500-800 tokens. MUST preserve:
|
|
163
|
+
|
|
160
164
|
- Each side's core position
|
|
161
165
|
- All concessions (verbatim quotes, not paraphrased)
|
|
162
166
|
- All evidence citations that support agreements
|
|
@@ -202,6 +206,7 @@ Rate the debate on these dimensions:
|
|
|
202
206
|
```
|
|
203
207
|
|
|
204
208
|
**Synthesis rules:**
|
|
209
|
+
|
|
205
210
|
- The verdict MUST pick a side. "Both approaches have merit" is NOT acceptable.
|
|
206
211
|
- Cite specific arguments from the debate as evidence for the verdict.
|
|
207
212
|
- The recommendation must be actionable - what should the user DO based on this debate.
|
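One way to enforce the first and third synthesis rules mechanically is to reject any verdict whose winner is not one of the two participants, or whose recommendation is empty. The sketch below is an illustration only: `checkVerdict` and the `recommendation` field are hypothetical, and only the `winner` field appears in the saved-state example.

```typescript
// Hypothetical check, not part of the package.
interface Verdict {
  winner: string; // tool name of the side that won
  recommendation: string; // assumed field: what the user should do next
}

function checkVerdict(
  verdict: Verdict,
  proposerTool: string,
  challengerTool: string,
): string[] {
  const problems: string[] = [];
  // "Both approaches have merit" style answers fail this test.
  if (verdict.winner !== proposerTool && verdict.winner !== challengerTool) {
    problems.push("verdict must pick a side: winner is neither participant");
  }
  if (verdict.recommendation.trim() === "") {
    problems.push("recommendation must be actionable, got empty text");
  }
  return problems;
}
```

An empty result means the verdict passes both checks. Whether the cited arguments are actually specific still needs a human or model review.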
|
@@ -215,17 +220,17 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`:
|
|
|
215
220
|
{
|
|
216
221
|
"id": "debate-{ISO timestamp}-{4 char random hex}",
|
|
217
222
|
"topic": "original topic text",
|
|
218
|
-
"proposer": {"tool": "claude", "model": "opus"},
|
|
219
|
-
"challenger": {"tool": "gemini", "model": "gemini-3.1-pro-preview"},
|
|
223
|
+
"proposer": { "tool": "claude", "model": "opus" },
|
|
224
|
+
"challenger": { "tool": "gemini", "model": "gemini-3.1-pro-preview" },
|
|
220
225
|
"effort": "high",
|
|
221
226
|
"rounds_completed": 2,
|
|
222
227
|
"max_rounds": 2,
|
|
223
228
|
"status": "completed",
|
|
224
229
|
"exchanges": [
|
|
225
|
-
{"round": 1, "role": "proposer", "tool": "claude", "response": "...", "duration_ms": 8500},
|
|
226
|
-
{"round": 1, "role": "challenger", "tool": "gemini", "response": "...", "duration_ms": 12000},
|
|
227
|
-
{"round": 2, "role": "proposer", "tool": "claude", "response": "...", "duration_ms": 9200},
|
|
228
|
-
{"round": 2, "role": "challenger", "tool": "gemini", "response": "...", "duration_ms": 11000}
|
|
230
|
+
{ "round": 1, "role": "proposer", "tool": "claude", "response": "...", "duration_ms": 8500 },
|
|
231
|
+
{ "round": 1, "role": "challenger", "tool": "gemini", "response": "...", "duration_ms": 12000 },
|
|
232
|
+
{ "round": 2, "role": "proposer", "tool": "claude", "response": "...", "duration_ms": 9200 },
|
|
233
|
+
{ "round": 2, "role": "challenger", "tool": "gemini", "response": "...", "duration_ms": 11000 }
|
|
229
234
|
],
|
|
230
235
|
"verdict": {
|
|
231
236
|
"winner": "claude",
|
|
@@ -238,79 +243,19 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`:
|
|
|
238
243
|
}
|
|
239
244
|
```
|
|
240
245
|
|
|
241
|
-
Platform state directory:
|
|
242
|
-
- Claude Code: `.claude/`
|
|
243
|
-
- OpenCode: `.opencode/`
|
|
244
|
-
- Codex CLI: `.codex/`
|
|
246
|
+
Platform state directory: `.opencode/`
|
|
245
247
|
|
|
246
248
|
## Error Handling
|
|
247
249
|
|
|
248
|
-
| Error
|
|
249
|
-
|
|
250
|
-
| Proposer fails round 1
|
|
251
|
-
| Challenger fails round 1
|
|
252
|
-
| Any tool fails mid-debate
|
|
253
|
-
| Tool invocation timeout (>240s)
|
|
254
|
-
| Consult result envelope indicates failure (status/exit/error/empty output) | Treat as tool failure for that role/round and apply the same role+round policy above.
|
|
255
|
-
| Structured parse fails after successful envelope
|
|
256
|
-
| All rounds timeout
|
|
257
|
-
| No successful exchanges recorded (non-timeout)
|
|
258
|
-
|
|
259
|
-
## External Tool Quick Reference
|
|
260
|
-
|
|
261
|
-
> Canonical source: `plugins/consult/skills/consult/SKILL.md`. Build and execute CLI commands directly using these templates. Do NOT invoke via `Skill: consult` - in Claude Code that loads the interactive command wrapper and causes a recursive loop. Write the question to `{AI_STATE_DIR}/consult/question.tmp` first, then execute the command via Bash.
|
|
262
|
-
|
|
263
|
-
### Safe Command Patterns
|
|
264
|
-
|
|
265
|
-
| Provider | Safe Command Pattern |
|
|
266
|
-
|----------|---------------------|
|
|
267
|
-
| Claude | `claude -p - --output-format json --model "MODEL" --max-turns TURNS --allowedTools "Read,Glob,Grep" < "{AI_STATE_DIR}/consult/question.tmp"` |
|
|
268
|
-
| Gemini | `gemini -p - --output-format json -m "MODEL" < "{AI_STATE_DIR}/consult/question.tmp"` |
|
|
269
|
-
| Codex | `codex exec "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL" -c model_reasoning_effort="LEVEL"` |
|
|
270
|
-
| OpenCode | `opencode run - --format json --model "MODEL" --variant "VARIANT" < "{AI_STATE_DIR}/consult/question.tmp"` |
|
|
271
|
-
| Copilot | `copilot -p - < "{AI_STATE_DIR}/consult/question.tmp"` |
|
|
272
|
-
|
|
273
|
-
### Effort-to-Model Mapping
|
|
274
|
-
|
|
275
|
-
| Effort | Claude | Gemini | Codex | OpenCode | Copilot |
|
|
276
|
-
|--------|--------|--------|-------|----------|---------|
|
|
277
|
-
| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control |
|
|
278
|
-
| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control |
|
|
279
|
-
| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control |
|
|
280
|
-
| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control |
|
|
281
|
-
|
|
282
|
-
### Output Parsing
|
|
283
|
-
|
|
284
|
-
| Provider | Parse Expression |
|
|
285
|
-
|----------|-----------------|
|
|
286
|
-
| Claude | `JSON.parse(stdout).result` |
|
|
287
|
-
| Gemini | `JSON.parse(stdout).response` |
|
|
288
|
-
| Codex | `JSON.parse(stdout).message` or raw text |
|
|
289
|
-
| OpenCode | Newline-delimited JSON. Concatenate `part.text` from events where `type === "text"`. Session ID from `event.sessionID`. |
|
|
290
|
-
| Copilot | Raw stdout text |
|
|
291
|
-
|
|
292
|
-
Parse discipline:
|
|
293
|
-
1. Evaluate execution status first (timeout/non-zero/error/empty output) before any parsing.
|
|
294
|
-
2. Parse only when execution status is successful.
|
|
295
|
-
3. If parse fails, surface only sanitized parse metadata (never raw stdout/stderr snippets) and apply role/round failure policy instead of hanging or continuing silently.
|
|
296
|
-
|
|
297
|
-
### ACP Transport Commands
|
|
298
|
-
|
|
299
|
-
> ACP is an alternative transport available when providers support it. Build and execute CLI commands directly - do NOT use `Skill: consult` (recursive loop in Claude Code).
|
|
300
|
-
|
|
301
|
-
| Provider | ACP Command Pattern |
|
|
302
|
-
|----------|-------------------|
|
|
303
|
-
| Claude | `node acp/run.js --provider="claude" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000 --model="MODEL"` |
|
|
304
|
-
| Gemini | `node acp/run.js --provider="gemini" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000 --model="MODEL"` |
|
|
305
|
-
| Codex | `node acp/run.js --provider="codex" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000 --model="MODEL"` |
|
|
306
|
-
| OpenCode | `node acp/run.js --provider="opencode" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000 --model="MODEL"` |
|
|
307
|
-
| Copilot | `node acp/run.js --provider="copilot" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000` |
|
|
308
|
-
| Kiro | `node acp/run.js --provider="kiro" --question-file="{AI_STATE_DIR}/consult/question.tmp" --timeout=240000` |
|
|
309
|
-
|
|
310
|
-
Note the 240000ms timeout (240s) for debate rounds vs 120000ms (120s) for consult.
|
|
311
|
-
|
|
312
|
-
**Kiro**: ACP-only provider. No CLI mode. Available when `kiro-cli` is on PATH.
|
|
250
|
+
| Error | Action |
|
|
251
|
+
| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
252
|
+
| Proposer fails round 1 | Abort debate. Cannot proceed without opening position. |
|
|
253
|
+
| Challenger fails round 1 | Show proposer's position with note: "[WARN] Challenger failed. Showing proposer's uncontested position." |
|
|
254
|
+
| Any tool fails mid-debate | Synthesize from completed rounds. Note incomplete round in output. |
|
|
255
|
+
| Tool invocation timeout (>240s) | Round 1 proposer: abort. Round 1 challenger: proceed with uncontested. Round 2+: synthesize from completed rounds with timeout note. |
|
|
256
|
+
| Consult result envelope indicates failure (status/exit/error/empty output) | Treat as tool failure for that role/round and apply the same role+round policy above. |
|
|
257
|
+
| Structured parse fails after successful envelope | Treat as tool failure for that role/round, include only sanitized parse metadata (`PARSE_ERROR:<type>:<code>`, redact secrets, strip control chars, max 200 chars), then apply the same role+round policy above. |
|
|
258
|
+
| All rounds timeout | "[ERROR] Debate failed: all tool invocations timed out." |
|
|
259
|
+
| No successful exchanges recorded (non-timeout) | "[ERROR] Debate failed: no successful exchanges were recorded." |
|
|
313
260
|
|
|
314
|
-
### ACP Output Parsing
|
|
315
261
|
|
|
316
|
-
ACP transport output is parsed identically to CLI transport - the ACP runner (`acp/run.js`) normalizes responses into the same JSON envelope format. The `transport` field in the envelope indicates `"acp"` or `"cli"`.
|
package/skills/debate/manifest.json
CHANGED
|
@@ -2,18 +2,8 @@
|
|
|
2
2
|
"name": "debate",
|
|
3
3
|
"version": "1.0.0",
|
|
4
4
|
"description": "Structured debate templates for stress-testing positions and solutions",
|
|
5
|
-
"triggers": [
|
|
6
|
-
|
|
7
|
-
"argue",
|
|
8
|
-
"stress test",
|
|
9
|
-
"devils advocate",
|
|
10
|
-
"counter argument"
|
|
11
|
-
],
|
|
12
|
-
"applicable_agents": [
|
|
13
|
-
"critic",
|
|
14
|
-
"strategist",
|
|
15
|
-
"librarian"
|
|
16
|
-
],
|
|
5
|
+
"triggers": ["debate", "argue", "stress test", "devils advocate", "counter argument"],
|
|
6
|
+
"applicable_agents": ["critic", "strategist", "librarian"],
|
|
17
7
|
"max_context_tokens": 2200,
|
|
18
8
|
"entry_file": "SKILL.md"
|
|
19
9
|
}
|
package/skills/design-first/manifest.json
CHANGED
|
@@ -2,19 +2,8 @@
|
|
|
2
2
|
"name": "design-first",
|
|
3
3
|
"version": "1.0.0",
|
|
4
4
|
"description": "Design the approach before implementation starts",
|
|
5
|
-
"triggers": [
|
|
6
|
-
|
|
7
|
-
"architecture",
|
|
8
|
-
"plan",
|
|
9
|
-
"schema",
|
|
10
|
-
"api",
|
|
11
|
-
"interface"
|
|
12
|
-
],
|
|
13
|
-
"applicable_agents": [
|
|
14
|
-
"planner",
|
|
15
|
-
"worker",
|
|
16
|
-
"heavy-worker"
|
|
17
|
-
],
|
|
5
|
+
"triggers": ["design", "architecture", "plan", "schema", "api", "interface"],
|
|
6
|
+
"applicable_agents": ["planner", "strategist", "heavy-worker", "deep-worker", "advisor"],
|
|
18
7
|
"max_context_tokens": 1500,
|
|
19
8
|
"entry_file": "SKILL.md"
|
|
20
9
|
}
|
package/skills/dispatching-parallel-agents/SKILL.md
CHANGED
|
@@ -32,12 +32,14 @@ digraph when_to_use {
|
|
|
32
32
|
```
|
|
33
33
|
|
|
34
34
|
**Use when:**
|
|
35
|
+
|
|
35
36
|
- 3+ test files failing with different root causes
|
|
36
37
|
- Multiple subsystems broken independently
|
|
37
38
|
- Each problem can be understood without context from others
|
|
38
39
|
- No shared state between investigations
|
|
39
40
|
|
|
40
41
|
**Don't use when:**
|
|
42
|
+
|
|
41
43
|
- Failures are related (fix one might fix others)
|
|
42
44
|
- Need to understand full system state
|
|
43
45
|
- Agents would interfere with each other
|
|
@@ -47,6 +49,7 @@ digraph when_to_use {
|
|
|
47
49
|
### 1. Identify Independent Domains
|
|
48
50
|
|
|
49
51
|
Group failures by what's broken:
|
|
52
|
+
|
|
50
53
|
- File A tests: Tool approval flow
|
|
51
54
|
- File B tests: Batch completion behavior
|
|
52
55
|
- File C tests: Abort functionality
|
|
@@ -56,6 +59,7 @@ Each domain is independent - fixing tool approval doesn't affect abort tests.
|
|
|
56
59
|
### 2. Create Focused Agent Tasks
|
|
57
60
|
|
|
58
61
|
Each agent gets:
|
|
62
|
+
|
|
59
63
|
- **Specific scope:** One test file or subsystem
|
|
60
64
|
- **Clear goal:** Make these tests pass
|
|
61
65
|
- **Constraints:** Don't change other code
|
|
@@ -65,15 +69,16 @@ Each agent gets:
|
|
|
65
69
|
|
|
66
70
|
```typescript
|
|
67
71
|
// In Claude Code / AI environment
|
|
68
|
-
Task(
|
|
69
|
-
Task(
|
|
70
|
-
Task(
|
|
72
|
+
Task('Fix agent-tool-abort.test.ts failures');
|
|
73
|
+
Task('Fix batch-completion-behavior.test.ts failures');
|
|
74
|
+
Task('Fix tool-approval-race-conditions.test.ts failures');
|
|
71
75
|
// All three run concurrently
|
|
72
76
|
```
|
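The `Task(...)` calls above are shorthand for the host environment's dispatch mechanism. As a hedged sketch of the same fan-out pattern in plain TypeScript, the block below starts three independent jobs concurrently and collects one result per task even if a task fails. `runAgentTask` is a stand-in that resolves immediately; it is not an OpenCode or Claude Code API, and `dispatchAll` and `TaskResult` are likewise hypothetical names.

```typescript
interface TaskResult {
  task: string;
  ok: boolean;
  summary: string;
}

// Stand-in for a real agent dispatch (hypothetical).
async function runAgentTask(task: string): Promise<string> {
  if (task.trim() === "") throw new Error("empty task description");
  return `done: ${task}`;
}

// Start every task before awaiting any of them, so they overlap in time.
async function dispatchAll(tasks: string[]): Promise<TaskResult[]> {
  const pending = tasks.map((task) =>
    runAgentTask(task).then(
      (summary): TaskResult => ({ task, ok: true, summary }),
      (error: unknown): TaskResult => ({
        task,
        ok: false,
        summary: error instanceof Error ? error.message : String(error),
      }),
    ),
  );
  // Each promise already converts failure into a result, so this never rejects.
  return Promise.all(pending);
}
```

Turning each failure into a result record keeps one broken investigation from hiding the summaries of the others.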
|
73
77
|
|
|
74
78
|
### 4. Review and Integrate
|
|
75
79
|
|
|
76
80
|
When agents return:
|
|
81
|
+
|
|
77
82
|
- Read each summary
|
|
78
83
|
- Verify fixes don't conflict
|
|
79
84
|
- Run full test suite
|
|
@@ -82,6 +87,7 @@ When agents return:
|
|
|
82
87
|
## Agent Prompt Structure
|
|
83
88
|
|
|
84
89
|
Good agent prompts are:
|
|
90
|
+
|
|
85
91
|
1. **Focused** - One clear problem domain
|
|
86
92
|
2. **Self-contained** - All context needed to understand the problem
|
|
87
93
|
3. **Specific about output** - What should the agent return?
|
|
@@ -133,6 +139,7 @@ Return: Summary of what you found and what you fixed.
|
|
|
133
139
|
**Scenario:** 6 test failures across 3 files after major refactoring
|
|
134
140
|
|
|
135
141
|
**Failures:**
|
|
142
|
+
|
|
136
143
|
- agent-tool-abort.test.ts: 3 failures (timing issues)
|
|
137
144
|
- batch-completion-behavior.test.ts: 2 failures (tools not executing)
|
|
138
145
|
- tool-approval-race-conditions.test.ts: 1 failure (execution count = 0)
|
|
@@ -140,6 +147,7 @@ Return: Summary of what you found and what you fixed.
|
|
|
140
147
|
**Decision:** Independent domains - abort logic separate from batch completion separate from race conditions
|
|
141
148
|
|
|
142
149
|
**Dispatch:**
|
|
150
|
+
|
|
143
151
|
```
|
|
144
152
|
Agent 1 → Fix agent-tool-abort.test.ts
|
|
145
153
|
Agent 2 → Fix batch-completion-behavior.test.ts
|
|
@@ -147,6 +155,7 @@ Agent 3 → Fix tool-approval-race-conditions.test.ts
|
|
|
147
155
|
```
|
|
148
156
|
|
|
149
157
|
**Results:**
|
|
158
|
+
|
|
150
159
|
- Agent 1: Replaced timeouts with event-based waiting
|
|
151
160
|
- Agent 2: Fixed event structure bug (threadId in wrong place)
|
|
152
161
|
- Agent 3: Added wait for async tool execution to complete
|
|
@@ -165,6 +174,7 @@ Agent 3 → Fix tool-approval-race-conditions.test.ts
|
|
|
165
174
|
## Verification
|
|
166
175
|
|
|
167
176
|
After agents return:
|
|
177
|
+
|
|
168
178
|
1. **Review each summary** - Understand what changed
|
|
169
179
|
2. **Check for conflicts** - Did agents edit same code?
|
|
170
180
|
3. **Run full suite** - Verify all fixes work together
|
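The conflict check in step 2 can be partly automated when each agent reports which files it touched. The following sketch flags files edited by more than one agent. The `AgentReport` shape and the `findConflicts` name are assumptions, since the skill text does not define a report format.

```typescript
// Hypothetical report format: the skill does not prescribe one.
interface AgentReport {
  agent: string;
  editedFiles: string[];
}

// Returns a map from file path to the agents that edited it,
// keeping only files touched by two or more agents.
function findConflicts(reports: AgentReport[]): Map<string, string[]> {
  const editors = new Map<string, string[]>();
  for (const report of reports) {
    for (const file of report.editedFiles) {
      const agents = editors.get(file) ?? [];
      if (agents.indexOf(report.agent) === -1) agents.push(report.agent);
      editors.set(file, agents);
    }
  }

  const conflicts = new Map<string, string[]>();
  editors.forEach((agents, file) => {
    if (agents.length > 1) conflicts.set(file, agents);
  });
  return conflicts;
}
```

A file-level overlap is only a hint: two agents may have edited unrelated parts of the same file, so the full test suite remains the final check.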
|
@@ -173,6 +183,7 @@ After agents return:
|
|
|
173
183
|
## Real-World Impact
|
|
174
184
|
|
|
175
185
|
From debugging session (2025-10-03):
|
|
186
|
+
|
|
176
187
|
- 6 failures across 3 files
|
|
177
188
|
- 3 agents dispatched in parallel
|
|
178
189
|
- All investigations completed concurrently
|
package/skills/drift-analysis/SKILL.md
CHANGED
|
@@ -30,21 +30,25 @@ Knowledge and patterns for analyzing project state, detecting plan drift, and cr
|
|
|
30
30
|
### Types of Drift
|
|
31
31
|
|
|
32
32
|
**Plan Drift**: When documented plans diverge from actual implementation
|
|
33
|
+
|
|
33
34
|
- PLAN.md items remain unchecked for extended periods
|
|
34
35
|
- Roadmap milestones slip without updates
|
|
35
36
|
- Sprint/phase goals not reflected in code changes
|
|
36
37
|
|
|
37
38
|
**Documentation Drift**: When documentation falls behind implementation
|
|
39
|
+
|
|
38
40
|
- New features exist without corresponding docs
|
|
39
41
|
- README describes features that don't exist
|
|
40
42
|
- API docs don't match actual endpoints
|
|
41
43
|
|
|
42
44
|
**Issue Drift**: When issue tracking diverges from reality
|
|
45
|
+
|
|
43
46
|
- Stale issues that no longer apply
|
|
44
47
|
- Completed work without closed issues
|
|
45
48
|
- High-priority items neglected
|
|
46
49
|
|
|
47
50
|
**Scope Drift**: When project scope expands beyond original plans
|
|
51
|
+
|
|
48
52
|
- More features documented than can be delivered
|
|
49
53
|
- Continuous addition without completion
|
|
50
54
|
- Ever-growing backlog with no pruning
|
|
@@ -83,17 +87,17 @@ function calculatePriority(item, weights) {
|
|
|
83
87
|
critical: 15,
|
|
84
88
|
high: 10,
|
|
85
89
|
medium: 5,
|
|
86
|
-
low: 2
|
|
90
|
+
low: 2,
|
|
87
91
|
};
|
|
88
92
|
score += severityScores[item.severity] || 5;
|
|
89
93
|
|
|
90
94
|
// Category multiplier
|
|
91
95
|
const categoryWeights = {
|
|
92
|
-
security: 2.0,
|
|
93
|
-
bugs: 1.5,
|
|
96
|
+
security: 2.0, // Security issues get 2x
|
|
97
|
+
bugs: 1.5, // Bugs get 1.5x
|
|
94
98
|
infrastructure: 1.3,
|
|
95
99
|
features: 1.0,
|
|
96
|
-
documentation: 0.8
|
|
100
|
+
documentation: 0.8,
|
|
97
101
|
};
|
|
98
102
|
score *= categoryWeights[item.category] || 1.0;
|
|
99
103
|
|
|
@@ -109,21 +113,21 @@ function calculatePriority(item, weights) {
|
|
|
109
113
|
|
|
110
114
|
### Time Bucket Thresholds
|
|
111
115
|
|
|
112
|
-
| Bucket
|
|
113
|
-
|
|
114
|
-
| Immediate
|
|
115
|
-
| Short-term
|
|
116
|
-
| Medium-term | priority >= 5
|
|
117
|
-
| Backlog
|
|
116
|
+
| Bucket | Criteria | Max Items |
|
|
117
|
+
| ----------- | ----------------------------------- | --------- |
|
|
118
|
+
| Immediate | severity=critical OR priority >= 15 | 5 |
|
|
119
|
+
| Short-term | severity=high OR priority >= 10 | 10 |
|
|
120
|
+
| Medium-term | priority >= 5 | 15 |
|
|
121
|
+
| Backlog | everything else | 20 |
|
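The thresholds in this table translate fairly directly into a small classifier. The sketch below is one possible reading, not the skill's actual code: the `PlanItem` shape and the function names are assumptions, rules are checked top to bottom so the first match wins, and items beyond a bucket's cap are dropped because the table does not say what happens on overflow.

```typescript
type Bucket = "immediate" | "short-term" | "medium-term" | "backlog";

// Hypothetical item shape for illustration.
interface PlanItem {
  title: string;
  severity: string; // e.g. "critical", "high", "medium", "low"
  priority: number; // score from a priority calculation
}

// "Max Items" column of the table.
const MAX_ITEMS: Record<Bucket, number> = {
  "immediate": 5,
  "short-term": 10,
  "medium-term": 15,
  "backlog": 20,
};

// "Criteria" column, evaluated in table order.
function bucketFor(item: PlanItem): Bucket {
  if (item.severity === "critical" || item.priority >= 15) return "immediate";
  if (item.severity === "high" || item.priority >= 10) return "short-term";
  if (item.priority >= 5) return "medium-term";
  return "backlog";
}

function groupIntoBuckets(items: PlanItem[]): Record<Bucket, PlanItem[]> {
  const groups: Record<Bucket, PlanItem[]> = {
    "immediate": [],
    "short-term": [],
    "medium-term": [],
    "backlog": [],
  };
  // Highest priority first, so the cap keeps the most important items.
  const sorted = [...items].sort((a, b) => b.priority - a.priority);
  for (const item of sorted) {
    const bucket = bucketFor(item);
    if (groups[bucket].length < MAX_ITEMS[bucket]) groups[bucket].push(item);
  }
  return groups;
}
```

If overflow should spill into the next bucket instead of being dropped, only the loop in `groupIntoBuckets` needs to change.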
|
118
122
|
|
|
119
123
|
### Priority Weights (Default)
|
|
120
124
|
|
|
121
125
|
```yaml
|
|
122
|
-
security: 10
|
|
123
|
-
bugs: 8
|
|
124
|
-
features: 5
|
|
126
|
+
security: 10 # Security issues always top priority
|
|
127
|
+
bugs: 8 # Bugs affect users directly
|
|
128
|
+
features: 5 # New functionality
|
|
125
129
|
documentation: 3 # Important but not urgent
|
|
126
|
-
tech-debt: 4
|
|
130
|
+
tech-debt: 4 # Keeps codebase healthy
|
|
127
131
|
```
|
|
128
132
|
|
|
129
133
|
## Cross-Reference Patterns
|
|
@@ -133,29 +137,32 @@ tech-debt: 4 # Keeps codebase healthy
|
|
|
133
137
|
```javascript
|
|
134
138
|
// Fuzzy matching for feature names
|
|
135
139
|
function featureMatch(docFeature, codeFeature) {
|
|
136
|
-
const normalize = s =>
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
+
const normalize = (s) =>
|
|
141
|
+
s
|
|
142
|
+
.toLowerCase()
|
|
143
|
+
.replace(/[-_\s]+/g, '')
|
|
144
|
+
.replace(/s$/, ''); // Remove trailing 's'
|
|
140
145
|
|
|
141
146
|
const docNorm = normalize(docFeature);
|
|
142
147
|
const codeNorm = normalize(codeFeature);
|
|
143
148
|
|
|
144
|
-
return
|
|
145
|
-
|
|
146
|
-
|
|
149
|
+
return (
|
|
150
|
+
docNorm.includes(codeNorm) ||
|
|
151
|
+
codeNorm.includes(docNorm) ||
|
|
152
|
+
levenshteinDistance(docNorm, codeNorm) < 3
|
|
153
|
+
);
|
|
147
154
|
}
|
|
148
155
|
```
|
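`featureMatch` relies on a `levenshteinDistance` helper that this hunk does not show. A minimal sketch of one, using the standard two-row dynamic-programming formulation, is given below together with a typed copy of `featureMatch` so the pair can be run as-is. The helper's implementation is an assumption; the package may define it differently.

```typescript
// Edit distance counting insertions, deletions and substitutions.
// Keeps only two rows of the DP table at a time.
function levenshteinDistance(a: string, b: string): number {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(
        Math.min(
          previous[j - 1] + cost, // substitute (or keep, when cost is 0)
          previous[j] + 1, // delete from a
          current[j - 1] + 1, // insert into a
        ),
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Typed copy of the snippet above, unchanged in behavior.
function featureMatch(docFeature: string, codeFeature: string): boolean {
  const normalize = (s: string): string =>
    s
      .toLowerCase()
      .replace(/[-_\s]+/g, "")
      .replace(/s$/, ""); // Remove trailing 's'

  const docNorm = normalize(docFeature);
  const codeNorm = normalize(codeFeature);

  return (
    docNorm.includes(codeNorm) ||
    codeNorm.includes(docNorm) ||
    levenshteinDistance(docNorm, codeNorm) < 3
  );
}
```

One caveat with this matching rule: a name that normalizes to an empty string (for example a lone "s") matches everything, because `includes("")` is always true, and very short names are within distance 3 of many others.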
|
149
156
|
|
|
150
157
|
### Common Mismatches
|
|
151
158
|
|
|
152
|
-
| Documented As
|
|
153
|
-
|
|
154
|
-
| "user authentication" | auth/, login/, session/
|
|
155
|
-
| "API endpoints"
|
|
156
|
-
| "database models"
|
|
157
|
-
| "caching layer"
|
|
158
|
-
| "logging system"
|
|
159
|
+
| Documented As | Implemented As |
|
|
160
|
+
| --------------------- | ---------------------------- |
|
|
161
|
+
| "user authentication" | auth/, login/, session/ |
|
|
162
|
+
| "API endpoints" | routes/, api/, handlers/ |
|
|
163
|
+
| "database models" | models/, entities/, schemas/ |
|
|
164
|
+
| "caching layer" | cache/, redis/, memcache/ |
|
|
165
|
+
| "logging system" | logger/, logs/, telemetry/ |
|
|
159
166
|
|
|
160
167
|
## Output Templates
|
|
161
168
|
|
|
@@ -165,6 +172,7 @@ function featureMatch(docFeature, codeFeature) {
|
|
|
165
172
|
## Drift Analysis
|
|
166
173
|
|
|
167
174
|
### {drift_type}
|
|
175
|
+
|
|
168
176
|
**Severity**: {severity}
|
|
169
177
|
**Detected In**: {source}
|
|
170
178
|
|
|
@@ -189,6 +197,7 @@ function featureMatch(docFeature, codeFeature) {
|
|
|
189
197
|
**Impact**: {impact_description}
|
|
190
198
|
|
|
191
199
|
**To Address**:
|
|
200
|
+
|
|
192
201
|
1. {action_item_1}
|
|
193
202
|
2. {action_item_2}
|
|
194
203
|
```
|
|
@@ -199,15 +208,19 @@ function featureMatch(docFeature, codeFeature) {
|
|
|
199
208
|
## Reconstruction Plan
|
|
200
209
|
|
|
201
210
|
### Immediate Actions (This Week)
|
|
211
|
+
|
|
202
212
|
{immediate_items_numbered}
|
|
203
213
|
|
|
204
214
|
### Short-Term (This Month)
|
|
215
|
+
|
|
205
216
|
{short_term_items_numbered}
|
|
206
217
|
|
|
207
218
|
### Medium-Term (This Quarter)
|
|
219
|
+
|
|
208
220
|
{medium_term_items_numbered}
|
|
209
221
|
|
|
210
222
|
### Backlog
|
|
223
|
+
|
|
211
224
|
{backlog_items_numbered}
|
|
212
225
|
```
|
|
213
226
|
|
|
@@ -253,6 +266,7 @@ function featureMatch(docFeature, codeFeature) {
|
|
|
253
266
|
The collectors.js module extracts data without LLM overhead:
|
|
254
267
|
|
|
255
268
|
### GitHub Data
|
|
269
|
+
|
|
256
270
|
- Open issues categorized by labels
|
|
257
271
|
- Open PRs with draft status
|
|
258
272
|
- Milestones with due dates
|
|
@@ -260,12 +274,14 @@ The collectors.js module extracts data without LLM overhead:
|
|
|
260
274
|
- Theme analysis from titles
|
|
261
275
|
|
|
262
276
|
### Documentation Data
|
|
263
|
-
|
|
277
|
+
|
|
278
|
+
- Parsed README, PLAN.md, AGENTS.md, CHANGELOG.md
|
|
264
279
|
- Checkbox completion counts
|
|
265
280
|
- Section analysis
|
|
266
281
|
- Feature lists
|
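As an example of extraction without LLM overhead, checkbox completion counts can be gathered with a plain regular expression. The sketch below is an assumption about how such a collector might work; it is not the `collectors.js` implementation, and `countCheckboxes` is a hypothetical name. It recognizes GitHub-style task list items (`- [ ]` and `- [x]`).

```typescript
interface CheckboxCounts {
  total: number;
  done: number;
  percent: number; // completed share, rounded to a whole percent
}

function countCheckboxes(markdown: string): CheckboxCounts {
  // Matches "- [ ] text", "* [x] text" or "+ [X] text" at any indentation.
  const pattern = /^\s*[-*+]\s+\[([ xX])\]\s+/;
  let total = 0;
  let done = 0;

  for (const line of markdown.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (!match) continue;
    total += 1;
    if (match[1] !== " ") done += 1;
  }

  const percent = total === 0 ? 0 : Math.round((done / total) * 100);
  return { total, done, percent };
}
```

This version does not skip fenced code blocks, so task lists quoted inside examples in a PLAN.md would be counted too.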
|
267
282
|
|
|
268
283
|
### Code Data
|
|
284
|
+
|
|
269
285
|
- Directory structure
|
|
270
286
|
- Framework detection
|
|
271
287
|
- Test framework presence
|
|
@@ -309,16 +325,21 @@ The plan-synthesizer receives all collected data and performs:
|
|
|
309
325
|
# Reality Check Report
|
|
310
326
|
|
|
311
327
|
## Executive Summary
|
|
328
|
+
|
|
312
329
|
Project has moderate drift: 8 stale priority issues and 20% plan completion.
|
|
313
330
|
Strong code health (tests + CI) but documentation lags implementation.
|
|
314
331
|
|
|
315
332
|
## Drift Analysis
|
|
333
|
+
|
|
316
334
|
### Priority Neglect
|
|
335
|
+
|
|
317
336
|
**Severity**: high
|
|
318
337
|
8 high-priority issues inactive for 60+ days...
|
|
319
338
|
|
|
320
339
|
## Prioritized Plan
|
|
340
|
+
|
|
321
341
|
### Immediate
|
|
342
|
+
|
|
322
343
|
1. Close #45 (already implemented)
|
|
323
344
|
2. Update README API section...
|
|
324
345
|
```
|
package/skills/drift-analysis/manifest.json
CHANGED
|
@@ -2,18 +2,8 @@
 "name": "drift-analysis",
 "version": "1.0.0",
 "description": "Analyze plan drift between documented intent and repository reality",
-"triggers": [
-
-"reality check",
-"plan drift",
-"implementation gap",
-"roadmap alignment"
-],
-"applicable_agents": [
-"planner",
-"auditor",
-"strategist"
-],
+"triggers": ["drift", "reality check", "plan drift", "implementation gap", "roadmap alignment"],
+"applicable_agents": ["planner", "auditor", "strategist"],
 "max_context_tokens": 2000,
 "entry_file": "SKILL.md"
 }
@@ -2,18 +2,8 @@
 "name": "evaluation",
 "version": "1.0.0",
 "description": "Lightweight evaluation framework for judgments and comparisons",
-"triggers": [
-
-"assess",
-"judge",
-"compare",
-"rate"
-],
-"applicable_agents": [
-"reviewer",
-"critic",
-"advisor"
-],
+"triggers": ["evaluate", "assess", "judge", "compare", "rate"],
+"applicable_agents": ["reviewer", "librarian"],
 "max_context_tokens": 1500,
 "entry_file": "SKILL.md"
 }
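The diff does not show how `triggers` and `applicable_agents` are consumed. One plausible reading is that a skill is offered to an agent when the agent is listed and a trigger phrase appears in the prompt. The sketch below illustrates that reading using the new-side values of the two manifests above. The `matchSkills` function and its matching rules are assumptions, not the package's actual loader.

```javascript
// Manifest fields copied from the new side of the two manifest hunks.
const skills = [
  {
    name: "drift-analysis",
    triggers: ["drift", "reality check", "plan drift", "implementation gap", "roadmap alignment"],
    applicable_agents: ["planner", "auditor", "strategist"],
  },
  {
    name: "evaluation",
    triggers: ["evaluate", "assess", "judge", "compare", "rate"],
    applicable_agents: ["reviewer", "librarian"],
  },
];

// Hypothetical matcher: a skill applies when the agent is listed and any
// trigger occurs in the prompt as a whole word or phrase, ignoring case.
function matchSkills(manifests, agent, prompt) {
  const text = prompt.toLowerCase();
  return manifests
    .filter((skill) => skill.applicable_agents.includes(agent))
    .filter((skill) =>
      skill.triggers.some((trigger) => {
        // Escape regex metacharacters so triggers are treated as literal text.
        const escaped = trigger.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
        return new RegExp(`\\b${escaped}\\b`).test(text);
      })
    )
    .map((skill) => skill.name);
}

console.log(matchSkills(skills, "planner", "Run a reality check on the roadmap")); // [ 'drift-analysis' ]
console.log(matchSkills(skills, "reviewer", "Please compare these two designs")); // [ 'evaluation' ]
console.log(matchSkills(skills, "reviewer", "Check for plan drift")); // []
```

Word-boundary matching is a deliberate choice in this sketch: with a plain substring test, a short trigger such as "rate" would also fire on "generate".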
@@ -11,11 +11,12 @@ Load plan, review critically, execute all tasks, report when complete.
 
 **Announce at start:** "I'm using the executing-plans skill to implement this plan."
 
-**Note:**
+**Note:** This skill works best on platforms with subagent support (such as OpenCode). If subagents are available, prefer using them for parallel task execution.
 
 ## The Process
 
 ### Step 1: Load and Review Plan
+
 1. Read plan file
 2. Review critically - identify any questions or concerns about the plan
 3. If concerns: Raise them with your human partner before starting
@@ -24,6 +25,7 @@ Load plan, review critically, execute all tasks, report when complete.
 ### Step 2: Execute Tasks
 
 For each task:
+
 1. Mark as in_progress
 2. Follow each step exactly (plan has bite-sized steps)
 3. Run verifications as specified
@@ -32,13 +34,15 @@ For each task:
 ### Step 3: Complete Development
 
 After all tasks complete and verified:
-
--
--
+
+- Verify all tests pass and the build succeeds
+- Present completion options to the user (merge, PR, etc.)
+- Execute the user's chosen completion strategy
 
 ## When to Stop and Ask for Help
 
 **STOP executing immediately when:**
+
 - Hit a blocker (missing dependency, test fails, instruction unclear)
 - Plan has critical gaps preventing starting
 - You don't understand an instruction
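The per-task loop from Step 2 (mark as in_progress, follow each step, run verifications) combined with the stop-on-blocker rule can be sketched as a small driver. This is only an illustration of the control flow the skill text describes for an agent. The task shape, the `executePlan` name, and the `pending`, `blocked`, and `completed` statuses are assumptions; only `in_progress` appears in the skill text.

```javascript
// Hypothetical task shape: { name, status, steps: [function], verify: function }.
function executePlan(tasks) {
  const log = [];
  for (const task of tasks) {
    task.status = "in_progress"; // 1. Mark as in_progress
    try {
      for (const step of task.steps) step(); // 2. Follow each step in order
      if (!task.verify()) throw new Error("verification failed"); // 3. Run verifications
    } catch (err) {
      // Stop immediately on a blocker instead of forcing through it.
      task.status = "blocked";
      log.push(`${task.name}: blocked (${err.message})`);
      return { completed: false, blockedOn: task.name, log };
    }
    task.status = "completed";
    log.push(`${task.name}: completed`);
  }
  return { completed: true, blockedOn: null, log };
}

// Made-up tasks: the second one hits a blocker, so the third never starts.
const tasks = [
  { name: "add-config", status: "pending", steps: [() => {}], verify: () => true },
  {
    name: "wire-cli",
    status: "pending",
    steps: [() => { throw new Error("missing dependency"); }],
    verify: () => true,
  },
  { name: "write-docs", status: "pending", steps: [() => {}], verify: () => true },
];

const result = executePlan(tasks);
console.log(result.blockedOn); // wire-cli
console.log(tasks.map((t) => t.status)); // [ 'completed', 'blocked', 'pending' ]
```

Returning at the first blocker, rather than skipping to the next task, mirrors the instruction to stop and ask for help.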
@@ -49,12 +53,14 @@ After all tasks complete and verified:
 ## When to Revisit Earlier Steps
 
 **Return to Review (Step 1) when:**
+
 - Partner updates the plan based on your feedback
 - Fundamental approach needs rethinking
 
 **Don't force through blockers** - stop and ask.
 
 ## Remember
+
 - Review plan critically first
 - Follow plan steps exactly
 - Don't skip verifications
@@ -64,7 +70,8 @@ After all tasks complete and verified:
 
 ## Integration
 
-**
-
--
--
+**Recommended workflow practices:**
+
+- Set up an isolated workspace (e.g., git worktree or feature branch) before starting
+- Have a written plan ready before executing (this skill executes plans, not creates them)
+- Follow a structured completion process after all tasks are done (verify, review, merge)