agestra 4.13.2 → 4.13.4
- package/.claude-plugin/marketplace.json +10 -4
- package/.claude-plugin/plugin.json +2 -2
- package/README.ja.md +102 -59
- package/README.ko.md +74 -20
- package/README.md +70 -16
- package/README.zh.md +105 -62
- package/agents/agestra-moderator.md +8 -8
- package/agents/agestra-team-lead.md +90 -35
- package/commands/design.md +1 -1
- package/commands/idea.md +2 -2
- package/commands/implement.md +5 -4
- package/commands/qa.md +3 -2
- package/commands/review.md +4 -2
- package/commands/security.md +2 -1
- package/dist/bundle.js +189 -180
- package/hooks/user-prompt-submit.js +2 -2
- package/package.json +9 -3
- package/skills/design.md +1 -1
- package/skills/idea.md +1 -1
- package/skills/provider-guide.md +8 -8
- package/skills/review.md +1 -1
|
@@ -130,19 +130,19 @@ Before executing, gather context:
|
|
|
130
130
|
|
|
131
131
|
1. Call `environment_check` to get the full capability map:
|
|
132
132
|
- Which CLI tools are installed (codex, gemini, tmux)
|
|
133
|
-
- Which
|
|
133
|
+
- Which frontier and local models are available, including Ollama model tiers when present
|
|
134
134
|
- Whether autonomous work is possible (CLI workers + git worktree)
|
|
135
135
|
- Available modes: leader-host-only (`claude_only` or `leader_only` from legacy environment output), independent, debate, team
|
|
136
136
|
2. Call `provider_list` for provider availability.
|
|
137
|
-
3. Call `trace_summary` to
|
|
138
|
-
-
|
|
139
|
-
-
|
|
140
|
-
-
|
|
137
|
+
3. Call `trace_summary` to check whether prior provider quality observations exist.
|
|
138
|
+
- Treat trace quality as optional evidence, not guaranteed knowledge.
|
|
139
|
+
- If present, use it as a tie-breaker between otherwise suitable providers.
|
|
140
|
+
- If absent, do not invent quality history; route by detected model capability, task risk, and execution policy.
|
|
141
141
|
4. Read existing design documents in `docs/plans/`.
|
|
142
142
|
5. Store environment capabilities for later mode selection:
|
|
143
143
|
- `can_autonomous_work`: CLI workers available?
|
|
144
144
|
- `available_providers`: which are online?
|
|
145
|
-
- `
|
|
145
|
+
- `model_capabilities`: detected frontier/local model capability and tier classifications
|
|
146
146
|
6. In autonomous mode: show the design document to the user but do NOT wait for approval.
|
|
147
147
|
|
|
148
148
|
### Phase 2: Task Design
|
|
@@ -156,7 +156,7 @@ Decompose the work into independent, assignable tasks:
|
|
|
156
156
|
| Option | Description |
|
|
157
157
|
|--------|-------------|
|
|
158
158
|
| **Leader-host only** | The current host uses `agestra-implementer` and specialist agents/prompts; no external coding workers. QA routing still follows the configured-provider default unless host-only QA is requested |
|
|
159
|
-
| **Multi-AI** | CLI AIs work autonomously when suitable,
|
|
159
|
+
| **Multi-AI** | Work is distributed according to detected model capability, including frontier and local models. CLI AIs work autonomously when suitable, local/tool models may handle scoped read/write work when their execution policy allows it, and host-local agents handle implementation/review/QA that should stay on the leader host |
|
|
160
160
|
|
|
161
161
|
If no external providers are available: skip selection and proceed with Leader-host only.
|
|
162
162
|
In autonomous mode: auto-select based on task complexity:
|
|
@@ -195,14 +195,14 @@ Decompose the work into independent, assignable tasks:
|
|
|
195
195
|
|
|
196
196
|
| Task Characteristics | Route To |
|
|
197
197
|
|---------------------|----------|
|
|
198
|
-
| Complex implementation, multi-step reasoning | MCP: `cli_worker_spawn` (Codex/Gemini) |
|
|
199
|
-
| Simple transforms, formatting, repeated pattern application | MCP: `ai_chat`
|
|
198
|
+
| Complex implementation, multi-step reasoning | High-capability detected model, usually MCP: `cli_worker_spawn` (Codex/Gemini) or host-local `agestra-implementer` |
|
|
199
|
+
| Simple transforms, formatting, repeated pattern application | Capability-matched local/tool model through MCP: `ai_chat` when available. With `workspace-write` / `full-auto`, it may apply scoped file edits through AgentLoop tools; with `read-only`, it returns a patch plan or candidate diff for `agestra-implementer` to apply |
|
|
200
200
|
| Core implementation, design-sensitive changes | `agestra-implementer` or a high-capability CLI worker |
|
|
201
201
|
| Product/unit/integration test writing | `agestra-implementer` or a high-capability CLI worker |
|
|
202
202
|
| Persistent E2E test writing | `agestra-e2e-writer` after user approval |
|
|
203
203
|
| Review and QA | `agestra-reviewer` or `agestra-qa` |
|
|
204
204
|
|
|
205
|
-
**
|
|
205
|
+
**Capability-First Provider Selection:**
|
|
206
206
|
|
|
207
207
|
Before assigning any task, determine its difficulty level:
|
|
208
208
|
- **low**: Simple chat, basic formatting, straightforward review
|
|
@@ -210,11 +210,11 @@ Decompose the work into independent, assignable tasks:
|
|
|
210
210
|
- **high**: Complex architecture, cross-validation, multi-component refactoring
|
|
211
211
|
|
|
212
212
|
Then filter providers by qualification:
|
|
213
|
-
1. Check
|
|
214
|
-
2. Only assign a task to a provider
|
|
215
|
-
3.
|
|
216
|
-
4. If no provider qualifies, fall back to `agestra-implementer` for the task
|
|
217
|
-
5.
|
|
213
|
+
1. Check detected model capability, provider type, execution policy, and task risk.
|
|
214
|
+
2. Only assign a task to a provider whose detected capability qualifies for its difficulty level.
|
|
215
|
+
3. If `trace_summary` has relevant quality data, use it as a tie-breaker between otherwise qualified providers.
|
|
216
|
+
4. If no provider qualifies, fall back to `agestra-implementer` for the task.
|
|
217
|
+
5. Providers with no trace data are unknown, not bad; start them on lower-risk, tightly scoped assignments until evidence accumulates.
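The selection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not agestra's implementation: the `Provider` record, the `LEVEL_RANK` ordering, the 0-to-1 `trace_quality` scale, and the neutral prior for providers without trace data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ordering. Only "low" and "high" are visible in the text shown
# here; any intermediate level would be slotted in between.
LEVEL_RANK = {"low": 0, "high": 1}

FALLBACK = "agestra-implementer"

# Assumption: trace quality is a score in [0, 1]. A provider with no trace
# data is treated as neutral (unknown, not bad).
NEUTRAL_QUALITY = 0.5


@dataclass
class Provider:
    name: str
    capability: str                        # highest difficulty level the detected model qualifies for
    trace_quality: Optional[float] = None  # None means no trace data exists


def select_provider(providers: List[Provider], difficulty: str) -> str:
    """Pick a provider name for a task of the given difficulty level."""
    needed = LEVEL_RANK[difficulty]
    # Step 2: keep only providers whose detected capability qualifies.
    qualified = [p for p in providers if LEVEL_RANK[p.capability] >= needed]
    if not qualified:
        # Step 4: nobody qualifies, so the host-local implementer takes the task.
        return FALLBACK

    # Step 3: trace quality only breaks ties among already-qualified providers.
    def quality(p: Provider) -> float:
        return NEUTRAL_QUALITY if p.trace_quality is None else p.trace_quality

    return max(qualified, key=quality).name
```

The sketch does not model task risk or execution policy from step 1, and it does not restrict unknown providers to lower-risk work as step 5 asks; a fuller version would add both checks before the tie-breaker.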
|
|
218
218
|
|
|
219
219
|
4. Define dependency relationships between tasks.
|
|
220
220
|
|
|
@@ -247,9 +247,12 @@ Execute approved tasks across available execution paths:
|
|
|
247
247
|
- Otherwise → re-route to a different provider or complete the task through `agestra-implementer`.
|
|
248
248
|
5. On worker TIMEOUT: the worker transitions to FAILED; follow the failure handling above.
|
|
249
249
|
|
|
250
|
-
**
|
|
251
|
-
- Call `ai_chat` with
|
|
252
|
-
-
|
|
250
|
+
**Local/tool-model tasks (MCP `ai_chat`, currently Ollama-backed):**
|
|
251
|
+
- Distribute work according to detected model capability, including frontier and local models. Call `ai_chat` with a capability-matched local/tool model for scoped tasks when available.
|
|
252
|
+
- Respect the provider's `executionPolicy` from `provider_list` / setup config:
|
|
253
|
+
- `read-only` → the model receives read/search tools only and must return analysis, a patch plan, or a candidate diff.
|
|
254
|
+
- `workspace-write` / `full-auto` → the model may receive read/write AgentLoop tools and can make scoped workspace edits.
|
|
255
|
+
- After any write-enabled local/tool-model task, inspect the git diff before proceeding. If the diff exceeds the assigned files, touches risky surfaces, or looks incomplete, route correction to `agestra-implementer` or a higher-capability CLI worker.
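The policy rules and the post-task diff check above could be expressed along these lines. The tool labels (`read`, `search`, `write`) and both function names are hypothetical; only the policy strings come from the text.

```python
from typing import List, Set

# Hypothetical tool labels standing in for the AgentLoop tool set.
READ_TOOLS = {"read", "search"}
WRITE_TOOLS = {"write"}


def tools_for_policy(policy: str) -> Set[str]:
    """Map a provider execution policy to the tools the model may receive."""
    if policy == "read-only":
        # Read/search only: the model returns analysis, a patch plan, or a candidate diff.
        return set(READ_TOOLS)
    if policy in ("workspace-write", "full-auto"):
        # Write-enabled: scoped workspace edits are allowed.
        return READ_TOOLS | WRITE_TOOLS
    raise ValueError(f"unknown execution policy: {policy!r}")


def files_outside_assignment(changed: List[str], assigned: List[str]) -> List[str]:
    """Return changed files that were not part of the assigned scope."""
    allowed = set(assigned)
    return sorted(path for path in changed if path not in allowed)
```

The `changed` list could come from something like `git diff --name-only`. A non-empty result from `files_outside_assignment` is one signal for routing correction to `agestra-implementer` or a higher-capability CLI worker; risky surfaces and incomplete diffs still need a human-style read of the diff itself.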
|
|
253
256
|
|
|
254
257
|
**Result Integration:**
|
|
255
258
|
- Leader-host implementation: changes are already applied on the main branch (no merge needed).
|
|
@@ -351,12 +354,59 @@ Build the QA Brigade handoff before starting the moderator debate:
|
|
|
351
354
|
| `agestra-e2e-writer` | Not a brigade reviewer. Use only after an approved `E2E_TEST_WORK_REQUEST` for persistent E2E test work |
|
|
352
355
|
|
|
353
356
|
Default participant policy:
|
|
354
|
-
- Include every configured and available review-capable provider by default, not only the "best" one. Use `trace_summary
|
|
355
|
-
-
|
|
357
|
+
- Include every configured and available review-capable provider by default, not only the "best" one. Use `trace_summary`, when populated, to assign lenses and order attention, not to shrink the brigade unless a provider is unavailable, explicitly excluded, or clearly unqualified for the requested lens.
|
|
358
|
+
- Include configured providers when their detected model capability qualifies for the assigned QA lens; use trace quality only as optional supporting evidence, and use read-only debate tools for QA/review so they do not modify source files during verification.
|
|
356
359
|
- Keep the host QA participant in the flow even when external providers are present, because executable evidence, E2E/runtime observation, and local command output are host-owned. In structured debate, this is the `claude-qa` compatibility participant when auto-injected or explicitly listed.
|
|
357
360
|
- Assign distinct lenses so the output is not three copies of the same review. Suggested lenses: spec-to-code compliance, progress-table truthfulness, integration/regression risk, edge/error states, test adequacy, safety hygiene, and E2E artifact interpretation.
|
|
358
361
|
- Each brigade member must issue an independent PASS / CONDITIONAL PASS / FAIL recommendation with evidence and confidence in its individual source material. Disagreement is useful; preserve minority reports in the final synthesis.
|
|
359
362
|
|
|
363
|
+
Scale and reliability controls:
|
|
364
|
+
- For whole-project, large-directory, or deep review/QA requests, create a bounded evidence packet before provider fan-out. Include changed files, key entry points, build/test evidence, relevant configs/docs, and explicit out-of-scope areas; do not expect every external CLI provider to discover a large repository from scratch inside one turn.
|
|
365
|
+
- Use a `participant_timeout_ms` of at least `600000` (10 minutes) for broad or deep structured debates, and raise it further only when the user accepts the wait. If participants still time out, split the task by subsystem and run multiple narrower debates.
|
|
366
|
+
- If Gemini reports a workspace trust issue, treat it as provider unavailable for that run, tell the user the project must be trusted in Gemini CLI, and continue with the remaining participants or retry after trust is fixed. Do not count trust failures as review disagreement.
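As a small sketch of the timeout rule above: the helper name and its parameters are hypothetical, and the only value taken from the text is the `600000` ms floor.

```python
from typing import Optional

MIN_BROAD_TIMEOUT_MS = 600_000  # 10 minutes, the floor named in the text


def choose_participant_timeout_ms(
    broad_or_deep: bool, user_accepted_ms: Optional[int] = None
) -> Optional[int]:
    """Return a participant_timeout_ms value, or None to omit the argument."""
    if not broad_or_deep:
        # Normal scoped reviews leave the argument out.
        return None
    if user_accepted_ms is None:
        return MIN_BROAD_TIMEOUT_MS
    # Never go below the floor; go above it only to the wait the user accepted.
    return max(MIN_BROAD_TIMEOUT_MS, user_accepted_ms)
```

If participants still time out at the chosen value, the text's guidance is to split the task by subsystem rather than keep raising the number.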
|
|
367
|
+
|
|
368
|
+
#### 5M.0b Host specialist pre-injection (REQUIRED on Claude-Code host)
|
|
369
|
+
|
|
370
|
+
> Why this exists: the structured-debate engine cannot ask the Claude-Code host to call its own native subagents back through MCP — that would invert the dependency direction. Instead, when the leader host wants Claude specialist input (`claude-reviewer` / `claude-qa` / `claude-security`) inside a multi-AI debate, the leader runs the native subagent **before** starting the debate and supplies the result as a `source_documents` entry. The moderator engine then loads that document as the specialist's individual review and excludes the specialist from subsequent consensus rounds. External providers still fan out and debate normally. This is the Phase B routing contract; do not bypass it.
|
|
371
|
+
>
|
|
372
|
+
> When this applies: ANY structured debate (QA Brigade, multi-AI review, multi-AI security) on Claude-Code host that wants Claude specialist participation. If you only have external providers and no Claude specialist lens, skip this subsection entirely.
|
|
373
|
+
|
|
374
|
+
Procedure (run before `agent_debate_structured`):
|
|
375
|
+
|
|
376
|
+
1. Decide which Claude specialists belong in the brigade — typically `agestra-qa` for QA Brigade, `agestra-reviewer` for multi-AI review, `agestra-security` for multi-AI security. You may include more than one (e.g., QA Brigade may also pull in `agestra-reviewer` as a supporting lens).
|
|
377
|
+
2. For each chosen specialist, invoke it via the `Agent` tool with the same scope/files/design references the external providers will see. Wait for the result.
|
|
378
|
+
3. Persist each specialist result as a workspace document via `workspace_create_document` (kind: `individual`). Capture the returned `document_id`.
|
|
379
|
+
4. Build the `agent_debate_structured` arguments so they self-describe the pre-injection:
|
|
380
|
+
- `participants`: include the matching Claude specialist IDs (`claude-reviewer`, `claude-qa`, `claude-security`) alongside the external providers. The schema requires the `provider` field of every `source_documents` entry to be present in `participants`.
|
|
381
|
+
- `source_documents`: one entry per specialist — `{ "document_id": "<id from workspace_create_document>", "provider": "claude-reviewer" | "claude-qa" | "claude-security" }`.
|
|
382
|
+
- `auto_inject_specialists`: `false`. You already added the specialists manually; auto-inject would duplicate.
|
|
383
|
+
5. Start the debate. The moderator will load each pre-injected document as the specialist's individual review, parse its `<proposals>` block into ledger items, and skip that specialist in every consensus round. External providers fan out and debate as usual; they may vote on specialist-proposed items.
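Step 4 of the procedure above can be illustrated with a small argument builder. This is a sketch, not the tool's schema: `build_debate_args` and its parameters are hypothetical, and only the field names `participants`, `source_documents`, and `auto_inject_specialists` and the three specialist IDs come from the text.

```python
from typing import Dict, List

SPECIALIST_IDS = {"claude-reviewer", "claude-qa", "claude-security"}


def build_debate_args(
    topic: str,
    mode: str,
    external_providers: List[str],
    specialist_docs: Dict[str, str],
) -> dict:
    """Build agent_debate_structured arguments for a pre-injected debate.

    specialist_docs maps a specialist ID to the document_id returned when
    its result was persisted.
    """
    unknown = set(specialist_docs) - SPECIALIST_IDS
    if unknown:
        raise ValueError(f"not a Claude specialist ID: {sorted(unknown)}")

    # Every source_documents provider must also appear in participants.
    participants = list(external_providers)
    participants += [sid for sid in specialist_docs if sid not in participants]

    source_documents = [
        {"document_id": doc_id, "provider": sid}
        for sid, doc_id in specialist_docs.items()
    ]
    return {
        "topic": topic,
        "mode": mode,
        "participants": participants,
        "source_documents": source_documents,
        # Specialists were added manually, so auto-injection would duplicate them.
        "auto_inject_specialists": not specialist_docs,
    }
```

For example, `build_debate_args("qa-auth", "review", ["codex", "gemini"], {"claude-qa": "doc-1"})` yields three participants, one source document, and `auto_inject_specialists` set to `False`.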
|
|
384
|
+
|
|
385
|
+
Round-loop note (Phase C — host-turn handoff): pre-injected specialists still skip consensus rounds. Specialists that were NOT pre-injected (e.g. you let `auto_inject_specialists: true` add a Claude reviewer mid-debate, or you intentionally listed `claude-reviewer` in `participants` *without* a matching `source_documents` entry) now participate every round through the host-turn handoff protocol described in 5M.0c. Use 5M.0b when you only need a single up-front specialist read; use 5M.0c on top of it when you also want round-by-round specialist stances.
|
|
386
|
+
|
|
387
|
+
Multi-host note: on hosts that do not have a Claude specialist surface (Codex, Gemini, Ollama-only configurations), do not synthesize fake specialist documents. Either run the debate without Claude specialists or ask the user to add a host that exposes them.
|
|
388
|
+
|
|
389
|
+
#### 5M.0c Round-time specialist host-turn handoff (Phase C)
|
|
390
|
+
|
|
391
|
+
> Why this exists: when a Claude specialist participates in consensus rounds (rather than only contributing an upfront review), the moderator engine cannot call back into the host's native subagent. The engine instead parks the round, advertises the pending request through the persisted snapshot, and waits for the leader to dispatch the subagent and post the result. This keeps the dependency direction MCP-compliant on every host.
|
|
392
|
+
|
|
393
|
+
When this engages: any structured debate where a Claude specialist (`claude-reviewer` / `claude-qa` / `claude-security`) is in `participants`, the leader host is Claude-Code, and the participant is NOT covered by a `source_documents` entry. The engine activates the handoff per round — independently of 5M.0b pre-injection.
|
|
394
|
+
|
|
395
|
+
Polling loop (run alongside the existing 5M.2 polling):
|
|
396
|
+
|
|
397
|
+
1. Call `agent_debate_status` until the snapshot reports `phase: awaiting-host-turn`. The structured response then carries a `pending_host_turns[]` array with one entry per specialist that owes a turn this round. Each entry exposes `participant_id`, `agent_name` (e.g. `agestra-reviewer`), `round`, `prompt`, optional `system_prompt`, optional `files`, and `requested_at`.
|
|
398
|
+
2. For every entry, invoke the matching native subagent via the `Agent` tool with `subagent_type: <agent_name>`. Pass the entry's `prompt` verbatim — do not paraphrase, do not strip the JSON contract. Forward `files` if listed. The subagent must return its `<consensus_turn>` JSON exactly the way an external provider would.
|
|
399
|
+
3. Post the verbatim subagent text back via `agent_debate_submit_turn` with `{ session_id, participant_id, content: <subagent text>, round }`. The tool acknowledges with the count of remaining pending turns.
|
|
400
|
+
4. After every pending entry is submitted, the moderator round resumes automatically. The next `agent_debate_status` poll will report `phase: consensus-round` (or the next gate), and the round transcript will list the specialist's stance just like an external provider's.
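The polling steps above might look like this when reduced to one status snapshot. The three callables are hypothetical stand-ins for `agent_debate_status`, the `Agent` tool, and `agent_debate_submit_turn`; the snapshot and entry field names are the ones listed in step 1.

```python
from typing import Callable, List, Optional


def run_host_turns(
    session_id: str,
    get_status: Callable[[str], dict],
    run_subagent: Callable[[str, str, Optional[List[str]]], Optional[str]],
    submit_turn: Callable[[dict], object],
) -> int:
    """Process the pending host turns of one snapshot; return how many were submitted."""
    snapshot = get_status(session_id)
    if snapshot.get("phase") != "awaiting-host-turn":
        return 0  # nothing is owed this poll

    submitted = 0
    for entry in snapshot.get("pending_host_turns", []):
        # Pass the prompt verbatim and forward files only if the entry lists them.
        text = run_subagent(entry["agent_name"], entry["prompt"], entry.get("files"))
        if text is None:
            # Subagent failed or refused: do not fabricate a turn.
            continue
        submit_turn({
            "session_id": session_id,
            "participant_id": entry["participant_id"],
            "content": text,  # verbatim subagent text
            "round": entry["round"],
        })
        submitted += 1
    return submitted
```

A caller would invoke this inside its existing polling loop and keep polling until the snapshot leaves `awaiting-host-turn`.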
|
|
401
|
+
|
|
402
|
+
Failure handling:
|
|
403
|
+
- Submit each specialist's response separately. If the subagent fails or refuses, do NOT submit a fabricated turn; let the gate time out so the moderator records the missing vote with `failureReason: "host-turn timeout"`. Re-running the specialist and submitting the next round is allowed.
|
|
404
|
+
- Round-mismatched submissions (`round` not equal to the pending entry's round) are rejected; align on the latest snapshot before retrying.
|
|
405
|
+
- Duplicate submissions for the same participant in the same gate are rejected. If you need to revise a stance, wait for the next round.
|
|
406
|
+
- The handoff timeout defaults to `STRUCTURED_DEBATE_HOST_TURN_TIMEOUT_MS` (20 minutes). Tests override it through engine deps; production debates inherit the default.
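The rejection rules above can be modelled with a toy gate. `HostTurnGate` is a hypothetical class written only to make the rules concrete; it is not the engine's code and it leaves out the timeout.

```python
from typing import Set


class HostTurnGate:
    """Toy model of one round's host-turn gate."""

    def __init__(self, round_number: int, pending: Set[str]) -> None:
        self.round = round_number
        self.pending = set(pending)
        self.submitted: Set[str] = set()

    def submit(self, participant_id: str, round_number: int) -> int:
        """Accept one turn and return the count of remaining pending turns."""
        if round_number != self.round:
            raise ValueError("round mismatch: align on the latest snapshot")
        if participant_id in self.submitted:
            raise ValueError("duplicate submission: wait for the next round")
        if participant_id not in self.pending:
            raise ValueError("no pending turn for this participant")
        self.pending.remove(participant_id)
        self.submitted.add(participant_id)
        return len(self.pending)
```

When the returned count reaches zero, the text says the moderator round resumes on its own.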
|
|
407
|
+
|
|
408
|
+
Multi-host note: this handoff path is only meaningful when the leader host can natively invoke the specialist subagents (Claude-Code today). Other hosts continue to rely on 5M.0b pre-injection or omit the specialists entirely.
|
|
409
|
+
|
|
360
410
|
#### 5M.0a QA mapping onto the existing JSON ledger
|
|
361
411
|
|
|
362
412
|
Do not invent a separate QA adjudication schema. Use the moderator's existing structured-debate contract.
|
|
@@ -379,7 +429,7 @@ Ledger interpretation:
|
|
|
379
429
|
- The moderator handles duplicate/merge/superseded state in the ledger. Participants may point out duplication in comments or propose a `revise`, but they do not manually merge markdown.
|
|
380
430
|
- The leader does not decide item inclusion by hand. The leader inspects the JSON ledger and chooses approve / continue / reject at the approval gate.
|
|
381
431
|
|
|
382
|
-
Run the structured-debate MCP flow. This is a **background lifecycle**: `agent_debate_structured` creates a durable session record immediately and returns `status: running`; the leader polls `agent_debate_status` until the moderator parks the session in `ready-for-approval`, `escalated`, or `error`. The moderator does NOT write the synthesis file on its own —
|
|
432
|
+
Run the structured-debate MCP flow. This is a **background lifecycle**: `agent_debate_structured` creates a durable session record immediately and returns `status: running`; the leader polls `agent_debate_status` until the moderator parks the session in `ready-for-approval`, `escalated`, or `error`. The moderator does NOT write the synthesis file on its own — leader finalization must be explicit.
|
|
383
433
|
|
|
384
434
|
#### 5M.1 Start the debate
|
|
385
435
|
|
|
@@ -388,12 +438,13 @@ Call `agent_debate_structured` with:
|
|
|
388
438
|
- `topic` — short slug (used in file names under `.agestra/workspace/`), prefixed or framed as QA Brigade when useful.
|
|
389
439
|
- `mode` — `"review"` for QA/review/security consensus, `"idea"` for exploratory design or option discovery.
|
|
390
440
|
- `scope` — concrete framing: file list, task description, design doc path, changed files, and host QA report/evidence path.
|
|
391
|
-
- `participants` — the provider/agent IDs the user specified, or all configured and available review-capable providers from `provider_list`, plus the host QA participant (`claude-qa` compatibility ID) through auto-injection or explicit listing. For QA, use
|
|
392
|
-
- `source_documents` —
|
|
393
|
-
- `auto_inject_specialists` — default `true
|
|
441
|
+
- `participants` — the provider/agent IDs the user specified, or all configured and available review-capable providers from `provider_list`, plus the host QA participant (`claude-qa` compatibility ID) through auto-injection or explicit listing. For QA, use detected model capabilities for lens assignment; use `trace_summary` only when it has relevant observations.
|
|
442
|
+
- `source_documents` — pre-created individual documents, each as `{ "document_id": "...", "provider": "..." }`. **Required** when the brigade includes Claude specialists on Claude-Code host (see 5M.0b — host runs the native subagent first, persists the result, and supplies it here). The `provider` value must be present in `participants`. For QA, also pass the host QA report/evidence packet as source material for the matching host QA participant. Pre-injected providers skip the individual fan-out AND every consensus round.
|
|
443
|
+
- `auto_inject_specialists` — default `true` only when the leader has not pre-injected specialists. **Pass `false` whenever you supply specialist `source_documents`**, otherwise the moderator may add a duplicate specialist participant on top of the one you already injected. When the user wants verbatim participants only, also pass `false`.
|
|
394
444
|
- `exclude_participants` — participant IDs to never include, applied regardless of `auto_inject_specialists`. Use this when the user explicitly wants a provider (including Ollama — there is no automatic Ollama filter anymore) kept out.
|
|
395
445
|
- `leader` — omit unless you need to override the session-context leader.
|
|
396
446
|
- `max_rounds` — default `10`. Raise for contested topics, lower for quick smoke-debates.
|
|
447
|
+
- `participant_timeout_ms` — omit for normal scoped reviews; set it to at least `600000` (10 minutes) for whole-project, large-directory, or deep reviews, or after provider timeouts.
|
|
397
448
|
- `individual_review_prompt` / `files` — optional framing for the individual-review fan-out.
|
|
398
449
|
- `locale` — pass the locale resolved from `agestra.config.json` (fall back to providers.config locale). The moderator uses it for human-facing text; provider prompts remain English regardless.
|
|
399
450
|
|
|
@@ -417,7 +468,7 @@ Before deciding, read the on-disk outputs — the debate writes three folders un
|
|
|
417
468
|
|
|
418
469
|
- `.agestra/workspace/individual/` — per-participant individual reviews (`individual_{participant}_{topic}_{date}_{seq}.md`). Includes auto-injected host specialists like `claude-reviewer` / `claude-qa` / `claude-security` when present.
|
|
419
470
|
- `.agestra/workspace/debates/` — debate transcript (`debate_{topic}_{date}_{seq}.md`), consensus ledger (`{sessionId}.consensus.json`), and structured session record (`{sessionId}.session.json`). The session record remains after `approve` / `reject` for idempotent replays and audit.
|
|
420
|
-
- `.agestra/workspace/synthesis/` — the final synthesis document, written
|
|
471
|
+
- `.agestra/workspace/synthesis/` — the final synthesis document, written after `agent_debate_approve` or `agent_debate_reject` succeeds.
|
|
421
472
|
|
|
422
473
|
Use `Read` / `Grep` against these paths plus the in-result snapshot to judge whether the debate outcome matches the design.
|
|
423
474
|
|
|
@@ -436,7 +487,7 @@ Pick exactly one of the three follow-up tools, based on inspection:
|
|
|
436
487
|
|
|
437
488
|
1. **Accept the outcome** → call `agent_debate_approve` with `session_id` and an optional `leader_note` (appended to the synthesis footer under "Leader approval notes"). The moderator writes the synthesis markdown, updates the session record to `approved`, and returns `synthesisDocPath`. If this is QA-only, proceed to Phase 7. If this is an implementation flow and the QA verdict is PASS or CONDITIONAL PASS, proceed to Phase 6 unless the debate explicitly included the post-implementation review lens. If this is an implementation flow and the QA verdict is FAIL, return to Phase 3 with targeted fixes or escalate to the user instead of claiming completion.
|
|
438
489
|
2. **Need more deliberation** → call `agent_debate_continue` with `session_id` and `additional_rounds` (`3`, `5`, or `10` only). The handler returns `status: running`; poll `agent_debate_status` again until it reaches the approval gate. Use this when the debate was close but unresolved, or when `escalated` came too early.
|
|
439
|
-
3. **Reject the outcome** → call `agent_debate_reject` with `session_id` and a `reason` (captured in the transcript footer). Optionally set `spawn_issue: true` to write a lightweight issue branch document into `individual/` listing non-accepted proposals for later handling.
|
|
490
|
+
3. **Reject the outcome** → call `agent_debate_reject` with `session_id` and a `reason` (captured in the transcript footer and rejected synthesis). Optionally set `spawn_issue: true` to write a lightweight issue branch document into `individual/` listing non-accepted proposals for later handling. The moderator writes a rejected synthesis that summarizes accepted, excluded, and unresolved items, then closes the debate.
|
|
440
491
|
|
|
441
492
|
All three tools are idempotent on terminal states — re-calling returns the cached outcome.
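The three follow-up choices above can be summarized as a small dispatcher. `finalization_call` is a hypothetical helper that only builds the tool name and arguments; treating `reason` as mandatory for a reject is an assumption drawn from the wording, not a confirmed schema rule.

```python
from typing import Optional, Tuple

ALLOWED_ADDITIONAL_ROUNDS = (3, 5, 10)


def finalization_call(
    decision: str,
    session_id: str,
    *,
    leader_note: Optional[str] = None,
    additional_rounds: Optional[int] = None,
    reason: Optional[str] = None,
    spawn_issue: bool = False,
) -> Tuple[str, dict]:
    """Return (tool_name, arguments) for exactly one follow-up tool."""
    if decision == "approve":
        args = {"session_id": session_id}
        if leader_note is not None:
            args["leader_note"] = leader_note  # optional footer note
        return "agent_debate_approve", args
    if decision == "continue":
        if additional_rounds not in ALLOWED_ADDITIONAL_ROUNDS:
            raise ValueError("additional_rounds must be 3, 5, or 10")
        return "agent_debate_continue", {
            "session_id": session_id,
            "additional_rounds": additional_rounds,
        }
    if decision == "reject":
        if not reason:
            raise ValueError("reject needs a reason")  # assumption: reason is required
        return "agent_debate_reject", {
            "session_id": session_id,
            "reason": reason,
            "spawn_issue": spawn_issue,
        }
    raise ValueError(f"unknown decision: {decision!r}")
```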
|
|
442
493
|
|
|
@@ -469,7 +520,7 @@ Provide a clear summary to the user:
|
|
|
469
520
|
- What changed (files modified, features added)
|
|
470
521
|
- Verification summary:
|
|
471
522
|
- Host-only QA/review: QA depth, E2E status, QA report path, QA cycle count + what was auto-fixed, review report path, review verdict
|
|
472
|
-
- QA Brigade / configured-provider QA: host QA report path, E2E host-only status, participant list, assigned lenses, accepted ledger items, excluded ledger items, open/opinion items, consensus verdict, dissenting findings, structured debate outcome (`approved` / `rejected`, with round count), `auto_inject_specialists` state, final synthesis path
|
|
523
|
+
- QA Brigade / configured-provider QA: host QA report path, E2E host-only status, participant list, assigned lenses, accepted ledger items, excluded ledger items, open/opinion items, consensus verdict, dissenting findings, structured debate outcome (`approved` / `rejected`, with round count), `auto_inject_specialists` state, final synthesis path from `.agestra/workspace/synthesis/`, and links to the individual reviews under `.agestra/workspace/individual/` and the transcript under `.agestra/workspace/debates/`
|
|
473
524
|
- Any issues found and how they were resolved
|
|
474
525
|
|
|
475
526
|
</Workflow>
|
|
@@ -479,7 +530,7 @@ When transitioning between workflow phases, create a handoff document summarizin
|
|
|
479
530
|
|
|
480
531
|
Phase 2→3 Handoff:
|
|
481
532
|
- Work mode selected (Leader-host only / Multi-AI)
|
|
482
|
-
- Total tasks, host-implementer task count, CLI
|
|
533
|
+
- Total tasks, host-implementer task count, CLI worker count, local/tool-model AgentLoop task count with read-only vs write-enabled policy
|
|
483
534
|
- Task dependency graph
|
|
484
535
|
- Risk flags (shared files, complex tasks)
|
|
485
536
|
- Context for workers (design doc path, Implementation Progress rows, naming conventions, key decisions)
|
|
@@ -505,8 +556,8 @@ Bad: "Add a validation function to the user module"
|
|
|
505
556
|
Good: "In `packages/core/src/user.ts`, add a `validateEmail(email: string): boolean` function that follows the same pattern as `validateUsername` on line 42. Must handle empty strings, return false for invalid format. Export from `packages/core/src/index.ts`. Do NOT modify existing functions."
|
|
506
557
|
</Prompt_Crafting>
|
|
507
558
|
|
|
508
|
-
<
|
|
509
|
-
|
|
559
|
+
<Model_Capability_Routing>
|
|
560
|
+
Distribute work according to the capabilities of detected models, including frontier and local models. For local Ollama models, check model size via `ollama_models` first:
|
|
510
561
|
|
|
511
562
|
| Model Size | Suitable Tasks |
|
|
512
563
|
|---|---|
|
|
@@ -515,8 +566,12 @@ When routing tasks to Ollama, check model size via `ollama_models` first:
|
|
|
515
566
|
| 8-20 GB (~7-14B params) | Code generation, detailed analysis, multi-step reasoning |
|
|
516
567
|
| > 20 GB (~14B+ params) | Complex refactoring, architecture analysis |
|
|
517
568
|
|
|
518
|
-
Do NOT assign tasks beyond a model's capability. When in doubt, use a
|
|
519
|
-
|
|
569
|
+
Do NOT assign tasks beyond a detected model's capability. When in doubt, use a higher-capability frontier provider or host-local specialist instead.
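For the two table rows visible in this hunk, the size check could be sketched as below. The function name is hypothetical, and the smaller tiers are deliberately not reconstructed because their rows are not shown here.

```python
from typing import Optional


def suitable_tasks_for_size(size_gb: float) -> Optional[str]:
    """Map an Ollama model size to the task class from the visible table rows."""
    if size_gb > 20:
        return "Complex refactoring, architecture analysis"
    if size_gb >= 8:
        return "Code generation, detailed analysis, multi-step reasoning"
    # Smaller tiers exist in the full table but are elided in this diff hunk.
    return None
```

A `None` result means the caller has to consult the full table, or fall back to a higher-capability frontier provider or host-local specialist as the text advises.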
|
|
570
|
+
|
|
571
|
+
For implementation tasks, also check the provider execution policy:
|
|
572
|
+
- `read-only`: assign only reading, searching, analysis, review, patch-plan, or candidate-diff work.
|
|
573
|
+
- `workspace-write` / `full-auto`: the model has the same workspace write permission class as other write-enabled providers. Keep tasks scoped to explicit files and inspect the resulting diff before continuing.
|
|
574
|
+
</Model_Capability_Routing>
|
|
520
575
|
|
|
521
576
|
<Principles>
|
|
522
577
|
|
|
@@ -556,10 +611,10 @@ The design document is the authority. If an AI's output conflicts with the desig
|
|
|
556
611
|
## MCP (External AI & Infrastructure)
|
|
557
612
|
- `environment_check` — detect CLI tools, Ollama models, infrastructure
|
|
558
613
|
- `provider_list` / `provider_health` — check external AI availability
|
|
559
|
-
- `trace_query` / `trace_summary` / `trace_visualize` — provider quality
|
|
614
|
+
- `trace_query` / `trace_summary` / `trace_visualize` — optional provider quality observations from prior recorded runs
|
|
560
615
|
- `ai_chat` / `ai_analyze_files` / `ai_compare` — query external AI
|
|
561
616
|
- `agent_debate_structured` — start a structured multi-AI debate in the background (individual/source material → clarification → JSON consensus rounds → aggregation → approval gate). It returns `status: running`; poll `agent_debate_status`. Supports `mode: "review" | "idea"`, optional `source_documents`, `auto_inject_specialists` (default `true`) to auto-add host reviewer/QA/security specialists (compatibility IDs: `claude-reviewer` / `claude-qa` / `claude-security`) based on topic, and `exclude_participants` as the escape hatch (also the way to keep Ollama or any other provider out — there is no automatic Ollama filter).
|
|
562
|
-
- `agent_debate_approve` / `agent_debate_continue` / `agent_debate_reject` — leader-only finalization tools for a structured session at the approval gate. `approve` writes
|
|
617
|
+
- `agent_debate_approve` / `agent_debate_continue` / `agent_debate_reject` — leader-only finalization tools for a structured session at the approval gate. `approve` writes an approved synthesis under `.agestra/workspace/synthesis/`; `continue(additional_rounds=N)` accepts only `3`, `5`, or `10` and returns `running`; `reject(reason=..., spawn_issue?=true)` writes a rejected synthesis and can also write a follow-up issue document.
|
|
563
618
|
- Low-level debate primitives — legacy / diagnostic use only; prefer the structured debate tools for review, idea, and design workflows.
|
|
564
619
|
- `agent_cross_validate` — cross-validate outputs between providers
|
|
565
620
|
- `cli_worker_spawn` / `cli_worker_status` / `cli_worker_collect` / `cli_worker_stop` — manage Codex/Gemini CLI workers
|
|
@@ -598,7 +653,7 @@ Spawning a Codex CLI worker to refactor the auth module in an isolated worktree.
|
|
|
598
653
|
<Constraints>
|
|
599
654
|
- Do NOT write, edit, or create files. Delegate all implementation.
|
|
600
655
|
- Do NOT skip the user approval step before executing tasks (in supervised mode).
|
|
601
|
-
- Do NOT assign complex tasks to small
|
|
656
|
+
- Do NOT assign complex tasks to models whose detected capability does not qualify, including small local models.
|
|
602
657
|
- Do NOT accept "simplified" or "partial" results from AIs.
|
|
603
658
|
- Do NOT proceed to QA until you've inspected all results yourself.
|
|
604
659
|
- Use MCP tools for external AI orchestration and change review.
|
package/commands/design.md
CHANGED
|
@@ -97,7 +97,7 @@ Team-lead owns the rest:
|
|
|
97
97
|
- Spawn `agestra:agestra-moderator` or `agestra:agestra-designer` directly when external providers are involved
|
|
98
98
|
- Build individual documents or hand-edit generated debate/synthesis Markdown
|
|
99
99
|
|
|
100
|
-
Direct execution from this command bypasses team-lead's
|
|
100
|
+
Direct execution from this command bypasses team-lead's capability-based routing and optional trace-assisted signals (`trace_summary`), task design, and consistency enforcement. Always go through team-lead in Branch B.
|
|
101
101
|
|
|
102
102
|
## Step 3: Present the result
|
|
103
103
|
|
package/commands/idea.md
CHANGED
|
@@ -87,12 +87,12 @@ Team-lead owns the rest:
|
|
|
87
87
|
|
|
88
88
|
Writing the final project-facing idea decision record under `docs/ideas/` is allowed and expected after the user chooses or approves ideas. `.agestra/workspace/` is the internal research/debate workspace, not the user's primary browsing surface.
|
|
89
89
|
|
|
90
|
-
Direct execution bypasses team-lead's
|
|
90
|
+
Direct execution bypasses team-lead's capability-based routing, optional trace-assisted signals, and consistency enforcement. Always go through team-lead in Branch B.
|
|
91
91
|
|
|
92
92
|
## Step 3: Present to the user
|
|
93
93
|
|
|
94
94
|
When team-lead (or the host specialist in Branch A) returns:
|
|
95
|
-
- Name the debate document, consensus JSON ledger, and final synthesis document
|
|
95
|
+
- Name the debate document, consensus JSON ledger, and final synthesis document when the structured session is finalized
|
|
96
96
|
- Name the idea decision document under `docs/ideas/` after the user chooses or approves ideas
|
|
97
97
|
- Show ideas grouped as Make Soon, Explore Next, and Inspiration Bank when available
|
|
98
98
|
- Explain accepted, excluded, and still-open ideas in plain language
|
package/commands/implement.md
CHANGED
|
@@ -51,10 +51,11 @@ Use AskUserQuestion to present the recommended routing in the user's language, o
|
|
|
51
51
|
If team mode is not available, skip the question and use Leader-host only.
|
|
52
52
|
|
|
53
53
|
Routing guidance:
|
|
54
|
-
-
|
|
55
|
-
-
|
|
54
|
+
- Distribute work according to detected model capability, including frontier and local models. Use model tier, task risk, and execution policy first; use trace quality data only when it exists.
|
|
55
|
+
- Simple and repetitive, low-risk work → prefer a capability-matched local/tool model when available. If its `executionPolicy` is `workspace-write` or `full-auto`, it may use AgentLoop read/write tools through `ai_chat`; if it is `read-only`, use it for analysis, patch plans, or candidate diffs only.
|
|
56
|
+
- Complex or multi-file implementation → prefer high-capability frontier/CLI workers such as Codex/Gemini in isolated worktrees, or the host implementer when that is safer.
|
|
56
57
|
- Small but risky work → prefer `agestra-implementer` or a high-capability CLI worker, with QA/review after.
|
|
57
|
-
- If trace quality data exists, use it
|
|
58
|
+
- If trace quality data exists, use it only as a tie-breaker between otherwise qualified providers. If no trace data exists, do not invent quality history; start with lower-risk, tightly scoped assignments.
|
|
58
59
|
|
|
59
60
|
## Step 4: Ensure there is a design basis
|
|
60
61
|
|
|
@@ -106,7 +107,7 @@ Team-lead owns the rest:
|
|
|
106
107
|
**Multi-AI mode:**
|
|
107
108
|
- Presents task-to-provider routing table for approval
|
|
108
109
|
- Dispatches CLI workers (`cli_worker_spawn`) for suitable Codex/Gemini tasks in isolated worktrees
|
|
109
|
-
- Uses
|
|
110
|
+
- Uses capability-matched local/tool models through `ai_chat` with tools selected from their `executionPolicy`: read-only policies get read/search tools, while `workspace-write` / `full-auto` policies may perform scoped file writes.
|
|
110
111
|
- Reviews changes with `agent_changes_review` before merge
|
|
111
112
|
- Runs Phase 5M structured QA debate (cross-validation across providers)
|
|
112
113
|
|
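The routing guidance in `implement.md` combines two ideas: tools are selected from a model's `executionPolicy`, and trace quality is only a tie-breaker among providers that already qualify. A rough sketch of both follows. The policy names (`read-only`, `workspace-write`, `full-auto`) come from the diff; the tool names, the `Provider` type, the `qualified` flag, and the assumption that trace scores are non-negative are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Policy names are from the routing guidance; tool names are placeholders.
READ_TOOLS = ("read_file", "search")   # hypothetical tool names
WRITE_TOOLS = ("write_file",)          # hypothetical tool name
WRITE_POLICIES = {"workspace-write", "full-auto"}


def tools_for_policy(execution_policy):
    """Read/search tools always; write tools only for write-capable policies."""
    if execution_policy in WRITE_POLICIES:
        return READ_TOOLS + WRITE_TOOLS
    # Covers "read-only" and, conservatively, any unrecognized policy.
    return READ_TOOLS


@dataclass
class Provider:
    name: str
    qualified: bool                        # capability fits the task (assumed precomputed)
    trace_quality: Optional[float] = None  # None when no trace data exists


def pick_provider(providers):
    """Filter by capability first; use trace quality only to order the rest."""
    qualified = [p for p in providers if p.qualified]
    if not qualified:
        return None
    # Missing trace data is treated as neutral (0.0) rather than guessed.
    # max() keeps the first provider on ties, so input order is the fallback.
    return max(
        qualified,
        key=lambda p: p.trace_quality if p.trace_quality is not None else 0.0,
    )
```

In this sketch an unqualified provider is never chosen no matter how good its trace score is, which mirrors the "otherwise qualified providers" wording in the guidance.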
package/commands/qa.md
CHANGED
|
@@ -69,14 +69,15 @@ Hand off to `agestra:agestra-team-lead` with:
|
|
|
69
69
|
- **E2E/runtime execution:** host-owned only; external providers cross-validate artifacts and findings, not browser/dev-server execution
|
|
70
70
|
- **Design doc reference:** path under `docs/plans/`
|
|
71
71
|
- **Report artifact path expectation:** `docs/reports/qa/YYYY-MM-DD-qa-[target].md`
|
|
72
|
-
- **Available providers:** from `environment_check
|
|
72
|
+
- **Available providers:** from `environment_check`; include configured providers when their detected model capability is suitable, using read-only QA/review tools so verification cannot modify source files
|
|
73
73
|
- **Requested providers:** explicit names captured from user wording; otherwise "all configured and available review-capable providers"
|
|
74
|
+
- **Specialist pre-injection (Claude-Code host):** when the brigade should include `claude-qa` (and optionally `claude-reviewer` / `claude-security` as supporting lenses), team-lead MUST follow `agents/agestra-team-lead.md` Phase 5M.0b — run the host specialist (`agestra-qa` etc.) via the `Agent` tool first, persist each result through `workspace_create_document`, then pass them as `source_documents` entries with `auto_inject_specialists: false`. The pre-injected host QA report itself doubles as the evidence packet for the matching `claude-qa` participant. Do NOT rely on `auto_inject_specialists: true` when a Claude specialist participant is wanted
|
|
74
75
|
- **Brigade lenses:** host executable evidence, spec-to-code compliance, implementation progress truthfulness, integration/regression risk, edge/error states, test adequacy, basic safety hygiene, and E2E artifact review when E2E ran
|
|
75
76
|
- **JSON finding flow:** candidate findings become `ITEM-*` ledger items; participants use the existing `agree` / `disagree` / `opinion` / `revise` stance contract; only ledger-accepted items affect the final verdict
|
|
76
77
|
- **Locale:** from `setup_status`
|
|
77
78
|
- **Original user request:** preserve verbatim
|
|
78
79
|
|
|
79
|
-
Team-lead owns the QA Brigade handoff and leader
|
|
80
|
+
Team-lead owns the QA Brigade handoff and leader finalization gate. The moderator engine owns provider fan-out, `ITEM-*` creation, JSON stance turns, consensus ledger aggregation, minority/open items, and final synthesis after approval or rejection. This command must not call `agent_debate_structured` directly. Do not ask for a separate multi-AI confirmation in Branch B; provider selection already came from setup. Honor explicit host-only wording.
|
|
80
81
|
|
|
81
82
|
## Step 4: Present the final result
|
|
82
83
|
|
package/commands/review.md
CHANGED
|
@@ -82,8 +82,10 @@ Hand off to the `agestra:agestra-team-lead` agent with multi-AI mode **pre-selec
|
|
|
82
82
|
- **Depth/tone/audience:** selected or inferred values
|
|
83
83
|
- **Boundary:** this is critique/evaluation, not QA PASS/FAIL and not a deep security audit
|
|
84
84
|
- **Report artifact path expectation:** `docs/reports/review/YYYY-MM-DD-review-[target].md`
|
|
85
|
-
- **Available providers:** from `environment_check
|
|
85
|
+
- **Available providers:** from `environment_check`; include configured providers when their detected model capability is suitable, using read-only review tools for code/document critique
|
|
86
86
|
- **Requested providers:** explicit names captured from user wording; otherwise "all available review-capable"
|
|
87
|
+
- **Specialist pre-injection (Claude-Code host):** when the brigade should include the `claude-reviewer` specialist lens, team-lead MUST follow `agents/agestra-team-lead.md` Phase 5M.0b — run `agestra-reviewer` via the `Agent` tool first, persist the result through `workspace_create_document`, then pass it as a `source_documents` entry with `auto_inject_specialists: false`. Do NOT rely on `auto_inject_specialists: true` when a Claude specialist participant is wanted — the structured-debate engine cannot call back into the Claude-Code host's native subagents
|
|
88
|
+
- **Scale controls:** if the target is a whole project, a large directory, or deep review, instruct team-lead to create a bounded review packet before fan-out: changed files, key entry points, relevant docs/config, and explicit exclusions. Do not ask external CLI providers to explore an unbounded large repository from scratch. Use `participant_timeout_ms: 600000` or higher for large/deep reviews, and split the review into narrower area debates if providers still time out.
|
|
87
89
|
- **Locale:** from `setup_status`
|
|
88
90
|
- **Original user request:** preserve verbatim
|
|
89
91
|
|
|
@@ -99,7 +101,7 @@ Team-lead owns the rest:
|
|
|
99
101
|
- Spawn `agestra:agestra-moderator` or `agestra:agestra-reviewer` directly when external providers are involved
|
|
100
102
|
- Build individual documents or hand-edit generated debate/synthesis Markdown
|
|
101
103
|
|
|
102
|
-
Direct execution from this command bypasses team-lead's task design,
|
|
104
|
+
Direct execution from this command bypasses team-lead's task design, capability-based routing with optional trace-assisted signals (`trace_summary`), and consistency enforcement. Always go through team-lead in Branch B.
|
|
103
105
|
|
|
104
106
|
## Step 4: Present the final result
|
|
105
107
|
|
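The specialist pre-injection and scale-control bullets in `review.md` describe the arguments team-lead assembles for a structured review debate. The sketch below shows one possible shape. The keys `mode`, `source_documents`, `auto_inject_specialists`, and `participant_timeout_ms` appear in the diff; the `topic` key, the function name, and the representation of `source_documents` entries as plain document identifiers are assumptions, since the diff does not show the actual schema.

```python
LARGE_REVIEW_TIMEOUT_MS = 600_000  # minimum suggested for large/deep reviews


def build_review_debate_request(topic, specialist_documents, large_target=False):
    """Assemble hypothetical arguments for a structured review debate.

    specialist_documents: identifiers of host specialist results that were
    already persisted (entry shape is an assumption, not a documented schema).
    """
    request = {
        "topic": topic,  # assumed field name
        "mode": "review",
        "source_documents": list(specialist_documents),
        # Host specialists were run up front, so auto-injection is turned off.
        "auto_inject_specialists": False,
    }
    if large_target:
        # Whole-project or deep reviews get the longer participant timeout.
        request["participant_timeout_ms"] = LARGE_REVIEW_TIMEOUT_MS
    return request
```

Per the command text, only team-lead would pass such arguments on; the command itself hands off rather than starting the debate directly.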
package/commands/security.md
CHANGED
|
@@ -59,8 +59,9 @@ Hand off to `agestra:agestra-team-lead` with:
|
|
|
59
59
|
- **Risk surfaces:** explicit user selections or inferred surfaces
|
|
60
60
|
- **Tool permission choices:** approved / declined / not asked, with exact approved commands if any
|
|
61
61
|
- **Report artifact path expectation:** `docs/reports/security/YYYY-MM-DD-security-[target].md`
|
|
62
|
-
- **Available providers:** from `environment_check
|
|
62
|
+
- **Available providers:** from `environment_check`; include configured providers when their detected model capability is suitable, using read-only security-review tools unless the user explicitly approves a separate implementation task
|
|
63
63
|
- **Requested providers:** explicit names captured from user wording; otherwise "all available security-capable"
|
|
64
|
+
- **Specialist pre-injection (Claude-Code host):** when the brigade should include the `claude-security` specialist lens, team-lead MUST follow `agents/agestra-team-lead.md` Phase 5M.0b — run `agestra-security` via the `Agent` tool first, persist the result through `workspace_create_document`, then pass it as a `source_documents` entry with `auto_inject_specialists: false`. The structured-debate engine cannot call back into the Claude-Code host's native subagents, so `auto_inject_specialists: true` is unsafe whenever a Claude specialist participant is wanted
|
|
64
65
|
- **Locale:** from `setup_status`
|
|
65
66
|
- **Original user request:** preserve verbatim
|
|
66
67
|
|