agestra 4.8.3 → 4.8.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/AGENTS.md +1 -1
package/GEMINI.md +1 -1
package/README.ja.md +4 -1
package/README.ko.md +4 -1
package/README.md +4 -1
package/README.zh.md +4 -1
package/agents/agestra-moderator.md +379 -289
package/agents/agestra-team-lead.md +71 -7
package/commands/review.md +1 -1
package/dist/bundle.js +219 -182
package/package.json +1 -1
package/skills/review.md +1 -1

package/agents/agestra-team-lead.md CHANGED

@@ -191,14 +191,17 @@ After each task completes:
    - Naming conventions are consistent
    - No conflicting changes to shared files
    - Import/export chains are complete
-6. If issues found → craft a detailed correction prompt and re-assign to the same AI or address it in Claude execution.
+6. If issues found → craft a detailed correction prompt and re-assign to the same AI or fix directly in Claude execution.
 7. If all checks pass:
    - For CLI worker tasks: call `agent_changes_accept` to merge worktree changes
    - For rejected CLI worker tasks: call `agent_changes_reject` with reason
-   - Proceed to Phase 5 (QA Cycle).
+   - Proceed to verification:
+     - **Multi-AI mode** → Phase 5M (Structured Debate) replaces the separate QA and Quality Gate phases.
+     - **Claude-only mode** → Phase 5 (QA Cycle) followed by Phase 6 (Quality Gate).
-### Phase 5: QA Cycle
+### Phase 5: QA Cycle (Claude-only mode)
+> Used when Work Mode in Phase 2 was **Claude only**. In Multi-AI mode, skip to Phase 5M.
 Run formal verification with automatic fix loop:
@@ -228,7 +231,65 @@ Run formal verification with automatic fix loop:
    - `INTEGRATION_BREAK` → cross-component conflict, re-assign with both sides' context
    - `TEST_FAILURE` → implementation bug, re-assign with test output and expected behavior
-### Phase 6: Quality Gate
+### Phase 5M: Structured Debate (Multi-AI mode)
+> Used when Work Mode in Phase 2 was **Multi-AI**. Replaces Phase 5 (QA) and Phase 6 (Quality Gate) in a single coordinated cross-AI review. In Claude-only mode, skip this phase.
+Run the structured-debate MCP flow. This is a **two-step** lifecycle: the moderator runs the debate to a terminal aggregation state, then parks the session in `ready-for-approval` waiting for the leader (this agent) to finalize. The moderator does NOT write the synthesis file on its own — approval must be explicit.
+#### 5M.1 Start the debate
+Call `agent_debate_structured` with:
+- `topic` — short slug (used in file names under `.agestra/workspace/`).
+- `scope` — concrete framing: file list, task description, or the design doc path.
+- `participants` — the provider/agent IDs the user specified at Work Mode selection, or the qualified set from `trace_summary`.
+- `auto_inject_specialists` — default `true`. When true, the moderator auto-adds `claude-reviewer` and/or `claude-qa` on top of `participants` based on topic heuristics (e.g. review-ish topics pull the reviewer, QA/verification-ish topics pull qa). When the user wants verbatim participants only, pass `false`.
+- `exclude_participants` — participant IDs to never include, applied regardless of `auto_inject_specialists`. Use this when the user explicitly wants a provider (including Ollama — there is no automatic Ollama filter anymore) kept out.
+- `leader` — omit unless you need to override the session-context leader.
+- `max_rounds` — default `10`. Raise for contested topics, lower for quick smoke-debates.
+- `individual_review_prompt` / `files` — optional framing for the individual-review fan-out.
+- `locale` — pass the locale resolved from `agestra.config.json` (fall back to providers.config locale). The moderator uses it for human-facing text; provider prompts remain English regardless.
+The tool returns a `StructuredDebateRunResult` with the debate snapshot and a `debate_id`. Capture both.
+#### 5M.2 Await terminal state
+The result `status` will be one of:
+- `ready-for-approval` (subtype `consensus`) — every proposal was accepted or rejected and aggregation converged.
+- `ready-for-approval` (subtype `escalated`) — `max_rounds` was reached without consensus and the user elected to escalate during moderator prompts.
+- `error` — aggregation failed. Treat as an orchestration failure; do NOT call approve/continue/reject.
+In either `ready-for-approval` subtype the synthesis has NOT been written yet. The terminal report names the three follow-up tools; do not skip them.
+A 24h inactivity timer starts the moment the session enters `ready-for-approval`. If the leader does nothing, the session transitions to `leader-timeout` and only `agent_debate_reject` is accepted afterwards for cleanup.
+#### 5M.3 Inspect artifacts
+Before deciding, read the on-disk outputs — the debate writes three folders under the workspace:
+- `.agestra/workspace/individual/` — per-participant individual reviews (`individual_{participant}_{topic}_{date}_{seq}.md`). Includes auto-injected specialists like `claude-reviewer` / `claude-qa` when present.
+- `.agestra/workspace/debates/` — debate transcript (`debate_{topic}_{date}_{seq}.md`) plus the approval snapshot (`{sessionId}.approval.json`) while the session is parked.
+- `.agestra/workspace/synthesis/` — the final synthesis document, written only after `agent_debate_approve` succeeds.
+Use `Read` / `Grep` against these paths plus the in-result snapshot to judge whether the debate outcome matches the design.
+#### 5M.4 Finalize (leader decision)
+Pick exactly one of the three follow-up tools, based on inspection:
+1. **Accept the outcome** → call `agent_debate_approve` with `debate_id` and an optional `leader_note` (appended to the synthesis footer under "Leader approval notes"). The moderator writes the synthesis markdown, deletes the snapshot, and returns `synthesisDocPath`. Proceed to Phase 7 and relay the path to the user.
+2. **Need more deliberation** → call `agent_debate_continue` with `debate_id` and `additional_rounds` (typical values: `3`, `5`, or `10`; max `20`). The engine resumes the round loop from the prior snapshot and eventually re-parks the session in `ready-for-approval`. Loop back to 5M.2. Use this when the debate was close but unresolved, or when `escalated` came too early.
+3. **Reject the outcome** → call `agent_debate_reject` with `debate_id` and a `reason` (captured in the transcript footer). Optionally set `spawn_issue: true` to write a lightweight issue branch document into `individual/` listing non-accepted proposals for later handling. No synthesis is produced. The debate is closed.
+All three tools are idempotent on terminal states — re-calling returns the cached outcome.
+When the session is `escalated`, explain the situation to the user in supervised mode before choosing `continue` vs `reject`. In autonomous mode, prefer `continue` with `additional_rounds: 5` once; if it escalates again, `reject` with a clear reason and fall back to targeted fix tasks in Phase 3.
+### Phase 6: Quality Gate (Claude-only mode)
+> Used when Work Mode in Phase 2 was **Claude only**. In Multi-AI mode, the structured debate in Phase 5M subsumes this gate.
 Run the `agestra-reviewer` agent with TRUST 5 framework:
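Reviewer note: the Phase 5M text added in this hunk reads as a small decision procedure for the leader. The sketch below is one possible reading of the autonomous-mode guidance in 5M.2 and 5M.4, not code from the package. The function name `choose_followup`, the `Decision` type, the `already_continued` flag, the reason strings, and the choice to reject a consensus outcome that does not match the design are all assumptions made for illustration; only the tool names, status strings, and `additional_rounds: 5` come from the diff.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    tool: str        # one of the three follow-up tool names named in the diff
    arguments: dict  # arguments the leader would pass to that tool


def choose_followup(debate_id, status, subtype, outcome_matches_design,
                    already_continued=False):
    """Pick a follow-up tool for a finished debate (hypothetical helper)."""
    if status == "error":
        # 5M.2: orchestration failure, so none of approve/continue/reject is called.
        return None
    if status == "leader-timeout":
        # 5M.2: after the 24h inactivity timer only reject is accepted, for cleanup.
        return Decision("agent_debate_reject",
                        {"debate_id": debate_id, "reason": "leader timeout cleanup"})
    if status != "ready-for-approval":
        raise ValueError(f"unexpected status: {status!r}")
    if subtype == "escalated":
        if not already_continued:
            # 5M.4, autonomous mode: continue once with additional_rounds=5.
            return Decision("agent_debate_continue",
                            {"debate_id": debate_id, "additional_rounds": 5})
        # 5M.4: a second escalation is rejected with a clear reason.
        return Decision("agent_debate_reject",
                        {"debate_id": debate_id,
                         "reason": "escalated again after one continue"})
    if outcome_matches_design:
        return Decision("agent_debate_approve", {"debate_id": debate_id})
    # Assumption: a consensus that conflicts with the design document is rejected.
    return Decision("agent_debate_reject",
                    {"debate_id": debate_id,
                     "reason": "outcome conflicts with the design document"})
```

In supervised mode the diff asks the leader to explain an `escalated` session to the user before choosing, so a helper like this would only produce a suggestion there.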
@@ -250,8 +311,9 @@ Provide a clear summary to the user:
 - How tasks were distributed (which AI/worker did what)
 - Task completion summary: total tasks, completed, failed, re-routed
 - What changed (files modified, features added)
-- QA cycle: how many cycles ran, what was auto-fixed
-- Quality Gate: TRUST 5 results
+- Verification summary:
+  - Claude-only: QA cycle count + what was auto-fixed, TRUST 5 Quality Gate result
+  - Multi-AI: structured debate outcome (`approved` / `rejected`, with round count), `auto_inject_specialists` state, final synthesis path (if approved) from `.agestra/workspace/synthesis/`, and links to the individual reviews under `.agestra/workspace/individual/` and the transcript under `.agestra/workspace/debates/`
 - Any issues found and how they were resolved
 </Workflow>
@@ -338,7 +400,9 @@ The design document is the authority. If an AI's output conflicts with the desig
 - `provider_list` / `provider_health` — check external AI availability
 - `trace_summary` / `trace_record` / `trace_compare` — provider quality tracking
 - `ai_chat` / `ai_analyze_files` / `ai_compare` — query external AI
-- `agent_debate_create/turn/status/summary/list/close/reset` — multi-AI debates
+- `agent_debate_structured` — start a structured multi-AI debate (individual reviews → clarification → rounds → aggregation → `ready-for-approval`). Supports `auto_inject_specialists` (default `true`) to auto-add `claude-reviewer` / `claude-qa` based on topic, and `exclude_participants` as the escape hatch (also the way to keep Ollama or any other provider out — there is no automatic Ollama filter).
+- `agent_debate_approve` / `agent_debate_continue` / `agent_debate_reject` — leader-only finalization tools for a `ready-for-approval` session. `approve` writes the synthesis under `.agestra/workspace/synthesis/`; `continue(additional_rounds=N)` extends the debate (typical N ∈ {3, 5, 10}, max 20); `reject(reason=..., spawn_issue?=true)` closes the session with no synthesis.
+- `agent_debate_create/turn/status/summary/list/close/reset` — low-level debate primitives (legacy / diagnostic use).
 - `agent_cross_validate` — cross-validate outputs between providers
 - `cli_worker_spawn` / `cli_worker_status` / `cli_worker_collect` / `cli_worker_stop` — manage Codex/Gemini CLI workers
 - `agent_changes_review` / `agent_changes_accept` / `agent_changes_reject` — review/merge worktree changes
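Reviewer note: the parameter list in 5M.1 and the tool summary above state a few defaults and limits (`auto_inject_specialists` defaults to `true`, `max_rounds` defaults to `10`, `additional_rounds` is capped at `20`). A minimal sketch of how a caller might assemble those payloads follows. The helper names `build_structured_debate_args` and `build_continue_args` are hypothetical, the lower bound of 1 on `additional_rounds` is an assumption, and the real tool schema lives in the package, which this diff does not show.

```python
def build_structured_debate_args(topic, scope, participants, *,
                                 auto_inject_specialists=True,
                                 exclude_participants=(),
                                 max_rounds=10,
                                 locale=None):
    """Assemble an `agent_debate_structured` payload (hypothetical helper)."""
    args = {
        "topic": topic,
        "scope": scope,
        "participants": list(participants),
        # Default `true` per 5M.1; pass False for verbatim participants only.
        "auto_inject_specialists": auto_inject_specialists,
        # Default 10 per 5M.1.
        "max_rounds": max_rounds,
    }
    if exclude_participants:
        # Applied regardless of auto_inject_specialists, per 5M.1.
        args["exclude_participants"] = list(exclude_participants)
    if locale is not None:
        args["locale"] = locale
    # `leader` is left out on purpose: 5M.1 says to omit it unless overriding.
    return args


def build_continue_args(debate_id, additional_rounds):
    """Assemble an `agent_debate_continue` payload, enforcing the stated max of 20."""
    # Assumption: at least one extra round is required; the diff only states the max.
    if not 1 <= additional_rounds <= 20:
        raise ValueError("additional_rounds must be between 1 and 20")
    return {"debate_id": debate_id, "additional_rounds": additional_rounds}
```

For example, `build_structured_debate_args("auth-review", "src/auth/", ["gemini", "codex"], exclude_participants=["ollama"])` keeps Ollama out explicitly, which matches the note that there is no automatic Ollama filter in 4.8.4. The topic and scope values here are made up.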

package/commands/review.md CHANGED Viewed

@@ -60,7 +60,7 @@ Call `environment_check` to determine which providers and modes are available.
 - Treat the Claude reviewer agent as asynchronous work that may legitimately take several minutes. Poll about once per minute; an empty or slowly growing output file is not a failure by itself.
 - Do NOT stop or replace the Claude reviewer with a duplicate main-thread review unless there is an explicit error, the user asks to cancel, or there has been no visible progress for at least 8 minutes. For large review scopes, allow up to 15 minutes before declaring the reviewer stalled.
 - If a background reviewer is still running, tell the user you are waiting and continue the orchestration. Do not short-circuit the workflow just because another provider finished earlier.
-- Do NOT use `agent_debate_moderate` as the primary review path. Use the turn-based flow (`agent_debate_create` + iterative `agent_debate_review` / `agent_debate_turn` + `agent_debate_conclude`) so long-running review rounds do not get cut off by host tool-call time limits.
+- Use the turn-based flow (`agent_debate_create` + iterative `agent_debate_review` / `agent_debate_turn` + `agent_debate_conclude`) or the approval-gated flow (`agent_debate_structured` + `agent_debate_approve`/`_continue`/`_reject`) so long-running review rounds do not get cut off by host tool-call time limits.
 **Team composition:** `agestra:agestra-moderator` (leader) + `agestra:agestra-reviewer` (Claude) + external AIs for review (`gemini`, `codex`, registered Claude-backed reviewers, etc.; `ollama` excluded)
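Reviewer note: the context lines of this hunk give concrete waiting thresholds for the Claude reviewer (poll about once per minute, treat it as stalled after at least 8 minutes without visible progress, and allow up to 15 minutes for large scopes). The sketch below restates those numbers as code. The function names, the `large_scope` flag, and the idea of measuring "seconds since last visible progress" are assumptions for illustration; `review.md` states the policy in prose only.

```python
POLL_INTERVAL_S = 60          # "poll about once per minute"
STALL_AFTER_S = 8 * 60        # no visible progress for at least 8 minutes
STALL_AFTER_LARGE_S = 15 * 60 # large review scopes: allow up to 15 minutes


def reviewer_stalled(seconds_since_progress, large_scope=False):
    """Return True once the no-progress window from review.md has elapsed."""
    limit = STALL_AFTER_LARGE_S if large_scope else STALL_AFTER_S
    return seconds_since_progress >= limit


def may_replace_reviewer(explicit_error, user_cancelled,
                         seconds_since_progress, large_scope=False):
    """The three conditions under which the reviewer may be stopped or replaced."""
    return (explicit_error
            or user_cancelled
            or reviewer_stalled(seconds_since_progress, large_scope))
```

An empty or slowly growing output file does not count as a failure by itself, so a caller would reset `seconds_since_progress` whenever the output grows at all.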