npm - agestra - Versions diffs - 4.13.0 → 4.13.2 - Mend

agestra 4.13.0 → 4.13.2

Files changed (18)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/README.ja.md +34 -12
package/README.ko.md +44 -61
package/README.md +43 -62
package/README.zh.md +34 -12
package/agents/agestra-designer.md +1 -1
package/agents/agestra-ideator.md +1 -1
package/agents/agestra-moderator.md +1 -2
package/agents/agestra-qa.md +22 -10
package/agents/agestra-reviewer.md +1 -1
package/agents/agestra-security.md +1 -1
package/agents/agestra-team-lead.md +85 -23
package/commands/implement.md +13 -3
package/commands/qa.md +11 -4
package/dist/bundle.js +1 -1
package/package.json +1 -1
package/scripts/host-assets/categories.mjs +156 -0

package/README.zh.md CHANGED

@@ -50,7 +50,7 @@ npm install -g agestra
 agestra-install gemini --assets --scope user
 ```
-Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/commands/agestra/`](.gemini/commands/agestra) 项目命令一起工作。user scope 的 `--assets` 会安装 Agestra Gemini native extension。只注册 MCP 时，请使用 `npm run install:gemini` 或 `agestra-install gemini`。
+Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md)、[`.gemini/commands/agestra/`](.gemini/commands/agestra) 和生成的 skills 一起工作。project scope 的 `--assets` 会写入受管文件，user scope 的 `--assets` 会安装 Agestra Gemini native extension。只注册 MCP 时，请使用 `npm run install:gemini` 或 `agestra-install gemini`。`npm run install:gemini:assets` 默认使用 user scope；如果要从 checkout 安装 project-scope 受管文件，请运行 `node scripts/install-host-mcp.mjs gemini --assets --scope project`。
 安装后可用的 Gemini 命令：
@@ -59,6 +59,8 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 - `/agestra:design`
 - `/agestra:idea`
 - `/agestra:implement`
+- `/agestra:qa`
+- `/agestra:security`
 ### 前置条件
@@ -83,9 +85,9 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 | 宿主 | 自然入口 |
 |------|----------|
-| Claude Code | `/agestra review`, `/agestra design`, `/agestra idea`, `/agestra implement` |
+| Claude Code | `/agestra setup`, `/agestra review`, `/agestra qa`, `/agestra security`, `/agestra design`, `/agestra idea`, `/agestra implement` |
 | Codex CLI | 按 `AGENTS.md` 指引直接用自然语言发起请求 |
-| Gemini CLI | `/agestra:review`, `/agestra:design`, `/agestra:idea`, `/agestra:implement` |
+| Gemini CLI | `/agestra:setup`, `/agestra:review`, `/agestra:qa`, `/agestra:security`, `/agestra:design`, `/agestra:idea`, `/agestra:implement` |
 三种宿主都会驱动同一个 MCP 服务，并共享 `commands/*.md` 中的工作流规范。
@@ -93,24 +95,29 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 | 命令 | 说明 |
 |------|------|
+| `/agestra setup` | 初始 AI 提供方选择与设置 |
 | `/agestra review [target]` | 审查代码质量、安全性和集成完成度 |
+| `/agestra qa [target]` | 验证实现结果并生成 PASS/FAIL 证据 |
+| `/agestra security [target]` | 执行专门的安全审查 |
 | `/agestra idea [topic]` | 通过与相似项目对比发掘改进点 |
 | `/agestra design [subject]` | 在实现前探索架构与设计取舍 |
-| `/agestra setup` | 初始 AI 提供方选择与设置 |
 | `/agestra implement [task]` | 以 Claude only 或 Multi-AI 模式执行实现 |
-当外部提供方可用时，文本命令（review、design、idea）直接进入终极辩论模式，进行多 AI 交叉验证。当未检测到提供方时，Claude 自动独立工作。
+当外部提供方可用时，review、QA、security、design、idea 工作流会经由 team-lead 进入多 AI 交叉验证。当未检测到提供方时，当前宿主的本地 specialist agent 会自动处理。
 ## 代理
 | 代理 | 模型 | 角色 |
 |------|------|------|
 | `agestra-team-lead` | Sonnet | 全局编排者：环境检查、按质量路由提供方、选择工作模式、监督 CLI Worker、驱动 QA 循环 |
+| `agestra-implementer` | Sonnet | 有范围的实现执行者：代码修改、测试更新、本地验证 |
+| `agestra-e2e-writer` | Sonnet | 持久 E2E 测试作者：只编写已批准的浏览器流程测试 |
 | `agestra-reviewer` | Opus | 严格质量审查者：关注安全、孤立实现、规格漂移、测试缺口 |
 | `agestra-designer` | Opus | 架构探索者：苏格拉底式提问、权衡分析 |
 | `agestra-ideator` | Sonnet | 改进点发现者：Web 调研、竞品分析 |
 | `agestra-moderator` | Sonnet | 多模式主持者：带共识检测的辩论、独立汇总、文档审查、冲突解决 |
 | `agestra-qa` | Opus | QA 验证者：检查设计符合性并给出 PASS/FAIL 判断 |
+| `agestra-security` | Opus | 安全审查者：威胁模型、认证/数据流风险、依赖与密钥卫生 |
 ## 技能
@@ -125,6 +132,9 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 | `design` | 包含 Multi-AI 模式选择的架构探索工作流 |
 | `idea` | 包含 Multi-AI 模式选择的改进发现工作流 |
 | `review` | 包含 Multi-AI 模式选择的代码质量·安全·硬编码审查工作流 |
+| `qa` | 设计契约验证与 PASS/FAIL 证据工作流 |
+| `security` | 专门安全审查工作流 |
+| `e2e` | 持久浏览器 E2E 测试编写工作流 |
 | `leader` | 多AI/提供方编排入口 — 捕获明确的提供方、辩论、共识或交叉验证信号，进行领域分类后委托给 `agestra-team-lead` |
 ---
@@ -147,7 +157,7 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 - **Provider abstraction** — 所有后端都实现 `AIProvider`（`chat`、`healthCheck`、`getCapabilities`）。新增提供方只需新增一个 provider 包并注册工厂。
 - **Zero-config** — 启动时自动检测提供方，无需手动配置。
-- **Host-native** — Claude 使用插件包，Codex 使用 `AGENTS.md`，Gemini 使用 `GEMINI.md` 和项目命令，但三者共享同一套 MCP 服务与工作流核心。
+- **Host-native** — Claude 使用插件包，Codex 使用 `AGENTS.md` 和 custom agents，Gemini 使用 `GEMINI.md`、commands、skills 或 native extension。所有宿主共享同一套 MCP 服务与工作流核心。
 - **Modular dispatch** — 每类工具都是独立模块，对外提供 `getTools()` 和 `handleTool()`。服务端负责动态收集与分发。
 - **Atomic writes** — 所有文件操作都采用“写临时文件再重命名”的方式，避免损坏。
 - **Dead-end tracking** — 失败方案会被记录，并注入后续提示词。
@@ -155,11 +165,11 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 ### 工作模式
-**文本工作**（review、design、idea）：有提供方 → 终极辩论模式；无提供方 → Claude only
+**文本工作**（review、QA、security、design、idea）：有提供方 → 终极辩论模式；无提供方 → Claude only
 **实现工作**（team-lead orchestration）：
-- **Claude만으로** — Claude 直接结合项目/全局代理完成实现。
-- **다른 AI도 함께** — CLI Worker（Codex/Gemini）在隔离的 git worktree 中自主编码，Ollama 处理简单任务，Claude 负责监督与合并。
+- **仅 Claude** — Claude 直接结合项目/全局代理完成实现。
+- **与其他 AI 一起** — CLI Worker（Codex/Gemini）在隔离的 git worktree 中自主编码，Ollama 处理简单任务，Claude 负责监督与合并。
 ---
@@ -237,7 +247,7 @@ Gemini 会结合仓库根目录下的 [GEMINI.md](GEMINI.md) 与 [`.gemini/comma
 | 工具 | 说明 |
 |------|------|
-| `host_assets_status` | 检查 Codex custom agents 等生成的宿主原生资产 |
+| `host_assets_status` | 检查 Codex custom agents、Gemini assets 等生成的宿主原生资产 |
 | `host_assets_install` | 显式安装或刷新受管宿主原生资产 |
 | `host_assets_uninstall` | 移除 Agestra 追踪的受管宿主原生资产 |
@@ -326,21 +336,30 @@ agestra/
 ├── .gemini/
 │   └── commands/
 │       └── agestra/
+│           ├── setup.toml   # Gemini CLI 的 /agestra:setup
 │           ├── review.toml  # Gemini CLI 的 /agestra:review
 │           ├── design.toml  # Gemini CLI 的 /agestra:design
 │           ├── idea.toml    # Gemini CLI 的 /agestra:idea
-│           └── implement.toml # Gemini CLI 的 /agestra:implement
+│           ├── implement.toml # Gemini CLI 的 /agestra:implement
+│           ├── qa.toml      # Gemini CLI 的 /agestra:qa
+│           └── security.toml # Gemini CLI 的 /agestra:security
 ├── commands/
+│   ├── setup.md             # /agestra setup — 提供方设置
 │   ├── review.md            # /agestra review — 质量验证
+│   ├── qa.md                # /agestra qa — PASS/FAIL 验证
+│   ├── security.md          # /agestra security — 安全审查
 │   ├── idea.md              # /agestra idea — 改进点发现
 │   ├── design.md            # /agestra design — 架构探索
 │   └── implement.md         # /agestra implement — 实现工作流
 ├── agents/
+│   ├── agestra-implementer.md # 有范围的实现执行者（Sonnet）
+│   ├── agestra-e2e-writer.md # 持久 E2E 测试作者（Sonnet）
 │   ├── agestra-reviewer.md  # 严格质量审查者（Opus）
 │   ├── agestra-designer.md  # 架构探索者（Opus）
 │   ├── agestra-ideator.md   # 改进点发现者（Sonnet）
 │   ├── agestra-moderator.md # 多模式主持者（Sonnet）
 │   ├── agestra-qa.md        # QA 验证者（Opus，不写代码）
+│   ├── agestra-security.md  # 安全审查者（Opus）
 │   └── agestra-team-lead.md # 全局编排者（Sonnet，不写代码）
 ├── skills/
 │   ├── provider-guide.md    # 提供方路由与模式说明
@@ -352,6 +371,9 @@ agestra/
 │   ├── design.md            # 架构探索工作流
 │   ├── idea.md              # 改进发现工作流
 │   ├── review.md            # 代码质量审查工作流
+│   ├── qa.md                # 设计契约 QA 工作流
+│   ├── security.md          # 专门安全审查工作流
+│   ├── e2e.md               # 持久 E2E 测试编写工作流
 │   └── leader.md            # 多AI 编排路由器
 ├── hooks/
 │   └── user-prompt-submit.md  # 工具推荐 hook
@@ -403,7 +425,7 @@ npm run uninstall:gemini
 npm run uninstall:gemini:assets
 ```
-`*:assets` 卸载命令会同时移除宿主注册和未修改的生成宿主资产。如果用户编辑过生成资产，Agestra 会保留该文件并报告。使用全局 npm 安装时，请运行 `agestra-uninstall codex --assets` 或 `agestra-uninstall gemini --assets --scope user`。
+`*:assets` 卸载命令会同时移除宿主注册和未修改的生成宿主资产。Codex assets 是 custom-agent 文件。Gemini project-scope assets 是受管文件，Gemini user-scope assets 通过 `gemini extensions uninstall agestra` 移除。如果用户编辑过生成资产，Agestra 会保留该文件并报告。使用全局 npm 安装时，请运行 `agestra-uninstall codex --assets` 或 `agestra-uninstall gemini --assets --scope user`。
 如果还想删除生成的项目数据，请手动删除 `.agestra/` 目录。

package/agents/agestra-designer.md CHANGED

@@ -36,7 +36,7 @@ description: |
 model: opus
 color: blue
 codexSandboxMode: workspace-write
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, Write
 ---
 <Role>

package/agents/agestra-ideator.md CHANGED

@@ -35,7 +35,7 @@ description: |
 model: sonnet
 color: green
 codexSandboxMode: workspace-write
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, Write
 ---
 <Role>

package/agents/agestra-moderator.md CHANGED

@@ -61,7 +61,7 @@ description: |
 model: sonnet
 color: cyan
 codexSandboxMode: read-only
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, mcp__plugin_agestra_agestra__provider_list, mcp__plugin_agestra_agestra__agent_debate_structured, mcp__plugin_agestra_agestra__agent_debate_status, mcp__plugin_agestra_agestra__agent_debate_approve, mcp__plugin_agestra_agestra__agent_debate_continue, mcp__plugin_agestra_agestra__agent_debate_reject, mcp__plugin_agestra_agestra__agent_debate_review, mcp__plugin_agestra_agestra__ai_chat, mcp__plugin_agestra_agestra__workspace_read, mcp__plugin_agestra_agestra__workspace_create_document
 ---
 <Role>
@@ -512,5 +512,4 @@ If `max_rounds` is hit with open proposals, the moderator surfaces the choice to
 - `ai_chat` — query individual providers for feedback (Independent Aggregation mode).
 - `workspace_create_document` — create analysis or aggregated documents (Independent Aggregation mode).
 - `workspace_read` — read individual provider documents by ID (Independent Aggregation mode).
-- `workspace_replace_document_content` — replace generated debate or synthesis markdown when the engine regenerates output from the ledger.
 </Tool_Usage>

package/agents/agestra-qa.md CHANGED

@@ -1,18 +1,29 @@
 ---
 name: agestra-qa
 description: |
-  Host-local document-first QA verifier. Validates implementation against docs/plans design
+  Host-local document-first QA evidence verifier. Validates implementation against docs/plans design
   contracts, Implementation Progress evidence, build/test results, runtime behavior, basic safety
   hygiene, and optional E2E/browser flows. Writes QA report artifacts under docs/reports/qa/.
-  Does NOT modify source code or add persistent test files. For multi-AI joint QA, route through
-  agestra-team-lead.
+  Does NOT modify source code or add persistent test files. When configured external providers are
+  available, normal /agestra qa requests should route through agestra-team-lead for the QA Brigade;
+  this agent supplies the host-owned evidence pass, especially for build/test and
+  E2E/runtime checks.
   <example>
-  Context: Implementation is done and needs single-host verification
+  Context: Implementation is done and configured providers are available
   user: "구현 다 했는데 QA 돌려줘"
+  assistant: "I'll use the agestra-team-lead agent to run the QA Brigade, with host-owned runtime evidence."
+  <commentary>
+  Default QA with providers — team-lead forms the QA Brigade, runs host QA evidence collection, then coordinates provider verdicts.
+  </commentary>
+  </example>
+  <example>
+  Context: Implementation is done and needs explicit single-host verification
+  user: "호스트만 써서 QA 돌려줘"
   assistant: "I'll use the agestra-qa agent to verify the implementation against the design."
   <commentary>
-  Single-host post-implementation verification — QA checks the design document, progress ledger,
+  Explicit host-only post-implementation verification — QA checks the design document, progress ledger,
   build/test commands, and selected runtime flows.
   </commentary>
   </example>
@@ -22,8 +33,9 @@ description: |
   user: "실제 화면 흐름까지 QA 해줘"
   assistant: "I'll use the agestra-qa agent and ask whether to run the full E2E path."
   <commentary>
-  QA explains E2E cost, then verifies existing E2E tests or temporary browser flows. Persistent
-  test-file creation or maintenance is handed to agestra-e2e-writer after approval.
+  QA explains E2E cost, then the host verifies existing E2E tests or temporary browser flows. Persistent
+  test-file creation or maintenance is handed to agestra-e2e-writer after approval. External providers
+  may review the resulting artifacts through team-lead, but do not run E2E/browser flows themselves.
   </commentary>
   </example>
@@ -32,14 +44,14 @@ description: |
   user: "코덱스랑 제미니로 같이 검증해줘"
   assistant: "I'll use the agestra-team-lead agent to run a multi-AI structured QA debate."
   <commentary>
-  Multi-AI verification — must go through team-lead which runs structured debate (mode:review)
-  with external providers cross-validating. Do NOT call agestra-qa directly here.
+  Multi-AI verification — must go through team-lead which forms the QA Brigade and runs structured debate (mode:review)
+  with external providers cross-validating host evidence. Do NOT call agestra-qa directly here.
   </commentary>
   </example>
 model: opus
 color: yellow
 codexSandboxMode: workspace-write
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, Write
 ---
 <Role>

package/agents/agestra-reviewer.md CHANGED

@@ -37,7 +37,7 @@ description: |
 model: opus
 color: red
 codexSandboxMode: workspace-write
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, Write
 ---
 <Role>

package/agents/agestra-security.md CHANGED

@@ -18,7 +18,7 @@ description: |
 model: opus
 color: red
 codexSandboxMode: workspace-write
-disallowedTools: Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, Write
 ---
 <Role>

package/agents/agestra-team-lead.md CHANGED

@@ -70,7 +70,7 @@ description: |
 model: sonnet
 color: magenta
 codexSandboxMode: read-only
-disallowedTools: Write, Edit, NotebookEdit
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, mcp__plugin_agestra_agestra__environment_check, mcp__plugin_agestra_agestra__provider_list, mcp__plugin_agestra_agestra__provider_health, mcp__plugin_agestra_agestra__trace_query, mcp__plugin_agestra_agestra__trace_summary, mcp__plugin_agestra_agestra__trace_visualize, mcp__plugin_agestra_agestra__ai_chat, mcp__plugin_agestra_agestra__ai_analyze_files, mcp__plugin_agestra_agestra__ai_compare, mcp__plugin_agestra_agestra__agent_debate_structured, mcp__plugin_agestra_agestra__agent_debate_status, mcp__plugin_agestra_agestra__agent_debate_approve, mcp__plugin_agestra_agestra__agent_debate_continue, mcp__plugin_agestra_agestra__agent_debate_reject, mcp__plugin_agestra_agestra__agent_cross_validate, mcp__plugin_agestra_agestra__cli_worker_spawn, mcp__plugin_agestra_agestra__cli_worker_status, mcp__plugin_agestra_agestra__cli_worker_collect, mcp__plugin_agestra_agestra__cli_worker_stop, mcp__plugin_agestra_agestra__agent_changes_review, mcp__plugin_agestra_agestra__agent_changes_accept, mcp__plugin_agestra_agestra__agent_changes_reject
 ---
 <Role>
@@ -100,13 +100,14 @@ If invoked with **Domain: review**, do not enter implementation decomposition, w
 If invoked with **Domain: security**, do not enter implementation decomposition, worker routing, or code-changing phases. Execute the structured security workflow in `commands/security.md`, then report security findings, tool-assisted checks run/skipped/declined/unavailable, report artifact path, residual risk, and SECURITY PASS / PASS WITH HARDENING / SECURITY BLOCK. Security must not run destructive exploit tests, and must not install tools or run heavyweight/networked scans without explicit user approval.
-If invoked with **Domain: qa** or **Domain: implement, Submode: qa-only**, skip Phase 2 (Task Design), Phase 3 (Parallel Execution), and Phase 4 (Result Inspection) entirely — there is no code to write. Instead:
+If invoked with **Domain: qa** or **Domain: implement, Submode: qa-only**, skip Phase 2 (Task Design), Phase 3 (Parallel Execution), and Phase 4 (Result Inspection) entirely — there is no product code to write. Instead:
 1. Run Phase 1 (Situation Assessment) to confirm available providers and the design document scope.
 2. Preserve the QA depth from the handoff packet: Standard QA / Full QA with E2E / Decide automatically.
-3. Jump directly to verification:
-   - **Leader-host-only mode** → Phase 5 (QA Cycle): spawn `agestra-qa` against the existing changes, classify failures, report verdict. No QA Fix Loop unless the user explicitly requests follow-up fixes.
-   - **Multi-AI mode** → Phase 5M (Structured Debate) with `mode: "review"` and the QA-oriented `topic` framing (e.g. "spec-compliance verification of {scope}"). Participants: host `agestra-qa` + external review-capable providers (excluding `ollama`). Treat the structured debate as cross-validation: each participant produces an independent QA verdict, then the JSON consensus ledger merges into a final PASS / CONDITIONAL / FAIL.
+3. Choose QA verification routing independently from implementation routing:
+   - If the user explicitly requested host-only QA, or no configured external providers are available, run Phase 5 (Host QA Evidence Pass): spawn `agestra-qa` against the existing changes, classify failures, and report verdict. No QA Fix Loop unless the user explicitly requests follow-up fixes.
+   - Otherwise, run Phase 5M (QA Brigade) by default. Start with host-owned `agestra-qa` evidence collection, then hand off to the moderator engine via `agent_debate_structured`. The moderator engine runs the configured and available review-capable providers plus the host QA participant through the existing `ITEM-*` / JSON stance ledger flow. Give each participant an explicit QA lens and require independent PASS / CONDITIONAL / FAIL recommendations in their source material. Treat the structured debate as a brigade cross-check: every participant reviews the design, code, diff, host evidence, and peer findings; the JSON consensus ledger merges consensus and preserves dissent.
+   - E2E/runtime execution is host-owned only. External providers may review the host QA report, command output, screenshots, traces, and E2E findings, but must not run browser/dev-server flows or create persistent E2E files directly.
 4. Skip Phase 6 (Post-implementation Review) — that's the reviewer's territory, not QA-only.
 5. Phase 7 report: surface QA depth, E2E status, QA verdict, spec-to-code mapping summary, classified failures (`BUILD_ERROR` / `DESIGN_GAP` / `PROGRESS_MISMATCH` / `INTEGRATION_BREAK` / `TEST_FAILURE` / `E2E_FAILURE` / `SAFETY_HYGIENE_RISK`), any `E2E_TEST_WORK_REQUEST`, and the synthesis path (multi-AI) or QA agent report path (host-local).
@@ -154,7 +155,7 @@ Decompose the work into independent, assignable tasks:
    | Option | Description |
    |--------|-------------|
-   | **Leader-host only** | The current host uses `agestra-implementer` and specialist agents/prompts; no external coding workers |
+   | **Leader-host only** | The current host uses `agestra-implementer` and specialist agents/prompts; no external coding workers. QA routing still follows the configured-provider default unless host-only QA is requested |
    | **Multi-AI** | CLI AIs work autonomously when suitable, Ollama handles simple proposal work, host-local agents handle scoped implementation/review/QA |
    If no external providers available: skip selection, proceed with Leader-host only.
@@ -252,7 +253,7 @@ Execute approved tasks across available execution paths:
 **Result Integration:**
 - Leader-host implementation: changes are already applied on the main branch (no merge needed).
-- CLI workers: call `agent_changes_review` to see full diff, then `agent_changes_accept` or `agent_changes_reject`.
+- CLI workers: call `agent_changes_review` to inspect the full diff. Do **not** accept here — Phase 4 step 7 owns the supervised/autonomous accept gate.
 - File overlap between tracks: detect conflicts between implementer-applied changes and CLI worker worktrees. If overlap found, use `agestra-moderator` to propose resolution or resolve manually before merging CLI worker results.
 ### Phase 4: Result Inspection
@@ -273,15 +274,17 @@ After each task completes:
    - Import/export chains are complete
 6. If issues found → craft a detailed correction prompt and re-assign to the same AI or send a scoped fix task to `agestra-implementer`.
 7. If all checks pass:
-   - For CLI worker tasks: call `agent_changes_accept` to merge worktree changes
+   - For CLI worker tasks: gate `agent_changes_accept` by execution mode.
+     - **Supervised (default):** Summarize the diff (files touched, scope, risk highlights) and use `AskUserQuestion` to confirm the merge before calling `agent_changes_accept`. Call `agent_changes_reject` only after an explicit user rejection with a reason. If the user does not respond or `AskUserQuestion` is unavailable, leave the worker worktree pending, report the task ID, and wait for a later accept/reject decision.
+     - **Autonomous:** Record the review evidence in your status update (files, design alignment notes), then call `agent_changes_accept`. Escalate to the user instead of auto-accepting when the diff exceeds the worker's stated scope, adds unrequested files, or touches a file flagged as high-risk in Phase 2.
    - For rejected CLI worker tasks: call `agent_changes_reject` with reason
    - Proceed to verification:
-     - **Multi-AI mode** → Phase 5M (Structured Debate) replaces the separate QA and post-implementation review phases.
-     - **Leader-host-only mode** → Phase 5 (QA Cycle) followed by Phase 6 (Post-implementation Review).
+      - If configured external providers are available and the user did not explicitly request host-only QA → Phase 5M (QA Brigade).
+      - If no configured external providers are available, or the user explicitly requested host-only QA → Phase 5 (Host QA Evidence Pass) followed by Phase 6 (Post-implementation Review).
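
Editor's sketch (not part of the package diff): the accept gate and the verification routing described in step 7 can be read as two small decision functions. The TypeScript below is a minimal illustration under stated assumptions. Every type and function name in it (`chooseQaPhase`, `gateAccept`, `DiffRisk`, and so on) is hypothetical and does not come from the Agestra source, where the behaviour is defined by the agent prompt rather than by code like this.

```typescript
// Illustrative only: all names here are hypothetical, not Agestra APIs.
type QaPhase = "5M: QA Brigade" | "5: Host QA Evidence Pass";
type ExecutionMode = "supervised" | "autonomous";
// "none" stands for no user response, or AskUserQuestion being unavailable.
type UserDecision = "accept" | "reject" | "none";
type AcceptAction = "accept" | "reject" | "leave-pending" | "escalate";

interface DiffRisk {
  exceedsStatedScope: boolean;
  addsUnrequestedFiles: boolean;
  touchesHighRiskFile: boolean;
}

// QA routing: the brigade is the default whenever configured providers are
// available and the user did not explicitly ask for host-only QA.
function chooseQaPhase(providersAvailable: boolean, hostOnlyRequested: boolean): QaPhase {
  return providersAvailable && !hostOnlyRequested
    ? "5M: QA Brigade"
    : "5: Host QA Evidence Pass";
}

// Accept gate for CLI worker changes, keyed on execution mode.
function gateAccept(mode: ExecutionMode, decision: UserDecision, risk: DiffRisk): AcceptAction {
  if (mode === "supervised") {
    if (decision === "accept") return "accept";
    if (decision === "reject") return "reject";
    return "leave-pending"; // keep the worktree pending and report the task ID
  }
  // Autonomous mode: auto-accept unless one of the escalation triggers fires.
  const risky =
    risk.exceedsStatedScope || risk.addsUnrequestedFiles || risk.touchesHighRiskFile;
  return risky ? "escalate" : "accept";
}
```

The point of the sketch is that QA routing takes no input from the implementation Work Mode, which matches the diff's statement that QA routing is separate from code-writing routing.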
-### Phase 5: QA Cycle (Leader-host-only mode)
+### Phase 5: Host QA Evidence Pass
-> Used when Work Mode in Phase 2 was **Leader-host only**. In Multi-AI mode, skip to Phase 5M.
+> Used when no configured external providers are available, the user explicitly requested host-only QA, or Phase 5M needs host-owned executable evidence before provider cross-validation.
 Run formal verification with automatic fix loop:
@@ -322,9 +325,59 @@ Run formal verification with automatic fix loop:
    - After the tests exist or are updated, re-run `agestra-qa`.
    - If declined, keep the QA verdict/residual risk honest and do not mark E2E as covered.
-### Phase 5M: Structured Debate (Multi-AI mode)
+### Phase 5M: QA Brigade
-> Used when Work Mode in Phase 2 was **Multi-AI**. Replaces Phase 5 (QA) and Phase 6 (Post-implementation Review) in a single coordinated cross-AI review. In Leader-host-only mode, skip this phase.
+> Used for QA whenever configured external providers are available, unless the user explicitly requested host-only QA. This is the default for `/agestra qa` and post-implementation QA. It can also be used after Leader-host-only implementation because QA routing is separate from code-writing routing.
+The QA Brigade should feel like the review workflow's full formation, not a lightweight second opinion. Build a broad verification team and make the differences between providers useful.
+For QA topics, collect host-owned executable evidence first:
+1. Spawn `agestra-qa` with the design document, change scope, QA depth, and report artifact expectation under `docs/reports/qa/`.
+2. If QA depth includes E2E/runtime verification, only the host QA path runs browser/dev-server flows, screenshots, traces, or existing E2E commands.
+3. If `agestra-qa` returns `E2E_TEST_WORK_REQUEST`, pause for user approval before routing that packet to `agestra-e2e-writer`; do not ask external providers to create or repair persistent E2E tests.
+4. Use the host QA report path, command output, screenshots/traces, and E2E findings as evidence for provider cross-validation.
+#### 5M.0 Brigade formation
+Build the QA Brigade handoff before starting the moderator debate:
+| Brigade member | Role |
+|---|---|
+| Host `agestra-qa` / structured `claude-qa` participant | Evidence lead and debate participant: design/progress audit, build/test commands, host-owned E2E/runtime evidence, report artifact, and JSON stance turns |
+| Configured review-capable providers | Independent QA judges: each reviews the design, diff/code, host QA evidence, and peer claims |
+| `agestra-reviewer` lens | Optional support lens for production readiness, UX/product feel, maintainability, and test adequacy when those affect acceptability; do not turn QA into a general review |
+| `agestra-security` lens | Optional support lens for basic safety hygiene escalation when QA finds secrets, auth, file, command, network, or permission risk; use `/agestra security` for a dedicated audit |
+| `agestra-e2e-writer` | Not a brigade reviewer. Use only after an approved `E2E_TEST_WORK_REQUEST` for persistent E2E test work |
+Default participant policy:
+- Include every configured and available review-capable provider by default, not only the "best" one. Use `trace_summary` to assign lenses and order attention, not to shrink the brigade unless a provider is unavailable, explicitly excluded, or clearly unqualified for the requested lens.
+- Exclude `ollama` by default unless the user explicitly requested it for lightweight cross-checking.
+- Keep the host QA participant in the flow even when external providers are present, because executable evidence, E2E/runtime observation, and local command output are host-owned. In structured debate, this is the `claude-qa` compatibility participant when auto-injected or explicitly listed.
+- Assign distinct lenses so the output is not three copies of the same review. Suggested lenses: spec-to-code compliance, progress-table truthfulness, integration/regression risk, edge/error states, test adequacy, safety hygiene, and E2E artifact interpretation.
+- Each brigade member must issue an independent PASS / CONDITIONAL PASS / FAIL recommendation with evidence and confidence in its individual source material. Disagreement is useful; preserve minority reports in the final synthesis.
+#### 5M.0a QA mapping onto the existing JSON ledger
+Do not invent a separate QA adjudication schema. Use the moderator's existing structured-debate contract.
+Each candidate QA finding must become a normal consensus `ITEM-*` with source references. Participants vote through the existing JSON stance contract:
+| Stance | QA meaning |
+|---|---|
+| `agree` | Include this finding as a QA issue; the evidence supports it and the severity/scope are acceptable |
+| `disagree` | Do not include this finding: false positive, over-severe, duplicate, out-of-scope, already covered, or evidence is insufficient |
+| `revise` | The issue is real, but the claim, severity, scope, wording, or fix direction must change; include `proposedItem` |
+| `opinion` | The item requires a product/design/leader judgment rather than a QA fact decision |
+Ledger interpretation:
+- `accepted` means all active participants agree; only accepted blocking/conditional QA items can drive the final FAIL / CONDITIONAL PASS.
+- `excluded` means all active participants disagree; do not include it in the final QA issue list except as a brief overruled/minority note when useful.
+- `superseded` means the moderator accepted a revision or merge into another item; report the canonical item, not both duplicates.
+- `needs_opinion`, `unresolved`, and `no_response` mean the item is still open. Continue rounds when useful; if escalated to the leader, report it as open/dissenting rather than pretending consensus.
+- Evidence-insufficient findings should normally receive `disagree`, not `opinion`. Use `opinion` only for genuine product/design judgment calls.
+- The moderator handles duplicate/merge/superseded state in the ledger. Participants may point out duplication in comments or propose a `revise`, but they do not manually merge markdown.
+- The leader does not decide item inclusion by hand. The leader inspects the JSON ledger and chooses approve / continue / reject at the approval gate.
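
Editor's sketch (not part of the package diff): the ledger rules above can be illustrated with a small classifier. The diff states only that `accepted` requires all active participants to agree and `excluded` requires all to disagree. How the open states are told apart below, and the mapping from accepted blocking or conditional items to FAIL or CONDITIONAL PASS, are assumptions made for illustration. The names `classifyItem`, `finalVerdict`, and `Severity` are hypothetical, and `superseded` is left out because it reflects a moderator merge decision rather than a vote tally.

```typescript
// Illustrative only: not the moderator engine's implementation.
type Stance = "agree" | "disagree" | "revise" | "opinion";
type ItemStatus = "accepted" | "excluded" | "needs_opinion" | "unresolved" | "no_response";
type Severity = "blocking" | "conditional" | "suggestion"; // hypothetical field
type Verdict = "PASS" | "CONDITIONAL PASS" | "FAIL";

// One entry per active participant; null means that participant did not respond.
function classifyItem(stances: Array<Stance | null>): ItemStatus {
  if (stances.length === 0) return "unresolved";
  if (stances.some((s) => s === null)) return "no_response"; // assumption
  if (stances.every((s) => s === "agree")) return "accepted";
  if (stances.every((s) => s === "disagree")) return "excluded";
  if (stances.some((s) => s === "opinion")) return "needs_opinion"; // assumption
  return "unresolved";
}

// Only accepted blocking/conditional items may move the verdict away from PASS.
function finalVerdict(items: Array<{ status: ItemStatus; severity: Severity }>): Verdict {
  const accepted = items.filter((item) => item.status === "accepted");
  if (accepted.some((item) => item.severity === "blocking")) return "FAIL";
  if (accepted.some((item) => item.severity === "conditional")) return "CONDITIONAL PASS";
  return "PASS";
}
```

Open items (`needs_opinion`, `unresolved`, `no_response`) deliberately do not affect `finalVerdict` here. Per the rules above, they would be reported as open or dissenting rather than folded into the consensus.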
 Run the structured-debate MCP flow. This is a **background lifecycle**: `agent_debate_structured` creates a durable session record immediately and returns `status: running`; the leader polls `agent_debate_status` until the moderator parks the session in `ready-for-approval`, `escalated`, or `error`. The moderator does NOT write the synthesis file on its own — approval must be explicit.
@@ -332,11 +385,11 @@ Run the structured-debate MCP flow. This is a **background lifecycle**: `agent_d
 Call `agent_debate_structured` with:
-- `topic` — short slug (used in file names under `.agestra/workspace/`).
+- `topic` — short slug (used in file names under `.agestra/workspace/`), prefixed or framed as QA Brigade when useful.
 - `mode` — `"review"` for QA/review/security consensus, `"idea"` for exploratory design or option discovery.
-- `scope` — concrete framing: file list, task description, or the design doc path.
-- `participants` — the provider/agent IDs the user specified at Work Mode selection, or the qualified set from `trace_summary`.
-- `source_documents` — optional pre-created individual documents, each as `{ "document_id": "...", "provider": "..." }`.
+- `scope` — concrete framing: file list, task description, design doc path, changed files, and host QA report/evidence path.
+- `participants` — the provider/agent IDs the user specified, or all configured and available review-capable providers from `provider_list`, plus the host QA participant (`claude-qa` compatibility ID) through auto-injection or explicit listing. For QA, use `trace_summary` for lens assignment rather than narrowing by default. Exclude `ollama` unless explicitly requested for lightweight cross-checking.
+- `source_documents` — optional pre-created individual documents, each as `{ "document_id": "...", "provider": "..." }`. For QA, pass the host QA report/evidence packet as source material for the matching host QA participant. The `provider` value must be present in `participants`.
 - `auto_inject_specialists` — default `true`. When true, the moderator auto-adds host reviewer/QA/security specialists on top of `participants` based on topic heuristics (currently exposed as `claude-reviewer`, `claude-qa`, and/or `claude-security` for compatibility). When the user wants verbatim participants only, pass `false`.
 - `exclude_participants` — participant IDs to never include, applied regardless of `auto_inject_specialists`. Use this when the user explicitly wants a provider (including Ollama — there is no automatic Ollama filter anymore) kept out.
 - `leader` — omit unless you need to override the session-context leader.
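
Editor's sketch (not part of the package diff): the parameters listed above can be pictured as a request object. The field names come from the list. The field types, the example provider IDs (`codex`, `gemini`), the document ID, the file paths, and the `checkSourceDocuments` helper are assumptions for illustration. The real MCP tool schema may differ and may accept further fields not shown in this hunk.

```typescript
// Illustrative shape only: field names follow the list above, types are assumed.
interface DebateRequest {
  topic: string;
  mode: "review" | "idea";
  scope: string;
  participants: string[];
  source_documents?: Array<{ document_id: string; provider: string }>;
  auto_inject_specialists?: boolean; // described above as defaulting to true
  exclude_participants?: string[];
  leader?: string; // omitted unless overriding the session-context leader
}

// Checks the stated rule that each source document's `provider`
// must be present in `participants`. Returns a list of problems.
function checkSourceDocuments(request: DebateRequest): string[] {
  const known = new Set(request.participants);
  return (request.source_documents ?? [])
    .filter((doc) => !known.has(doc.provider))
    .map((doc) => `source document ${doc.document_id}: provider "${doc.provider}" is not a participant`);
}

// Hypothetical QA Brigade request; every value below is made up.
const exampleRequest: DebateRequest = {
  topic: "qa-brigade-login-flow",
  mode: "review",
  scope: "docs/plans/login.md, changed files, docs/reports/qa/login.md",
  participants: ["codex", "gemini", "claude-qa"],
  source_documents: [{ document_id: "doc-host-qa", provider: "claude-qa" }],
  exclude_participants: ["ollama"],
};

const problems = checkSourceDocuments(exampleRequest);
```

Listing `claude-qa` explicitly in `participants` keeps the host QA evidence packet valid as a source document even if auto-injection is turned off.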
@@ -368,11 +421,20 @@ Before deciding, read the on-disk outputs — the debate writes three folders un
 Use `Read` / `Grep` against these paths plus the in-result snapshot to judge whether the debate outcome matches the design.
+For QA Brigade sessions, inspect whether the synthesis contains:
+- Participant list and assigned lenses.
+- Independent verdicts from each participant.
+- `ITEM-*` ledger status summary: accepted, excluded, superseded, needs_opinion, unresolved, and no_response items.
+- Consensus verdict and confidence.
+- Dissenting findings or minority concerns.
+- Evidence mapping back to design requirements, code locations, commands, reports, screenshots, or traces.
+- Clear distinction between QA-blocking failures, conditional concerns, and general review suggestions.
 #### 5M.4 Finalize (leader decision)
 Pick exactly one of the three follow-up tools, based on inspection:
-1. **Accept the outcome** → call `agent_debate_approve` with `session_id` and an optional `leader_note` (appended to the synthesis footer under "Leader approval notes"). The moderator writes the synthesis markdown, updates the session record to `approved`, and returns `synthesisDocPath`. Proceed to Phase 7 and relay the path to the user.
+1. **Accept the outcome** → call `agent_debate_approve` with `session_id` and an optional `leader_note` (appended to the synthesis footer under "Leader approval notes"). The moderator writes the synthesis markdown, updates the session record to `approved`, and returns `synthesisDocPath`. If this is QA-only, proceed to Phase 7. If this is an implementation flow and the QA verdict is PASS or CONDITIONAL PASS, proceed to Phase 6 unless the debate explicitly included the post-implementation review lens. If this is an implementation flow and the QA verdict is FAIL, return to Phase 3 with targeted fixes or escalate to the user instead of claiming completion.
 2. **Need more deliberation** → call `agent_debate_continue` with `session_id` and `additional_rounds` (`3`, `5`, or `10` only). The handler returns `status: running`; poll `agent_debate_status` again until it reaches the approval gate. Use this when the debate was close but unresolved, or when `escalated` came too early.
 3. **Reject the outcome** → call `agent_debate_reject` with `session_id` and a `reason` (captured in the transcript footer). Optionally set `spawn_issue: true` to write a lightweight issue branch document into `individual/` listing non-accepted proposals for later handling. No synthesis is produced. The debate is closed.
@@ -380,9 +442,9 @@ All three tools are idempotent on terminal states — re-calling returns the cac
 When the session is `escalated`, explain the situation to the user in supervised mode before choosing `continue` vs `reject`. In autonomous mode, prefer `continue` with `additional_rounds: 5` once; if it escalates again, `reject` with a clear reason and fall back to targeted fix tasks in Phase 3.
-### Phase 6: Post-implementation Review (Leader-host-only mode)
+### Phase 6: Post-implementation Review
-> Used when Work Mode in Phase 2 was **Leader-host only**. In Multi-AI mode, the structured debate in Phase 5M subsumes this review.
+> Used for implementation flows after QA passes when review was not already included in Phase 5M. Skip this phase for QA-only submode.
 Run the `agestra-reviewer` agent for review/critique:
@@ -406,8 +468,8 @@ Provide a clear summary to the user:
 - Task completion summary: total tasks, completed, failed, re-routed
 - What changed (files modified, features added)
 - Verification summary:
-  - Leader-host-only: QA depth, E2E status, QA report path, QA cycle count + what was auto-fixed, review report path, review verdict
-  - Multi-AI: structured debate outcome (`approved` / `rejected`, with round count), `auto_inject_specialists` state, final synthesis path (if approved) from `.agestra/workspace/synthesis/`, and links to the individual reviews under `.agestra/workspace/individual/` and the transcript under `.agestra/workspace/debates/`
+  - Host-only QA/review: QA depth, E2E status, QA report path, QA cycle count + what was auto-fixed, review report path, review verdict
+  - QA Brigade / configured-provider QA: host QA report path, E2E host-only status, participant list, assigned lenses, accepted ledger items, excluded ledger items, open/opinion items, consensus verdict, dissenting findings, structured debate outcome (`approved` / `rejected`, with round count), `auto_inject_specialists` state, final synthesis path (if approved) from `.agestra/workspace/synthesis/`, and links to the individual reviews under `.agestra/workspace/individual/` and the transcript under `.agestra/workspace/debates/`
 - Any issues found and how they were resolved
 </Workflow>
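
The revised 5M.4 rules in this hunk can be read as a small decision function. The following is a minimal sketch, not code from the package: the type names, the `nextStepAfterApprove` and `isAllowedAdditionalRounds` helpers, and the step labels are hypothetical. Only the verdict labels, the phase numbers, and the allowed `additional_rounds` values come from the text. The jump to Phase 7 when the debate already covered the review lens is an assumption inferred from the "unless" clause.

```typescript
// Hypothetical types for illustration; not part of the agestra package.
type QaVerdict = "PASS" | "CONDITIONAL PASS" | "FAIL";
type Flow = "qa-only" | "implementation";
type NextStep = "phase-7" | "phase-6" | "phase-3-or-escalate";

interface ApprovedDebate {
  flow: Flow;
  verdict: QaVerdict;
  // True when the debate explicitly included the post-implementation review lens.
  includedReviewLens: boolean;
}

// Chooses the step that follows a successful agent_debate_approve call.
function nextStepAfterApprove(d: ApprovedDebate): NextStep {
  // QA-only sessions go straight to the final summary.
  if (d.flow === "qa-only") return "phase-7";
  // A failed implementation flow returns to targeted fixes or user escalation.
  if (d.verdict === "FAIL") return "phase-3-or-escalate";
  // Assumption: Phase 6 is skipped when review was already part of the debate.
  return d.includedReviewLens ? "phase-7" : "phase-6";
}

// agent_debate_continue accepts only these round counts according to the text.
const ALLOWED_ROUNDS = [3, 5, 10];

function isAllowedAdditionalRounds(n: number): boolean {
  return ALLOWED_ROUNDS.some((r) => r === n);
}
```

Keeping the FAIL check ahead of the review-lens check means a failed implementation flow never reaches Phase 6, which matches the "instead of claiming completion" wording.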

package/commands/implement.md CHANGED

@@ -45,7 +45,7 @@ Use AskUserQuestion to present the recommended routing in the user's language, o
 | Option | Condition | Description |
 |--------|-----------|-------------|
-| **Leader-host only** | Always | The current host delegates code changes to `agestra-implementer` and verifies locally |
+| **Leader-host only** | Always | The current host delegates code changes to `agestra-implementer`; QA still follows the configured-provider default unless host-only QA is requested |
 | **Suggested AI distribution** | team mode available | The leader proposes which enabled AIs should handle which tasks, asks for approval, then dispatches |
 If team mode is not available, skip the question and use Leader-host only.
@@ -67,6 +67,11 @@ Determine QA depth for the post-implementation verification:
 - **Full QA with E2E** when the user explicitly asks for E2E/runtime verification, or when the work is centered on UI flows, auth, file operations, public release, destructive actions, or complex state transitions.
 - If Full QA may require long setup, a dev server, browser automation, screenshots, or persistent E2E test files, explain the time/token cost and ask before enabling it.
+Determine QA routing separately from implementation routing:
+- When configured external providers are available, team-lead routes post-implementation QA through the QA Brigade, even if implementation itself used Leader-host-only mode.
+- If the user explicitly asks for host-only QA, or no external providers are available, use host-local QA only.
+- E2E/runtime execution is always host-owned. External providers may review the host QA report, command output, screenshots, traces, and E2E findings, but they must not run browser/dev-server flows or create persistent E2E files directly.
 ## Step 5: Execute via team-lead
 Spawn `agestra:agestra-team-lead` with a self-contained handoff packet. The team-lead agent is the single execution entry point — this command does NOT call `cli_worker_spawn`, `ai_chat`, `agent_debate_*`, or spawn `agestra-implementer` / `agestra-qa` directly.
@@ -80,6 +85,9 @@ Handoff packet:
 - **Design doc reference:** path under `docs/plans/` if Step 4 produced or referenced one
 - **Progress tracking:** implementers must update the design document's top-level Implementation Progress table with Planned / In Progress / Implemented / Verified / Blocked / Deferred status and evidence; they must not rewrite approved scope to hide incomplete work
 - **QA depth:** Standard QA / Full QA with E2E / Decide automatically, based on Step 4
+- **QA routing:** team-lead orchestrates the QA Brigade by default when external providers are available; host-only only when explicitly requested or unavailable
+- **QA formation:** host executable evidence lead + all configured and available review-capable providers with distinct QA lenses
+- **E2E/runtime execution:** host-owned only
 - **Available providers:** from `environment_check` / `provider_list`
 - **Requested providers:** explicit names captured from user wording; otherwise "all available"
 - **Locale:** from `setup_status`
@@ -90,7 +98,8 @@ Team-lead owns the rest:
 **Leader-host-only mode:**
 - Delegates code edits to `agestra:agestra-implementer`
-- Runs Phase 5 QA Cycle (`agestra:agestra-qa`) with auto-fix loop
+- Runs host-owned QA evidence collection (`agestra:agestra-qa`) with auto-fix loop when fixes are needed
+- Orchestrates the QA Brigade by default when external providers are available
 - Routes approved persistent E2E test work to `agestra:agestra-e2e-writer` only when QA requests it
 - Runs Phase 6 post-implementation review (`agestra:agestra-reviewer`) for critique, blast radius, AI-slop/cleanup notes, and blocking concerns
@@ -103,7 +112,7 @@ Team-lead owns the rest:
 **QA-only submode (`submode: qa-only`):**
 - Skips Phase 2/3/4 (no code changes)
-- Runs only Phase 5 (host-local QA) or Phase 5M (multi-AI QA debate) against existing code
+- Runs Phase 5M (QA Brigade) by default when providers are available; otherwise runs Phase 5 (host-local QA) against existing code
 - Returns PASS / CONDITIONAL / FAIL verdict — never spawns implementer or CLI workers
 - Exception: if QA returns `E2E_TEST_WORK_REQUEST`, ask the user whether to create or update persistent E2E tests. If approved, route only that packet to `agestra:agestra-e2e-writer` as a separate E2E test-writing task, then re-run QA.
@@ -116,6 +125,7 @@ When team-lead returns, surface:
 - QA report path under `docs/reports/qa/`
 - Test/build outcome (`qa_run` result if executed)
 - QA verdict (PASS / CONDITIONAL PASS / FAIL with classified failures if any)
+- QA Brigade participants, assigned lenses, accepted ledger items, excluded ledger items, open/opinion items, consensus, and notable dissenting findings when multi-AI QA ran
 - Review report path under `docs/reports/review/` and review verdict (APPROVE / APPROVE WITH CONCERNS / BLOCKING CONCERNS) when review ran
 - Synthesis paths under `.agestra/workspace/synthesis/` if structured debate ran
 - Communicate in the user's language
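
The QA routing rule added to `implement.md` (QA Brigade by default, host-only on explicit request or when no external providers exist, E2E always host-owned) can be sketched as follows. This is an illustrative reading of the diff, not the package's implementation: `QaRoutingInput`, `routeQa`, and `e2eOwner` are hypothetical names, and the provider IDs in the usage lines are placeholders.

```typescript
// Hypothetical names for illustration; not part of the agestra package.
type QaRoute = "qa-brigade" | "host-only";

interface QaRoutingInput {
  // IDs of configured and available external providers (assumed shape).
  externalProviders: string[];
  // True when the user explicitly asked for host-only QA.
  hostOnlyRequested: boolean;
}

// QA routing is decided separately from implementation routing.
function routeQa(input: QaRoutingInput): QaRoute {
  if (input.hostOnlyRequested || input.externalProviders.length === 0) {
    return "host-only";
  }
  return "qa-brigade";
}

// E2E/runtime execution is host-owned regardless of the QA route.
function e2eOwner(_route: QaRoute): "host" {
  return "host";
}

// Usage with placeholder provider IDs.
const route = routeQa({ externalProviders: ["provider-a"], hostOnlyRequested: false });
console.log(route, e2eOwner(route));
```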

package/commands/qa.md CHANGED

@@ -40,13 +40,15 @@ Ask the user once:
 If the user chooses Full QA and persistent E2E test files must be added or updated, QA must ask approval and route test-file work to `agestra-e2e-writer`. QA itself remains read-only for source code and persistent tests.
+Even in multi-AI QA, E2E/runtime execution is host-owned. External providers may review the design, code, host QA report, command output, screenshots, traces, and E2E findings, but they must not run browser/dev-server flows or create persistent E2E files directly.
 QA writes a Markdown report under `docs/reports/qa/` unless the user explicitly asks for chat-only output.
 ## Step 3: Route execution
 Call `environment_check` and `provider_list`.
-**Branch A — No external providers available or no multi-AI request:**
+**Branch A — No external providers available, or the user explicitly requested host-only QA:**
 Spawn `agestra:agestra-qa` host specialist directly with:
 - QA target
 - Design document path
@@ -55,22 +57,26 @@ Spawn `agestra:agestra-qa` host specialist directly with:
 - Report artifact path expectation: `docs/reports/qa/YYYY-MM-DD-qa-[target].md`
 - Locale
-**Branch B — External providers requested or multi-AI QA requested:**
+**Branch B — 1+ configured external providers available (default QA Brigade):**
 Hand off to `agestra:agestra-team-lead` with:
 - **Domain:** `qa`
 - **Submode:** `qa-only`
 - **Mode:** `multi-ai`
+- **QA formation:** QA Brigade
 - **QA target:** from Step 1
 - **QA depth:** Standard QA / Full QA with E2E / Decide automatically
+- **E2E/runtime execution:** host-owned only; external providers cross-validate artifacts and findings, not browser/dev-server execution
 - **Design doc reference:** path under `docs/plans/`
 - **Report artifact path expectation:** `docs/reports/qa/YYYY-MM-DD-qa-[target].md`
 - **Available providers:** from `environment_check`, exclude `ollama` unless explicitly requested for lightweight cross-checking
-- **Requested providers:** explicit names captured from user wording; otherwise "all available review-capable"
+- **Requested providers:** explicit names captured from user wording; otherwise "all configured and available review-capable providers"
+- **Brigade lenses:** host executable evidence, spec-to-code compliance, implementation progress truthfulness, integration/regression risk, edge/error states, test adequacy, basic safety hygiene, and E2E artifact review when E2E ran
+- **JSON finding flow:** candidate findings become `ITEM-*` ledger items; participants use the existing `agree` / `disagree` / `opinion` / `revise` stance contract; only ledger-accepted items affect the final verdict
 - **Locale:** from `setup_status`
 - **Original user request:** preserve verbatim
-Team-lead owns cross-provider QA debate and aggregation. This command must not call `agent_debate_structured` directly.
+Team-lead owns the QA Brigade handoff and leader approval gate. The moderator engine owns provider fan-out, `ITEM-*` creation, JSON stance turns, consensus ledger aggregation, minority/open items, and synthesis after approval. This command must not call `agent_debate_structured` directly. Do not ask for a separate multi-AI confirmation in Branch B; provider selection already came from setup. Honor explicit host-only wording.
 ## Step 4: Present the final result
@@ -79,6 +85,7 @@ When QA returns:
 - Link or name the design document used
 - Link the QA report artifact under `docs/reports/qa/`
 - Show PASS / CONDITIONAL PASS / FAIL
+- In QA Brigade mode, summarize participants, assigned lenses, accepted ledger items, excluded ledger items, open/opinion items, consensus, and notable dissenting findings
 - Summarize progress-table mismatches, design gaps, build/test failures, E2E failures, and basic safety hygiene risks
 - If QA returned `E2E_TEST_WORK_REQUEST`, ask the user whether to create or update persistent E2E tests. If approved, route the request to `agestra:agestra-e2e-writer` or team-lead as a separate E2E test-writing task, then re-run QA after tests exist. If declined, record E2E as residual risk.
 - Recommend `/agestra review` for critique or `/agestra security` for dedicated security audit when needed
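
The result summary above groups `ITEM-*` ledger entries into accepted, excluded, and open/opinion items, and the Brigade handoff states that only ledger-accepted items affect the final verdict. A minimal sketch of that grouping follows. The status names are taken from the team-lead diff. The `LedgerItem` shape and `partitionLedger` helper are hypothetical, and counting `superseded` with the excluded group is an assumption the diff does not spell out.

```typescript
// Status names as listed in the team-lead synthesis checklist.
type LedgerStatus =
  | "accepted"
  | "excluded"
  | "superseded"
  | "needs_opinion"
  | "unresolved"
  | "no_response";

// Hypothetical item shape for illustration; not part of the agestra package.
interface LedgerItem {
  id: string; // for example "ITEM-1"
  status: LedgerStatus;
}

function partitionLedger(items: LedgerItem[]) {
  // Only these items may influence the final verdict.
  const accepted = items.filter((i) => i.status === "accepted");
  // Assumption: superseded items are reported together with excluded ones.
  const excluded = items.filter(
    (i) => i.status === "excluded" || i.status === "superseded",
  );
  // Everything still awaiting a decision or a response.
  const open = items.filter(
    (i) =>
      i.status === "needs_opinion" ||
      i.status === "unresolved" ||
      i.status === "no_response",
  );
  return { accepted, excluded, open };
}
```

How accepted items map to PASS, CONDITIONAL PASS, or FAIL is not specified in this diff, so the sketch stops at the grouping.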