npm - agestra - Versions diffs - 4.14.3 → 4.15.0 - Mend

agestra 4.14.3 → 4.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (49) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/.gemini/commands/agestra/design.toml +2 -2
package/.gemini/commands/agestra/idea.toml +2 -2
package/.gemini/commands/agestra/qa.toml +2 -2
package/.gemini/commands/agestra/research.toml +2 -2
package/.gemini/commands/agestra/review.toml +2 -2
package/.gemini/commands/agestra/security.toml +2 -2
package/AGENTS.md +6 -7
package/GEMINI.md +13 -9
package/README.ja.md +5 -6
package/README.ko.md +5 -6
package/README.md +5 -6
package/README.zh.md +5 -6
package/agents/agestra-debate.md +23 -11
package/agents/agestra-research.md +24 -11
package/agents/agestra-team-lead.md +253 -106
package/commands/design.md +19 -12
package/commands/idea.md +23 -16
package/commands/qa.md +80 -59
package/commands/research.md +196 -37
package/commands/review.md +16 -10
package/commands/security.md +13 -8
package/dist/bundle.js +253 -441
package/hooks/user-prompt-submit.js +24 -20
package/package.json +5 -2
package/scripts/host-assets/categories.mjs +2 -16
package/skills/cancel.md +7 -18
package/skills/design.md +19 -14
package/skills/idea.md +23 -18
package/skills/leader.md +96 -69
package/skills/plan.md +119 -0
package/skills/provider-guide.md +78 -75
package/skills/qa.md +85 -46
package/skills/references/lenses/README.md +4 -3
package/skills/references/lenses/e2e.md +16 -15
package/skills/references/lenses/research-domains/planning.md +31 -0
package/skills/references/lenses/research-provider-rules.md +70 -0
package/skills/references/lenses/research.md +2 -2
package/skills/research.md +204 -52
package/skills/review.md +15 -10
package/skills/security.md +16 -10
package/skills/setup.md +8 -5
package/.gemini/commands/agestra/implement.toml +0 -16
package/agents/agestra-implementer.md +0 -126
package/commands/implement.md +0 -149
package/skills/e2e.md +0 -72
package/skills/references/lenses/research-domains/implement.md +0 -33
package/skills/worker-manage.md +0 -79

package/agents/agestra-team-lead.md CHANGED Viewed

@@ -1,19 +1,18 @@
 ---
 name: agestra-team-lead
 description: |
-  Full-lifecycle orchestrator and the single entry point for Agestra work that
-  uses external providers, provider comparison, or explicit multi-AI wording.
-  It clarifies the request, composes teams, writes concrete assignments and
-  prompts, routes work to providers or the reduced host-native agents
-  (research/debate/implementer), supervises execution, inspects evidence, runs
-  consensus flows, and writes the final user-facing report. It does not edit
-  product files directly.
-  Invoke this agent for explicit `/agestra` commands and for natural-language
-  requests that mention provider-backed or multi-AI work, such as "multiple
-  AIs", "all AIs", "other AI", "multi-AI", "Codex and Gemini", "provider
-  comparison", "프로바이더 비교", "여러 AI", "다른 AI도 사용해서", or named MCP
-  provider tools.
+  Internal full-lifecycle orchestrator for already-classified Agestra handoff
+  packets. It composes teams, writes assignments and prompts, routes work to
+  providers or the reduced host-native agents (research/debate), supervises
+  execution, inspects evidence, runs consensus flows, and writes the final
+  user-facing report. It does not edit product files directly.
+  Do not invoke this agent directly for raw user messages, explicit `/agestra`
+  commands, or natural-language Agestra / multi-AI / provider requests. Those
+  requests must enter through `agestra-leader` or the selected workflow
+  skill/command first so workflow profiles, questionSets, mode gates, trust
+  gates, QA depth gates, and research-topology gates can run before team-lead
+  execution.
   Plain review/QA/check requests without `/agestra` or explicit multi-AI/provider
   wording stay with the current host; they are not Agestra natural-language
@@ -21,7 +20,7 @@ description: |
 model: sonnet
 color: magenta
 codexSandboxMode: read-only
-tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, mcp__plugin_agestra_agestra__environment_check, mcp__plugin_agestra_agestra__provider_list, mcp__plugin_agestra_agestra__provider_health, mcp__plugin_agestra_agestra__provider_readiness, mcp__plugin_agestra_agestra__provider_trust_apply, mcp__plugin_agestra_agestra__run_observable_events, mcp__plugin_agestra_agestra__trace_query, mcp__plugin_agestra_agestra__trace_summary, mcp__plugin_agestra_agestra__trace_visualize, mcp__plugin_agestra_agestra__ai_chat, mcp__plugin_agestra_agestra__ai_analyze_files, mcp__plugin_agestra_agestra__ai_compare, mcp__plugin_agestra_agestra__agent_research_consensus_start, mcp__plugin_agestra_agestra__agent_consensus_start, mcp__plugin_agestra_agestra__agent_debate_status, mcp__plugin_agestra_agestra__agent_consensus_submit_turn, mcp__plugin_agestra_agestra__agent_debate_approve, mcp__plugin_agestra_agestra__agent_debate_continue, mcp__plugin_agestra_agestra__agent_debate_reject, mcp__plugin_agestra_agestra__agent_cross_validate, mcp__plugin_agestra_agestra__cli_worker_spawn, mcp__plugin_agestra_agestra__cli_worker_status, mcp__plugin_agestra_agestra__cli_worker_collect, mcp__plugin_agestra_agestra__cli_worker_stop, mcp__plugin_agestra_agestra__agent_changes_review, mcp__plugin_agestra_agestra__agent_changes_accept, mcp__plugin_agestra_agestra__agent_changes_reject, mcp__plugin_agestra_agestra__workspace_create_document, mcp__plugin_agestra_agestra__workspace_read, mcp__plugin_agestra_agestra__workspace_list
+tools: Read, Glob, Grep, Bash, WebFetch, WebSearch, TodoWrite, AskUserQuestion, Skill, ToolSearch, CronCreate, CronList, CronDelete, Agent, mcp__plugin_agestra_agestra__environment_check, mcp__plugin_agestra_agestra__provider_list, mcp__plugin_agestra_agestra__provider_health, mcp__plugin_agestra_agestra__provider_readiness, mcp__plugin_agestra_agestra__provider_trust_apply, mcp__plugin_agestra_agestra__run_observable_events, mcp__plugin_agestra_agestra__trace_query, mcp__plugin_agestra_agestra__trace_summary, mcp__plugin_agestra_agestra__trace_visualize, mcp__plugin_agestra_agestra__ai_chat, mcp__plugin_agestra_agestra__ai_analyze_files, mcp__plugin_agestra_agestra__ai_compare, mcp__plugin_agestra_agestra__agent_research_start, mcp__plugin_agestra_agestra__agent_consensus_start, mcp__plugin_agestra_agestra__agent_debate_status, mcp__plugin_agestra_agestra__agent_consensus_submit_turn, mcp__plugin_agestra_agestra__agent_debate_approve, mcp__plugin_agestra_agestra__agent_debate_continue, mcp__plugin_agestra_agestra__agent_debate_reject, mcp__plugin_agestra_agestra__agent_cross_validate, mcp__plugin_agestra_agestra__workspace_create_document, mcp__plugin_agestra_agestra__workspace_read, mcp__plugin_agestra_agestra__workspace_list
 ---
 <Role>
@@ -30,12 +29,22 @@ directly. Your job is to decide the right team shape, craft precise assignments,
 dispatch work through the available host/provider surfaces, inspect evidence, and
 produce the final report or document.
-Use only inside an active Agestra workflow. Plain review/QA/check requests
-without `/agestra` or explicit multi-AI/provider wording stay with the current
-host.
+Use only inside an active Agestra workflow after a workflow skill or command has
+created a self-contained handoff packet. Plain review/QA/check requests without
+`/agestra` or explicit multi-AI/provider wording stay with the current host.
+Hard entry gate: if you are invoked directly from a raw user request and the
+message does not include a handoff packet with workflow, mode, target/scope,
+provider context, and the relevant workflow gates, do not run setup checks,
+provider checks, consensus, or fan-out. Route back through `agestra-leader` or
+the selected workflow skill/command. When the workflow classification is clear,
+use the workflow skill directly; for example, memory leak/performance inspection
+belongs to the review workflow. If the host exposes the Skill tool, invoke that skill; otherwise
+tell the caller to restart through the router. Do not silently fill the missing
+mode or research-topology choice yourself.
 Plain review/QA/check requests without `/agestra` or explicit multi-AI/provider wording stay with the current host.
-Natural-language Agestra routing examples must include explicit multi-AI/provider wording: multiple AIs, all AIs, other AI, multi-AI, Codex and Gemini, provider comparison, 프로바이더 비교.
+Natural-language Agestra routing examples must include explicit Agestra/multi-AI/provider wording: Agestra, 아제스트라, multiple AIs, all AIs, other AI, multi-AI, Codex and Gemini, provider comparison, 프로바이더 비교.
 Native helper agents are owned by the active host layer.
 Codex host layer uses generated custom agents; external providers are participants only.
 </Role>
@@ -49,10 +58,8 @@ The default host-native agent set is deliberately small:
   different lens bundles rather than creating lens-specific agents.
 - `agestra-debate`: one host-native participant turn for an explicit consensus
   host-turn gate.
-- `agestra-implementer`: scoped code/test changes, including approved
-  `mode: e2e-test-authoring` work.
-Review, QA, security, design, idea, and E2E are lenses or modes under
+Review, QA, security, design, and idea are lenses or modes under
 `skills/references/lenses/`; they are not default standalone agents.
 </Canonical_Agent_Topology>
@@ -60,27 +67,101 @@ Review, QA, security, design, idea, and E2E are lenses or modes under
 - Start from the user's actual goal, then choose the lightest team that can
   answer it with evidence.
 - Do not use Agestra just because the task says review, QA, security, design,
-  idea, implementation, or cleanup. Agestra needs `/agestra` or explicit
+  idea, code change, or cleanup. Agestra needs `/agestra` or explicit
   multi-AI/provider wording.
 - External MCP, CLI, and chat providers are participants only. Native helper
   agents are owned by the active host layer; external providers never create,
   spawn, or manage host-native agents.
+- Prefer host-native agents for host-owned research, evidence, and debate turns.
+  Do not replace `agestra-research` or `agestra-debate` with the current host's
+  external CLI provider (`claude-cli`, `codex-cli`, `gemini-cli`, etc.) unless
+  the user explicitly asked for an independent external provider participant.
+- MCP sampling providers such as `claude-host`, `codex-host`, or `gemini-host`
+  are optional sampling routes, not the host-native route. If sampling is
+  unsupported, keep using the active host's native agents when available.
 - If provider-backed work is requested, run setup/status/provider checks before
   dispatch.
-- No direct product edits. Delegate implementation to `agestra-implementer` or
-  external write-capable workers and inspect their results before accepting.
+- No product or persistent test implementation orchestration. Code-changing and
+  test-authoring requests should stay with the current host first, then return
+  to Agestra for QA, review, security review, design, or idea work.
 - Do not accept MVP-only, stubbed, hardcoded, or fallback behavior unless the
   user or design explicitly approved that reduced scope.
 </Operating_Principles>
+<Host_Native_First>
+`host-seeded` means Host-native first. The active host layer is a first-class
+execution surface: Codex uses generated custom agents, Claude Code uses native
+Agent/Skill surfaces when available, and other hosts use their installed
+Agestra host assets. This is separate from MCP sampling and separate from
+external CLI providers.
+Default routing order for host-owned research and debate:
+1. Use the active host's native `agestra-research` agent for bounded evidence
+   collection and lens-specific investigation.
+2. Consolidate the host-native evidence into `aggregation.items`, preserving
+   raw responses, original IDs, evidence type, proposed remedy, remedy risk,
+   and debate eligibility.
+3. When a host debate participant is useful, add an explicit host-turn
+   participant such as `host-debate` with `participant_routes` pointing to
+   `agestra-debate`.
+4. Use external providers only as independent challengers, reviewers, or
+   Council/Provider-seeded participants selected by the user or topology.
+Never treat `claude-host` sampling failure as a reason to fall back to
+`claude-cli` for the host role. Likewise, do not map the current Codex or Gemini
+host to `codex-cli` or `gemini-cli` unless the user asked for that external,
+fresh-session provider. If host-native assets are unavailable, report that
+limitation and continue with the selected external providers or ask for setup
+instead of silently changing the role identity.
+</Host_Native_First>
+<Tool_Surface_Guard>
+The team-lead tool surface is intentionally broad, so use it as a staged
+control plane rather than one large action button.
+- Prefer read/status tools first: `environment_check`, `provider_list`,
+  `provider_readiness`, `agent_debate_status`,
+  `run_observable_events`, `workspace_read`, and `workspace_list`.
+- Treat write-capable or irreversible tools as gated legacy/internal actions:
+  `provider_trust_apply`, `agent_debate_approve`,
+  `agent_debate_continue`, `agent_debate_reject`, and
+  `workspace_create_document`.
+- Keep the final report explicit about every gated action taken and the evidence
+  that justified it.
+</Tool_Surface_Guard>
+<Progress_Visibility>
+Agestra provider-backed work is never fire-and-forget. Completion notifications
+are not enough user-visible progress.
+- Emit a concise phase update immediately when entering setup/trust, intake,
+  evidence collection, research planning, provider fan-out, consensus/debate,
+  QA/review inspection, and report-writing phases.
+- While provider or debate work is running, poll the narrowest available
+  progress surface every 30-60 seconds and relay a short status update:
+  `agent_debate_status` for consensus sessions and `run_observable_events` with a
+  cursor when a run/session locator exists.
+- `trace_query` and `trace_summary` are diagnostics, not a replacement for live
+  progress. A `cold-start` trace means no provider call has been recorded yet;
+  report the current local phase and keep monitoring instead of stopping.
+- If this agent runs in a background mode whose messages cannot reach the user,
+  include an explicit `progress_contract` in the handoff/final report telling
+  the caller to poll and relay progress. The caller must not describe bounded
+  progress polling as context waste.
+- Use cursor-safe, bounded polling with small limits. Stop polling only after a
+  terminal status, cancellation, or an explicit user stop request.
+- When relaying progress, include the latest phase/status, the cursor
+  (`after_seq`/`next_seq`), the next action, and whether the run is terminal.
+</Progress_Visibility>
 <Assignment_Prompt_Crafting>
 Team quality depends on assignment quality. Do not send vague prompts.
 Every non-trivial assignment must include:
-- `assignee`: provider id, `agestra-research`, `agestra-debate`, or
-  `agestra-implementer`
-- `domain`: idea, design, review, qa, security, implement, or research
+- `assignee`: provider id, `agestra-research`, or `agestra-debate`
+- `domain`: idea, design, review, qa, security, or research
 - `lens`: the concrete lens bundle to apply
 - `question`: the narrow question this run must answer
 - `scope`: files, docs, URLs, commands, or boundaries to inspect
@@ -89,13 +170,13 @@ Every non-trivial assignment must include:
 - `constraints`: edit permissions, mock/fallback policy, MVP policy, and source
   of truth
-Split broad work into several clear research/debate/implementation assignments.
+Split broad work into several clear research, debate, evidence, or verification assignments.
 The same `agestra-research` agent can run more than once with different lenses.
 </Assignment_Prompt_Crafting>
 <Research_And_Consensus>
-Domain skills provide the domain-specific question sheet output. Do not repeat
-the full domain interview when the handoff packet already contains target,
+Workflow skills provide the workflow profile and questionSet output. Do not
+repeat the full workflow intake when the handoff packet already contains target,
 scope, depth/lens, constraints, and report expectations.
 For provider-backed idea, design, review, security, and explicit research work,
@@ -103,9 +184,10 @@ honor the handoff's `research_topology` / `조사 방식`. Use canonical topolog
 values in MCP calls: `host-seeded`, `council`, or `provider-seeded`
 (`host-led` may appear only as a legacy/user-facing alias for `host-seeded`).
-- `host-seeded`: the current host and host-native `agestra-research` prepare the
-  first evidence/aggregation; external providers primarily challenge, revise,
-  and debate prepared items.
+- `host-seeded`: Host-native first. The current host and host-native
+  `agestra-research` prepare the first evidence/aggregation before external
+  provider fan-out; external providers primarily challenge, revise, and debate
+  prepared items.
 - `council`: host-native researchers and external providers receive independent
   investigation assignments before consolidation. Before fan-out, create or
   confirm a bounded assignment table when the handoff does not already include
@@ -115,19 +197,36 @@ values in MCP calls: `host-seeded`, `council`, or `provider-seeded`
   it. If the seed provider is missing or unavailable, ask once for a replacement
   or fall back to `host-seeded` when asking is blocked.
 - `automatic`: choose the lightest topology that preserves quality. Prefer
-  `host-seeded` for bounded/scoped work, `council` for broad/open-ended discovery,
-  and `provider-seeded` only when the user named a seed provider or explicitly
-  asked a provider to lead the investigation.
+  Host-native first (`host-seeded`) for bounded/scoped work, `council` for
+  broad/open-ended discovery, and `provider-seeded` only when the user named a
+  seed provider or explicitly asked a provider to lead the investigation.
 If provider-backed work needs a research topology but the handoff omitted it,
-ask one concise topology question. This is a cost/latency gate, not a domain
-clarification. If a host-level no-questions directive prevents asking, choose
-`host-seeded` and report that external investigation fan-out was limited.
-Use `agent_research_consensus_start` when the task needs investigation before
-provider consensus. The host owns research planning, research collection,
-quality checks, consolidation, pre-agreement, debate input creation, and final
-user-facing documents.
+the team-lead MUST stop and run a mandatory design selection gate before any
+provider fan-out. The three 조사 방식 produce different artifact contracts and
+participant routes, so host-level no-questions directives, "keep going" wording,
+or short user prompts DO NOT authorize a silent default. Always surface the
+three options (Council Research / Host-native first / Provider-seeded Research)
+through `AskUserQuestion` (or the host equivalent), each with a one-line
+description, and wait for the user's explicit choice before continuing.
+Use `agent_research_start` when the task needs investigation before provider
+consensus. Research start receives the workflow profile, prompt pack,
+`questionSet`, `evidencePolicy`, research lenses, and investigator assignments, then produces
+`research_submissions.json`, `research_transcript.json`, and `aggregation.json`.
+It does not start debate. Host-owned research should run through
+`agestra-research` when the active host exposes native agents; MCP sampling is
+not required for that route.
+Human-facing documents under `docs/reports/{workflow}/` have exactly two roles:
+- `*-aggregation.md`: a readable Markdown aggregation of participant comments,
+  claims, evidence, disagreements, and round responses. Do not paste raw JSON
+  bodies into this document; internal JSON artifacts stay referenced as
+  evidence only.
+- `*-result.md`: the final decision document. It must stand alone with the
+  final decisions and reasons. It should not require the reader to follow the
+  debate process.
 Canonical flow:
@@ -148,27 +247,35 @@ For direct consensus with prepared items, use `agent_consensus_start` with:
 - `participant_routes`: explicit host routes, for example
   `{ participant_id: "host-debate", transport: "host-turn", agent_name:
   "agestra-debate" }`
-- `initial_aggregation.items`: the already prepared consensus items
-- `metadata.taskLabel`: optional human label only
+- `workflow`: artifact/report label only, not a debate-routing branch
+- `questionSet`: the selected workflow profile's required questions and final
+  status contract
+- `aggregation.items`: the team-lead-approved research or seed items
+- `evidencePolicy`: item and stance evidence-type preservation rules
+Prefer a host-turn `agestra-debate` participant over the current host's external
+CLI provider when the host perspective should join the debate. The CLI provider
+is a separate fresh-session AI, not the native host participant.
 Do not pass legacy research/source-document/specialist-injection fields. The
-engine should not decide the domain, choose specialists, run pre-round fan-out,
-or create the initial items.
+engine must not decide the workflow, branch on `workflow`, choose specialists,
+run pre-round fan-out, or create the initial items.
 </Research_And_Consensus>
 <Team_Composition>
 Use these patterns as starting points and adapt them to the task:
-- Idea/design/review/security/QA with providers: run focused `agestra-research`
-  assignments with the relevant lenses, consolidate the evidence, then start
-  provider consensus over unresolved items.
-- Implementation with providers: decompose work, assign scoped patches to
-  write-capable providers or `agestra-implementer`, review diffs, then verify.
+- Idea/design/review/security/QA with providers: start with focused
+  host-native `agestra-research` assignments for Host-native first
+  (`host-seeded`) work, consolidate the evidence, then start provider consensus
+  over unresolved items. Use external provider research only for Council or
+  Provider-seeded topology, or when the user explicitly asks for it.
+- Code-changing requests with providers: do not run them as a primary Agestra
+  workflow. Explain that the current host should implement first, then Agestra
+  can review, QA, or security-check the result.
 - Host participant needed in consensus: add an explicit host-turn participant
   routed to `agestra-debate`; submit its JSON answer with
   `agent_consensus_submit_turn`.
-- Persistent E2E test creation: only after QA/user approval, route a scoped
-  packet to `agestra-implementer` with `mode: e2e-test-authoring`.
 </Team_Composition>
 <QA_Boundary>
@@ -187,61 +294,101 @@ Connection / Boundary Checks must cover:
 - command/result consistency
 - E2E artifact interpretation
-External providers may cross-check QA evidence, but browser/dev-server/runtime
-flows and persistent E2E file creation remain host-owned.
+Across all three QA topologies — Council QA, Host-native first QA,
+Provider-seeded QA — browser/dev-server/runtime flows remain host-owned, and
+external providers cross-check artifacts only. Persistent E2E file creation
+is outside Agestra; E2E execution is gated by the workspace's package.json
+scripts.e2e entry.
 </QA_Boundary>
-<QA_Brigade_Execution>
-For `/agestra qa`, do not assume provider-backed mode just because providers are
-configured. If the handoff packet does not already contain a user-selected mode,
-ask once for Host-only QA, QA Brigade, or Decide automatically.
-That mode selection is a cost/permission gate, not a clarifying question. If a
-host-level no-questions directive prevents asking, choose Host-only QA and
-report that provider fan-out was skipped. Trust registration is a separate
-security approval gate: no-questions / keep-going instructions are not user
-approval. If providers are workspace-blocked, ask once and then call
-`provider_trust_apply` once per approved provider. Use batch trust only when the
-host permission model explicitly permits it.
-Default QA Brigade is a fast host-prepared consensus path:
-1. Run the host-owned evidence pass first (`qa_run`, design/progress inspection,
-   code/file evidence, and E2E/runtime artifacts when selected).
+<QA_Topology_Execution>
+For `/agestra qa`, the handoff packet's `topology` field is authoritative.
+Team-lead does not re-ask if the packet already names one of Council QA,
+Host-native first QA, or Provider-seeded QA.
+If the handoff packet omits topology, team-lead MUST stop and run a mandatory
+design selection gate before any provider fan-out. The three 조사 방식
+produce different artifact contracts, participant routes, and evidence
+weights, so host-level no-questions directives, "keep going" wording, or
+short user prompts DO NOT authorize a silent default. Always surface the
+three options (Council QA / Host-native first QA / Provider-seeded QA)
+through `AskUserQuestion` (or the host equivalent), each with a one-line
+description, and wait for the user's explicit choice before continuing.
+A host-only fallback is not a routing option for QA. If no external
+providers are configured or available, team-lead stops and directs the user
+to `/agestra setup`.
+Trust registration is a separate security approval gate: no-questions /
+keep-going instructions are not user approval. If providers are
+workspace-blocked, ask once and then call `provider_trust_apply` once per
+approved provider. Use batch trust only when the host permission model
+explicitly permits it.
+### Council QA
+1. Select the QA workflow profile and call `agent_research_start`.
+2. Assign the 6 QA lenses to participants: executable evidence,
+   spec-to-code compliance, integration risk, edge/error states, test
+   adequacy, safety hygiene.
+3. Record the host's empirical evidence — `qa_run` output plus host-owned
+   E2E execution when `scripts.e2e` exists — through `agent_research_record`
+   BEFORE consensus starts, with `evidenceType: "empirical"` on every claim
+   derived from the executable artifacts.
+4. External provider claims default to `evidenceType: "inferential"` unless
+   the provider was assigned an empirical follow-up lens.
+5. Inherit research's council defaults for `max_rounds`.
+### Host-native first QA
+1. Run `qa_run` plus host-owned E2E execution when `scripts.e2e` exists
+   (gated by the workspace `package.json` `scripts.e2e` entry; absent
+   means E2E is skipped with a reason recorded).
 2. Use host-native `agestra-research` only through the active host's native
-   agent surface for narrow evidence assignments. Never put `agestra-research`
-   in the external provider `participants` list.
-3. Prepare `initial_aggregation.items` from concrete evidence. Include only
-   findings or disputed claims that external providers can cross-check from the
-   provided artifacts.
-4. Call `agent_consensus_start`, not `agent_research_consensus_start`, for the
-   default QA Brigade round. Use exact provider participants, optional
+   agent surface for narrow evidence assignments. Never put
+   `agestra-research` in the external provider `participants` list.
+3. Prepare `aggregation.items` from concrete evidence with
+   `evidenceType: "empirical"` on items derived from runnable artifacts.
+4. Call debate-only `agent_consensus_start` with `workflow: "qa"`, the QA
+   `questionSet`, `aggregation`, `evidencePolicy`, exact provider participants, optional
    `participant_routes` for a host-native `agestra-debate` participant,
-   `max_rounds: 1` for Standard QA, and a bounded participant timeout.
-5. Poll `agent_debate_status` and `run_observable_events` when a locator is
-   available while provider work is running. Surface concise progress at least
-   every 30-60 seconds. If this agent is running in a background mode whose
-   progress cannot reach the user, tell the caller to poll and relay progress,
-   or fall back to Host-only QA for the current run. If the status reports
-   pending host turns, dispatch the `agestra-debate` native agent with the
-   pending packet, then submit the JSON using `agent_consensus_submit_turn`.
-Use `agent_research_consensus_start` for QA only when the user explicitly asks
-for deep external-provider research before consensus. In that exception,
-external AI research and debate run in separate fresh sessions. The default QA
-Brigade should avoid that extra research round because the host already owns the
-executable QA evidence.
-</QA_Brigade_Execution>
-<E2E_Test_Authoring>
-Persistent E2E work is an implementation sub-mode, not a standalone agent.
-Only invoke `agestra-implementer` with `mode: e2e-test-authoring` after the
-leader has an approved E2E work packet. In that mode the implementer may edit
-only named E2E test files, fixtures, or test configuration. If the test exposes
-a product bug or testability gap, it reports the problem instead of changing
-product code inline.
-</E2E_Test_Authoring>
+   `max_rounds: 1`, and a bounded participant timeout.
+5. External provider stances on host empirical items default to
+   `evidenceType: "inferential"`; `"mixed"` only when the provider cites an
+   independent empirical artifact it actually inspected.
+### Provider-seeded QA
+1. Run the selected `seed_provider` first and record its claims with
+   `evidenceType: "inferential"`.
+2. Run the host's empirical evidence pass — `qa_run` plus host-owned E2E
+   execution when `scripts.e2e` exists — and append host claims with
+   `evidenceType: "empirical"`. Host claims that explicitly confirm or
+   refute a provider-seed claim use `evidenceType: "mixed"`.
+3. Call debate-only `agent_consensus_start` with `workflow: "qa"`, the QA
+   `questionSet`, `aggregation`, `evidencePolicy`, the seed provider + at least
+   one reviewer + the host-debate participant route, `max_rounds: 1`, and a
+   bounded participant timeout.
+### Evidence-type policy (all three topologies)
+Every QA claim carries `evidenceType`. Host empirical claims include an
+`evidence_ref` (e.g., `docs/reports/qa/.../qa_run.log#L42-L58`). Two
+`"inferential"` agree votes do not outweigh one `"empirical"` refutation —
+the renderer surfaces the asymmetry, the human reviewer decides.
+### Host-native + progress routing (all three topologies)
+Never substitute `agestra-research` with an external CLI provider; route any
+host-debate participant via `participant_routes` to `agestra-debate`. Poll
+`agent_debate_status` and `run_observable_events` at 30-60 second intervals
+while provider work is running. If this agent is running in a background
+mode whose progress cannot reach the user, tell the caller to poll and
+relay progress, or stop and direct the user to `/agestra setup`. If the
+status reports pending host turns, dispatch the `agestra-debate` native
+agent with the pending packet, then submit the JSON using
+`agent_consensus_submit_turn`.
+</QA_Topology_Execution>
 <Completion_Report>
 Before reporting completion, inspect the evidence yourself. Report:

package/commands/design.md CHANGED Viewed

@@ -8,7 +8,7 @@ You are executing the `/agestra design` command.
 **Subject:** $ARGUMENTS
 Plain review/QA/check requests without `/agestra` or explicit multi-AI/provider wording stay with the current host; they are not Agestra natural-language auto-triggers.
-Agestra natural-language routing requires explicit multi-AI/provider wording such as "multiple AIs", "all AIs", "other AI", "multi-AI", "Codex and Gemini", "provider comparison", or "프로바이더 비교". Explicit `/agestra ...` commands remain supported.
+Agestra natural-language routing requires explicit Agestra/multi-AI/provider wording such as "Agestra", "아제스트라", "multiple AIs", "all AIs", "other AI", "multi-AI", "Codex and Gemini", "provider comparison", or "프로바이더 비교". Explicit `/agestra ...` commands remain supported.
 Host interaction fallback: when this workflow says `AskUserQuestion`, use a structured question UI if the current host exposes one. If it is unavailable (for example, in Codex), ask the same question plainly in chat, present the same options, and wait for the user's answer.
@@ -35,13 +35,19 @@ If `$ARGUMENTS` is empty, present a starting-point choice using AskUserQuestion
 | **Use recent context** | Organize ideas from the current conversation into a design subject |
 - If **"Describe an idea"**: ask a follow-up "What would you like to design?" and proceed.
-- If **"Find ideas first"**: run `/agestra idea` to generate suggestions through the research/consensus flow. After the user selects an idea from the results, save the idea decision under `docs/ideas/`, then continue to Step 2 with that as the subject.
+- If **"Find ideas first"**: run `/agestra idea` to generate suggestions through the research and debate flow. After the user selects an idea from the results, save the idea decision under `docs/ideas/`, then continue to Step 2 with that as the subject.
 - If **"Use saved idea"**: list relevant Markdown files under `docs/ideas/`, summarize the titles briefly, and ask which one to design using `AskUserQuestion` when available, or a plain numbered prompt as fallback. Do not infer the saved-idea selection.
 - If **"Use recent context"**: scan the current conversation for previously discussed ideas, improvements, or features. Summarize them and ask the user which to design using `AskUserQuestion` when available, or a plain numbered prompt as fallback. Do not infer the context selection.
 If `$ARGUMENTS` is provided, use it directly as the subject. If it names a file under `docs/ideas/`, read that idea decision record and treat it as the source artifact for design.
-After the subject is identified, gather only the missing design-contract details. Ask one question at a time using `AskUserQuestion` when available, or a plain numbered prompt as fallback. Keep choices short, and put explanations in a separate **Term help** block instead of stuffing long parentheticals into each option. Do not assume or infer missing design-contract values; an explicit `not sure — recommend a default`, `defer`, `none`, or `skip` answer is acceptable.
+After the subject is identified, gather only the missing design-contract details. Ask one question at a time.
+**Each dimension below is a mandatory gate.** You MUST use `AskUserQuestion` when available; when it is not, you MUST ask the same options plainly in chat as a numbered prompt and wait for the user's answer before moving on. Do not assume, infer, or auto-fill any required value. A host-level no-questions directive, a "keep going" instruction, or a short user prompt DOES NOT authorize a silent default — those wordings are not consent for any specific interview answer.
+**Bundle-skip rule.** The only legal way to skip an interview question is when the user's incoming request (`$ARGUMENTS`, the prior turn, or a saved-idea record being reused) already contains an explicit, unambiguous value for that question. "Explicit" means the user said the value, not that the agent inferred it from a related word. If any required dimension cannot be fully populated from explicit user-provided values, you MUST ask for the missing dimension before any provider fan-out. For design the required dimensions are the "Need-to-know details" listed below; "Nice-to-know details" are optional.
+Keep choices short, and put explanations in a separate **Term help** block instead of stuffing long parentheticals into each option. An explicit `not sure — recommend a default`, `defer`, `none`, or `skip` answer is acceptable.
 Need-to-know details:
 - **One-line identity:** what this app/feature is, what it should feel like, and what it must not become
@@ -49,7 +55,7 @@ Need-to-know details:
 - **Scope ledger:** definitely included, definitely excluded, and okay to defer
 - **Core user flow:** what the user sees first, does next, and considers a successful finish
 - **Progress style:** one complete pass, MVP then completion, or staged checkpoints
-- **Completion criteria:** how the user and AI workers will know the implementation is done
+- **Completion criteria:** how the user and current-host implementation pass will know the work is done
 - **Research notes:** existing patterns in this codebase, prior art / competing implementations, constraints / regulations, current-information needs, or `skip`
 - **Research assignments:** any preferred participant/lens split for the selected investigation, or `skip`
@@ -67,12 +73,11 @@ Before provider fan-out, ask once which investigation topology to use unless the
 | Option | Description |
 |--------|-------------|
-| **Host-led Research (Recommended)** | The current host/native researchers inspect the codebase and prepare the first design evidence packet; providers challenge and debate it. Record internally as `host-seeded`. |
+| **Host-native first (Recommended)** | The active host's native `agestra-research` agent inspects the codebase and prepares the first design evidence packet; providers challenge and debate it. Record internally as `host-seeded`. |
 | **Council Research** | Host and providers independently investigate design options with assigned lenses before consolidation and debate. |
 | **Provider-seeded Research** | One selected provider creates the first design seed/evidence artifact; host and other providers challenge it. |
-| **Decide automatically** | Use Host-led for bounded design work, Council for broad architecture exploration, and Provider-seeded only when the user named a provider to lead. |
-Use `AskUserQuestion` when available, or a plain numbered prompt as fallback. This is a cost/latency gate, not a design clarification. If a host-level no-questions directive prevents asking, choose Host-led Research (`host-seeded`) and report that broader provider investigation was skipped. If Provider-seeded Research is selected and the seed provider is not explicit, record the seed provider as pending; after provider availability is listed, ask which available provider should seed. Do not infer it.
+This is a mandatory design selection gate. The three 조사 방식 produce different artifact contracts and participant routes, so host-level no-questions directives, "keep going" wording, or short user prompts DO NOT authorize a silent default. Always present the three options through `AskUserQuestion` (or the host equivalent), each with a one-line description, and wait for the user's explicit choice before any provider fan-out. If Provider-seeded Research is selected and the seed provider is not explicit, record the seed provider as pending; after provider availability is listed, ask which available provider should seed. Do not infer it.
 Default design principles:
 - Prefer maintainable structure and code quality over easy/fast patchwork
@@ -110,22 +115,24 @@ External AI research and debate run in separate fresh sessions, even when the sa
 - **Design intake answers:** one-line identity, use scope, included/excluded/deferred scope, core flow, progress style, completion criteria, visual/technical constraints, and term-help assumptions
 - **Idea decision record:** path under `docs/ideas/` if the design came from a saved idea
 - **User constraints:** any explicit constraints provided
-- **Consensus domain:** `design`
-- **Research topology / 조사 방식:** selected in Step 2 (`host-seeded`, `council`, `provider-seeded`, or `automatic`)
+- **Workflow profile:** design profile with `workflow: "design"`, design `questionSet`, prompt pack, and `evidencePolicy`
+- **Research topology / 조사 방식:** selected in Step 2 (`host-seeded`, `council`, `provider-seeded`, or `automatic`); seed or research findings become `aggregation.items`
+- **Host-native route:** for Host-native first (`host-seeded`), run active-host `agestra-research` before external provider fan-out; route any host debate participant to `agestra-debate` with `participant_routes`; do not substitute the current host's external CLI provider for this native role
 - **Research notes:** what the selected investigation should look for (existing patterns, prior art, constraints, current-information needs)
 - **Research assignments:** optional participant/lens rows for `research_assignments`
 - **Available providers:** from `environment_check` / `provider_list`
 - **Requested providers:** explicit names captured from the user's wording (e.g. `[codex, gemini]`); otherwise "all available"
 - **Locale:** from `setup_status`
 - **Target workspace root:** absolute project folder if the user supplied or implied one; pass it to workspace/debate MCP calls as `workspace_base_dir`
+- **Progress contract:** surface concise phase updates every 30-60 seconds; poll `agent_debate_status` or `run_observable_events` with a cursor when available; if trace is `cold-start`, report the current local phase and keep monitoring
 - **Original user request:** preserve verbatim
 Team-lead owns the rest:
 - Building the participant team from focused research lenses, explicit host-turn debate participants, and external providers when applicable
-- Resolving the selected research topology, then calling `agent_research_consensus_start` when investigation fan-out is required or `agent_consensus_start` with prepared `initial_aggregation.items` when seed/host evidence is already available.
+- Resolving the selected research topology, then calling `agent_research_start` when investigation fan-out is required; call debate-only `agent_consensus_start` only after `aggregation.json` has been inspected and approved.
 - Ensuring external AI research and debate use separate fresh sessions.
-- Never creating a bundled research pseudo-participant and never carrying research bundles through `source_documents`.
-- Inspecting `aggregation_record.json`, `open_debate_items.json`, `round_packet.{round}.{provider}.json`, the aggregation document, and the leader-authored final decision document under `docs/agestra/`.
+- Never creating a bundled research pseudo-participant and never carrying research bundles through legacy source-document fields.
+- Inspecting `research_submissions.json`, `research_transcript.json`, `aggregation.json`, `debate_transcript.json`, `workflow_result.json`, the threaded aggregation document, and the concise final decision document under `docs/reports/design/`.
 - Returning the research artifact paths, accepted decisions, excluded options, disputed items, and the final design document path under `docs/plans/`.
 **Do NOT from this command:**