npm - @vpxa/aikit - Versions diffs - 0.1.83 → 0.1.85 - Mend

@vpxa/aikit 0.1.83 → 0.1.85

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/package.json +1 -1
package/packages/cli/dist/index.js +1 -1
package/packages/core/dist/index.d.ts +13 -1
package/packages/core/dist/index.js +1 -1
package/packages/server/dist/index.js +1 -1
package/packages/server/dist/{server-DyCYX0nU.js → server-CRBcgUPU.js} +138 -138
package/packages/store/dist/index.d.ts +1 -1
package/packages/store/dist/index.js +1 -1
package/packages/tools/dist/index.d.ts +8 -10
package/packages/tools/dist/index.js +70 -62
package/scaffold/dist/definitions/agents.mjs +58 -2
package/scaffold/dist/definitions/bodies.mjs +17 -2
package/scaffold/dist/definitions/flows.mjs +18 -13
package/scaffold/dist/definitions/prompts.mjs +1 -1
package/scaffold/dist/definitions/protocols.mjs +83 -2
package/scaffold/dist/definitions/skills.mjs +34 -17

package/scaffold/dist/definitions/agents.mjs CHANGED Viewed

@@ -1,9 +1,65 @@
-const e={Orchestrator:{title:`The Master Conductor`,description:`Master conductor that orchestrates the full development lifecycle: Planning → Implementation → Review → Recovery → Commit`,argumentHint:null,toolRole:`orchestrator`,sharedBase:null,sharedProtocols:[`decision-protocol`,`forge-protocol`],category:`orchestration`,skills:[]},Planner:{title:`The Strategic Architect`,description:`Autonomous planner that researches codebases and writes comprehensive TDD implementation plans`,argumentHint:null,toolRole:`planner`,sharedBase:`code-agent-base`,category:`orchestration`},Implementer:{title:`The Code Builder`,description:`Persistent implementation agent that writes code following TDD practices until all tasks are complete`,argumentHint:`Implementation task, feature, or phase from plan`,toolRole:`codeAgent`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`]]},Frontend:{title:`The UI Specialist`,description:`UI/UX specialist for React, styling, responsive design, and frontend implementation`,argumentHint:`UI component, styling task, or frontend feature`,toolRole:`codeAgent`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`react`,`When building React components — hooks, patterns, Server Components`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`],[`frontend-design`,`When implementing UI/UX — design systems, accessibility, responsive patterns`]]},Refactor:{title:`The Code Sculptor`,description:`Code refactoring specialist that improves structure, readability, and maintainability`,argumentHint:`Code, component, or pattern to refactor`,toolRole:`refactor`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`]]},Debugger:{title:`The Problem Solver`,description:`Expert debugger that diagnoses issues, traces errors, and provides solutions`,argumentHint:`Error message, stack trace, or description of issue`,toolRole:`debugger`,sharedBase:`code-agent-base`,category:`diagnostics`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`]]},Security:{title:`The Vulnerability Hunter`,description:`Security specialist that analyzes code for vulnerabilities and compliance`,argumentHint:`Code, feature, or component to security review`,toolRole:`security`,sharedBase:`code-agent-base`,category:`diagnostics`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When reviewing code — security patterns, type safety`]]},Documenter:{title:`The Knowledge Keeper`,description:`Documentation specialist that creates and maintains comprehensive project documentation`,argumentHint:`Component, API, feature, or area to document`,toolRole:`documenter`,sharedBase:`code-agent-base`,category:`documentation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`present`,`When presenting documentation previews or architecture visuals to the user`],[`docs`,`When creating or updating project documentation — docs/ convention, architecture blueprints, Diátaxis framework`]]},Explorer:{title:`The Rapid Scout`,description:`Rapid codebase exploration to find files, usages, dependencies, and structural context`,argumentHint:`Find files, usages, and context related to: {topic or goal}`,toolRole:`explorer`,sharedBase:null,category:`exploration`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`]]},Researcher:{title:`The Context Gatherer`,description:`Deep analysis, architecture review, and multi-model decision protocol participant`,argumentHint:`Research question, problem statement, or subsystem to investigate`,toolRole:`researcher`,sharedBase:`researcher-base`,category:`research`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`lesson-learned`,`When analyzing past changes to extract engineering principles`],[`c4-architecture`,`When researching system architecture — produce C4 diagrams`],[`adr-skill`,`When the research involves a technical decision — draft an ADR`]],variants:{Alpha:{description:`Primary deep research agent — also serves as default Researcher`,identity:`, the primary deep research agent. During multi-model decision sessions, you provide deep reasoning and nuanced system design.`,bodyAddendum:`## Required Output Section — \`## Depth Analysis\`
+const e={Orchestrator:{title:`The Master Conductor`,description:`Master conductor that orchestrates the full development lifecycle: Planning → Implementation → Review → Recovery → Commit`,argumentHint:null,toolRole:`orchestrator`,sharedBase:null,sharedProtocols:[`decision-protocol`,`forge-protocol`],category:`orchestration`,skills:[]},Planner:{title:`The Strategic Architect`,description:`Autonomous planner that researches codebases and writes comprehensive TDD implementation plans`,argumentHint:null,toolRole:`planner`,sharedBase:`code-agent-base`,category:`orchestration`},Implementer:{title:`The Code Builder`,description:`Persistent implementation agent that writes code following TDD practices until all tasks are complete`,argumentHint:`Implementation task, feature, or phase from plan`,toolRole:`codeAgent`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`]]},Frontend:{title:`The UI Specialist`,description:`UI/UX specialist for React, styling, responsive design, and frontend implementation`,argumentHint:`UI component, styling task, or frontend feature`,toolRole:`codeAgent`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`react`,`When building React components — hooks, patterns, Server Components`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`],[`frontend-design`,`When implementing UI/UX — design systems, accessibility, responsive patterns`]]},Refactor:{title:`The Code Sculptor`,description:`Code refactoring specialist that improves structure, readability, and maintainability`,argumentHint:`Code, component, or pattern to refactor`,toolRole:`refactor`,sharedBase:`code-agent-base`,category:`implementation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`]]},Debugger:{title:`The Problem Solver`,description:`Expert debugger that diagnoses issues, traces errors, and provides solutions`,argumentHint:`Error message, stack trace, or description of issue`,toolRole:`debugger`,sharedBase:`code-agent-base`,category:`diagnostics`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When writing TypeScript code — type patterns, generics, utility types`]]},Security:{title:`The Vulnerability Hunter`,description:`Security specialist that analyzes code for vulnerabilities and compliance`,argumentHint:`Code, feature, or component to security review`,toolRole:`security`,sharedBase:`code-agent-base`,category:`diagnostics`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When reviewing code — security patterns, type safety`]]},Documenter:{title:`The Knowledge Keeper`,description:`Documentation specialist that creates and maintains comprehensive project documentation`,argumentHint:`Component, API, feature, or area to document`,toolRole:`documenter`,sharedBase:`code-agent-base`,category:`documentation`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`present`,`When presenting documentation previews or architecture visuals to the user`],[`docs`,`When creating or updating project documentation — docs/ convention, architecture blueprints, Diátaxis framework`]]},Explorer:{title:`The Rapid Scout`,description:`Rapid codebase exploration to find files, usages, dependencies, and structural context`,argumentHint:`Find files, usages, and context related to: {topic or goal}`,toolRole:`explorer`,sharedBase:null,category:`exploration`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`]]},Researcher:{title:`The Context Gatherer`,description:`Deep analysis, architecture review, and multi-model decision protocol participant`,argumentHint:`Research question, problem statement, or subsystem to investigate`,toolRole:`researcher`,sharedBase:`researcher-base`,category:`research`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`lesson-learned`,`When analyzing past changes to extract engineering principles`],[`c4-architecture`,`When researching system architecture — produce C4 diagrams`],[`adr-skill`,`When the research involves a technical decision — draft an ADR`]],variants:{Alpha:{description:`Primary deep research agent — also serves as default Researcher`,identity:`, the primary deep research agent. During multi-model decision sessions, you provide deep reasoning and nuanced system design. Your thinking style is **Contrarian** — actively look for flaws, fatal assumptions, and hidden risks in every approach. The best ideas survive adversarial pressure.`,bodyAddendum:`## Required Output Section — \`## Depth Analysis\`
 Your final report MUST contain a \`## Depth Analysis\` section with:
 - Deep-dive into ONE chosen subsystem (most structurally central to the question)
 - Full evidence chain: file:line citations for every structural claim
 - At least 2 \`compact\`/\`file_summary\` extracts woven into the narrative
+## Thinking Style: Contrarian
+During multi-model decision sessions, apply the **Contrarian** lens:
+- For every proposed approach, actively seek the fatal flaw or hidden assumption
+- Ask: "Under what conditions does this approach fail catastrophically?"
+- Prefer uncomfortable truths over comfortable consensus
 You are the DEFAULT researcher. When the Orchestrator needs breadth + depth, they
-dispatch you alone. Your lens: thorough, evidence-first, exhaustive.`},Beta:{description:`Research variant — pragmatic analysis with focus on trade-offs and edge cases`,identity:`, a variant of the Researcher agent optimized for **pragmatic analysis**. Focus on trade-offs, edge cases, and practical constraints. Challenge assumptions and highlight risks the primary researcher may overlook.`,bodyAddendum:"## Required Output Section — `## Failure Modes & Counter-Evidence`\n\nYour final report MUST contain a `## Failure Modes & Counter-Evidence` section with:\n- At least 3 adversarial claims challenging your own primary finding\n- For each counter-claim: the condition under which it would be TRUE, and the\n  evidence (file:line or search receipt) that currently falsifies it\n- Any unresolved counter-evidence flagged as `⚠ UNRESOLVED`\n\nYour lens: pragmatic skepticism. Mark competing claims as `A` (Assumed) by default;\nchallenge before promoting to `V`."},Gamma:{description:`Research variant — broad pattern matching across domains and technologies`,identity:`, a variant of the Researcher agent optimized for **cross-domain pattern matching**. Draw connections from other domains, frameworks, and industries. Bring breadth where Alpha brings depth.`,bodyAddendum:'## Required Output Section — `## Cross-Domain Analogies`\n\nYour final report MUST contain a `## Cross-Domain Analogies` section with:\n- At least 2 patterns from other tools/frameworks/domains that apply to the question\n- For each: the external source (cite via `web_search` or `web_fetch` receipt) and\n  how it maps to our codebase\n- One "missing pattern we should adopt" recommendation\n\nYour lens: cross-domain pattern matching. Weight `web_search` + `web_fetch` higher\nthan peers. Assume the LLM\'s training data is stale — verify with fresh searches.'},Delta:{description:`Research variant — implementation feasibility and performance implications`,identity:`, a variant of the Researcher agent optimized for **implementation feasibility**. Focus on performance implications, scaling concerns, and concrete implementation paths. Ground theoretical proposals in practical reality.`,bodyAddendum:"## Required Output Section — `## Implementation Cost & Feasibility`\n\nYour final report MUST contain a `## Implementation Cost & Feasibility` section with:\n- Complexity snapshot: you MUST call `measure({ path })` on any file ≥ 50 LOC in the\n  target subsystem at least once and quote the `cognitiveComplexity` result\n- Blast radius estimate: `blast_radius({ changed_files })` on the proposed edits\n- Time/risk table: | Change | Lines | Risk | Effort |\n- Feasibility verdict: SAFE / RISKY / INFEASIBLE with one-line justification\n\nYour lens: implementation feasibility. Prefer `measure` + `blast_radius` + `analyze_patterns`\nover abstract reasoning."}}},"Code-Reviewer":{title:`The Quality Guardian`,description:`Code review specialist analyzing code for quality, security, performance, and maintainability`,argumentHint:`File path, PR, or code to review`,toolRole:`reviewer`,sharedBase:`code-reviewer-base`,category:`review`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When reviewing TypeScript code — type patterns, best practices`]],variants:{Alpha:{description:`Primary code reviewer`},Beta:{description:`Code reviewer variant — different LLM perspective for dual review`}}},"Architect-Reviewer":{title:`The Structural Guardian`,description:`Reviews architecture for pattern adherence, SOLID compliance, dependency direction, and structural integrity`,argumentHint:`Files, PR, or subsystem to architecture-review`,toolRole:`reviewer`,sharedBase:`architect-reviewer-base`,category:`review`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`c4-architecture`,`When reviewing architectural diagrams or boundary changes`],[`adr-skill`,`When the review involves architecture decisions — reference or create ADRs`]],extraBody:`You are **not** the Code-Reviewer agent. Code-Reviewer handles correctness, testing, security, and code quality. You handle the big picture: service boundaries, dependency direction, pattern adherence, and structural health.`,variants:{Alpha:{description:`Primary architecture reviewer`},Beta:{description:`Architecture reviewer variant — different LLM perspective for dual review`}}}};export{e as AGENTS};
+dispatch you alone. Your lens: thorough, evidence-first, exhaustive + contrarian.`},Beta:{description:`Research variant — pragmatic analysis with focus on trade-offs and edge cases`,identity:`, a variant of the Researcher agent optimized for **pragmatic analysis**. Focus on trade-offs, edge cases, and practical constraints. Challenge assumptions and highlight risks the primary researcher may overlook. Your thinking style is **First Principles** — strip away assumptions, decompose to ground truths, and rebuild reasoning from scratch.`,bodyAddendum:`## Required Output Section — \`## Failure Modes & Counter-Evidence\`
+Your final report MUST contain a \`## Failure Modes & Counter-Evidence\` section with:
+- At least 3 adversarial claims challenging your own primary finding
+- For each counter-claim: the condition under which it would be TRUE, and the
+  evidence (file:line or search receipt) that currently falsifies it
+- Any unresolved counter-evidence flagged as \`⚠ UNRESOLVED\`
+## Thinking Style: First Principles
+During multi-model decision sessions, apply the **First Principles** lens:
+- Strip every assumption: "Is this truly required, or just inherited convention?"
+- Decompose to ground truths, then rebuild the reasoning from scratch
+- If the current approach exists only because "that's how it's always been done", flag it
+Your lens: pragmatic skepticism + first principles. Mark competing claims as \`A\` (Assumed)
+by default; challenge before promoting to \`V\`.`},Gamma:{description:`Research variant — broad pattern matching across domains and technologies`,identity:`, a variant of the Researcher agent optimized for **cross-domain pattern matching**. Draw connections from other domains, frameworks, and industries. Bring breadth where Alpha brings depth. Your thinking style is **Expansionist** — look for the bigger opportunity, find what's undervalued, and identify patterns others dismiss.`,bodyAddendum:`## Required Output Section — \`## Cross-Domain Analogies\`
+Your final report MUST contain a \`## Cross-Domain Analogies\` section with:
+- At least 2 patterns from other tools/frameworks/domains that apply to the question
+- For each: the external source (cite via \`web_search\` or \`web_fetch\` receipt) and
+  how it maps to our codebase
+- One "missing pattern we should adopt" recommendation
+## Thinking Style: Expansionist
+During multi-model decision sessions, apply the **Expansionist** lens:
+- Ask: "What's the bigger opportunity everyone else is ignoring?"
+- Seek undervalued approaches and non-obvious connections across domains
+- Challenge narrow framing: "Is this really just an X problem, or is it also a Y problem?"
+Your lens: cross-domain pattern matching + expansionist. Weight \`web_search\` + \`web_fetch\`
+higher than peers. Assume the LLM's training data is stale — verify with fresh searches.`},Delta:{description:`Research variant — implementation feasibility and performance implications`,identity:`, a variant of the Researcher agent optimized for **implementation feasibility**. Focus on performance implications, scaling concerns, and concrete implementation paths. Ground theoretical proposals in practical reality. Your thinking style is **Executor** — focus on what can actually be built, the fastest path to value, and real-world constraints.`,bodyAddendum:`## Required Output Section — \`## Implementation Cost & Feasibility\`
+Your final report MUST contain a \`## Implementation Cost & Feasibility\` section with:
+- Complexity snapshot: you MUST call \`measure({ path })\` on any file ≥ 50 LOC in the
+  target subsystem at least once and quote the \`cognitiveComplexity\` result
+- Blast radius estimate: \`blast_radius({ changed_files })\` on the proposed edits
+- Time/risk table: | Change | Lines | Risk | Effort |
+- Feasibility verdict: SAFE / RISKY / INFEASIBLE with one-line justification
+## Thinking Style: Executor
+During multi-model decision sessions, apply the **Executor** lens:
+- Ask: "Can this actually be built? What's the fastest path to a working version?"
+- Ground every proposal in concrete effort: lines of code, files changed, risk
+- Reject elegant theory that can't survive contact with the codebase
+Your lens: implementation feasibility + executor. Prefer \`measure\` + \`blast_radius\` +
+\`analyze_patterns\` over abstract reasoning.`}}},"Code-Reviewer":{title:`The Quality Guardian`,description:`Code review specialist analyzing code for quality, security, performance, and maintainability`,argumentHint:`File path, PR, or code to review`,toolRole:`reviewer`,sharedBase:`code-reviewer-base`,category:`review`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`typescript`,`When reviewing TypeScript code — type patterns, best practices`]],variants:{Alpha:{description:`Primary code reviewer`},Beta:{description:`Code reviewer variant — different LLM perspective for dual review`}}},"Architect-Reviewer":{title:`The Structural Guardian`,description:`Reviews architecture for pattern adherence, SOLID compliance, dependency direction, and structural integrity`,argumentHint:`Files, PR, or subsystem to architecture-review`,toolRole:`reviewer`,sharedBase:`architect-reviewer-base`,category:`review`,skills:[[`aikit`,`**Always** — AI Kit tool signatures, search, analysis`],[`c4-architecture`,`When reviewing architectural diagrams or boundary changes`],[`adr-skill`,`When the review involves architecture decisions — reference or create ADRs`]],extraBody:`You are **not** the Code-Reviewer agent. Code-Reviewer handles correctness, testing, security, and code quality. You handle the big picture: service boundaries, dependency direction, pattern adherence, and structural health.`,variants:{Alpha:{description:`Primary architecture reviewer`},Beta:{description:`Architecture reviewer variant — different LLM perspective for dual review`}}}};export{e as AGENTS};

package/scaffold/dist/definitions/bodies.mjs CHANGED Viewed

@@ -67,6 +67,21 @@ For EACH step in the active flow:
 **Custom flows work identically** — \`flow_list\` returns them alongside builtins. The execution loop is the same for ALL flows.
+### Design & Decision Detection (applies to ALL flows including custom)
+When executing ANY flow step (builtin or custom), detect if the step involves design or decision work:
+**Detection signals** (in step name, description, or instruction content):
+- Keywords: design, brainstorm, architecture, decision, approach, strategy, RFC, ADR, trade-off, alternatives, options
+- Step asks to "choose between", "evaluate options", "propose approaches", or "make a decision"
+**When detected, ALWAYS:**
+1. Load the \`brainstorming\` skill — use it for requirements discovery and creative exploration
+2. Apply the **Multi-Model Decision Protocol** (inlined below under "Multi-Model Decision Protocol") for any non-trivial technical decisions
+3. This applies equally to builtin flows, custom flows, and any future flow — no exceptions
+Custom flows are NOT expected to reference these protocols in their step instructions. The Orchestrator injects them automatically based on step content detection.
 ### Flow Completion & Cleanup
 Flows MUST be driven to completion. A flow left active forever blocks future work.
@@ -114,7 +129,7 @@ Batch 2 (after batch 1):
 **Subagent prompt template:**
 1. **Scope** — exact files + boundary
 2. **Goal** — acceptance criteria, testable
-3. **Arch Context** — code snippets from \`compact()\`/\`digest()\`
+3. **Arch Context** — varies by \`config.tokenBudget\`: efficient → \`stratum_card({tier:'T1'})\`, normal → \`compact({path, query})\`, full → \`digest({sources})\`. Default to efficient unless task complexity requires more.
 4. **Constraints** — patterns, conventions
 5. **Artifacts Path** — the active flow's run directory and artifacts path from \`flow_status\` (e.g. \`.flows/add-authentication/.spec/\`)
 6. **FORGE** — tier + task_id + evidence requirements (reviewers add CRITICAL/HIGH claims into your task_id; never create their own)
@@ -287,7 +302,7 @@ Before every tool call, verify:
 |-------|--------------|
 | \`multi-agents-development\` | **Before any delegation** — task decomposition, dispatch templates, review pipeline, recovery patterns |
 | \`present\` | When presenting plans, findings, or visual content to the user — dashboards, tables, charts, timelines |
-| \`brainstorming\` | When a flow's design step requires creative/design work |
+| \`brainstorming\` | When ANY flow step (builtin or custom) involves design, brainstorming, or creative work — auto-detected by Orchestrator. Pairs with the Multi-Model Decision Protocol for technical decisions |
 | \`session-handoff\` | Context filling up, session ending, or major milestone |
 | \`lesson-learned\` | After completing work — extract engineering principles |
 | \`docs\` | During \`_docs-sync\` epilogue — living documentation convention, templates, change-to-doc mapping |

package/scaffold/dist/definitions/flows.mjs CHANGED Viewed

@@ -117,7 +117,7 @@ If the flow's changes don't warrant doc updates (e.g., pure bug fix with no reve
 - [ ] \`docs/\` bootstrapped with tool outputs if it didn't exist
 - [ ] Relevant docs created or updated (or skipped with reason)
 - [ ] \`docs/README.md\` index is current
-- [ ] No placeholder/empty docs created — all content tool-generated or hand-written with purpose`}],"aikit-advanced":[{file:`README.md`,content:"# aikit:advanced — Full Development Flow\n\nFull development flow for **new features, API design, and architecture changes**.\n\n## Steps\n\n| # | Step | Skill | Produces | Requires | Agents |\n|---|------|-------|----------|----------|--------|\n| 1 | **Design Gate** | `steps/design/README.md` | `design-decisions.md` | — | Researcher-Alpha/Beta/Gamma/Delta |\n| 2 | **Specification** | `steps/spec/README.md` | `spec.md` | `design-decisions.md` | Researcher-Alpha |\n| 3 | **Planning** | `steps/plan/README.md` | `plan.md` | `spec.md` | Planner, Explorer |\n| 4 | **Task Breakdown** | `steps/task/README.md` | `tasks.md` | `plan.md` | Planner, Architect-Reviewer-Alpha |\n| 5 | **Execution** | `steps/execute/README.md` | `progress.md` | `tasks.md` | Orchestrator, Implementer, Frontend, Refactor |\n| 6 | **Verification** | `steps/verify/README.md` | `verify-report.md` | `progress.md` | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta, Security |\n\n## How It Works\n\nEach step has a **README.md** file that contains the detailed instructions for the agent(s) executing that step. The Orchestrator reads the README.md via `flow_read_instruction` and delegates work accordingly.\n\n### Step 1: Design Gate\n- Full brainstorming session for new features and architectural changes\n- FORGE classification (`forge_classify`) + grounding (`forge_ground`) for complex tasks\n- Parallel 4-researcher decision protocol for non-trivial technical decisions\n- ADR generation for critical-tier tasks\n- **Mandatory user stop** before proceeding — design decisions must be approved\n- Read `steps/design/README.md` for the full protocol\n\n### Step 2: Specification\n- Elicit requirements from the user, clarify scope\n- Define acceptance criteria and constraints\n- Build on design decisions from the previous step\n\n### Step 3: Planning\n- Deep codebase analysis using `search`, `scope_map`, `trace`, `analyze_*`\n- Design architecture based on spec and design decisions\n- Create comprehensive implementation plan with file-level changes\n\n### Step 4: Task Breakdown\n- Break the plan into ordered, atomic implementation tasks\n- Define dependencies between tasks\n- Identify parallel batches for multi-agent execution\n- Architecture review of the task structure\n\n### Step 5: Execution\n- Orchestrator dispatches agents in parallel batches per the task breakdown\n- Each agent gets a scoped task (1-3 files) with clear acceptance criteria\n- TDD: write tests first, then implement\n- Per-batch review cycle: Code Review (dual) → Arch Review → Security → Evidence Gate\n\n### Step 6: Verification\n- Dual code review (Code-Reviewer-Alpha + Beta)\n- Architecture review (Architect-Reviewer-Alpha + Beta)\n- Security review\n- Run `check({})` + `test_run({})` + `blast_radius({})`\n- `evidence_map({ action: \"gate\" })` for final quality gate\n\n## Using Skills Inside Steps\n\nWhen the Orchestrator activates a step:\n\n1. **Read the instruction first** — `flow_read_instruction` returns the README.md for the current step\n2. **Follow step instructions** — the README.md is the primary guide for what to do\n3. **Delegate to listed agents** — each step lists which agents are appropriate\n4. **Produce the required artifact** — the step's `produces` field specifies what file to create in the artifacts directory\n5. **Check dependencies** — the step's `requires` field lists artifacts from previous steps that must exist\n6. **Report status** — agents report `DONE` | `DONE_WITH_CONCERNS` | `NEEDS_CONTEXT` | `BLOCKED` to the Orchestrator\n\n## Artifacts\n\nAll artifacts are stored in the run directory under `.flows/{topic}/`. The template variable `{{artifacts_path}}` resolves to the actual path at runtime.\n"},{file:`steps/design/README.md`,content:`# Design Gate — Advanced Flow
+- [ ] No placeholder/empty docs created — all content tool-generated or hand-written with purpose`}],"aikit-advanced":[{file:`README.md`,content:"# aikit:advanced — Full Development Flow\n\nFull development flow for **new features, API design, and architecture changes**.\n\n## Steps\n\n| # | Step | Skill | Produces | Requires | Agents |\n|---|------|-------|----------|----------|--------|\n| 1 | **Design Gate** | `steps/design/README.md` | `design-decisions.md` | — | Researcher-Alpha/Beta/Gamma/Delta |\n| 2 | **Specification** | `steps/spec/README.md` | `spec.md` | `design-decisions.md` | Researcher-Alpha |\n| 3 | **Planning** | `steps/plan/README.md` | `plan.md` | `spec.md` | Planner, Explorer |\n| 4 | **Task Breakdown** | `steps/task/README.md` | `tasks.md` | `plan.md` | Planner, Architect-Reviewer-Alpha |\n| 5 | **Execution** | `steps/execute/README.md` | `progress.md` | `tasks.md` | Orchestrator, Implementer, Frontend, Refactor |\n| 6 | **Verification** | `steps/verify/README.md` | `verify-report.md` | `progress.md` | Code-Reviewer-Alpha/Beta, Architect-Reviewer-Alpha/Beta, Security |\n\n## How It Works\n\nEach step has a **README.md** file that contains the detailed instructions for the agent(s) executing that step. The Orchestrator reads the README.md via `flow_read_instruction` and delegates work accordingly.\n\n### Step 1: Design Gate\n- Full brainstorming session for new features and architectural changes\n- FORGE classification (`forge_classify`) + grounding (`forge_ground`) for complex tasks\n- Full 3-phase multi-model decision protocol for non-trivial technical decisions (see Orchestrator's inlined Multi-Model Decision Protocol)\n- ADR generation for critical-tier tasks\n- **Mandatory user stop** before proceeding — design decisions must be approved\n- Read `steps/design/README.md` for the full protocol\n\n### Step 2: Specification\n- Elicit requirements from the user, clarify scope\n- Define acceptance criteria and constraints\n- Build on design decisions from the previous step\n\n### Step 3: Planning\n- Deep codebase analysis using `search`, `scope_map`, `trace`, `analyze_*`\n- Design architecture based on spec and design decisions\n- Create comprehensive implementation plan with file-level changes\n\n### Step 4: Task Breakdown\n- Break the plan into ordered, atomic implementation tasks\n- Define dependencies between tasks\n- Identify parallel batches for multi-agent execution\n- Architecture review of the task structure\n\n### Step 5: Execution\n- Orchestrator dispatches agents in parallel batches per the task breakdown\n- Each agent gets a scoped task (1-3 files) with clear acceptance criteria\n- TDD: write tests first, then implement\n- Per-batch review cycle: Code Review (dual) → Arch Review → Security → Evidence Gate\n\n### Step 6: Verification\n- Dual code review (Code-Reviewer-Alpha + Beta)\n- Architecture review (Architect-Reviewer-Alpha + Beta)\n- Security review\n- Run `check({})` + `test_run({})` + `blast_radius({})`\n- `evidence_map({ action: \"gate\" })` for final quality gate\n\n## Using Skills Inside Steps\n\nWhen the Orchestrator activates a step:\n\n1. **Read the instruction first** — `flow_read_instruction` returns the README.md for the current step\n2. **Follow step instructions** — the README.md is the primary guide for what to do\n3. **Delegate to listed agents** — each step lists which agents are appropriate\n4. **Produce the required artifact** — the step's `produces` field specifies what file to create in the artifacts directory\n5. **Check dependencies** — the step's `requires` field lists artifacts from previous steps that must exist\n6. **Report status** — agents report `DONE` | `DONE_WITH_CONCERNS` | `NEEDS_CONTEXT` | `BLOCKED` to the Orchestrator\n\n## Artifacts\n\nAll artifacts are stored in the run directory under `.flows/{topic}/`. The template variable `{{artifacts_path}}` resolves to the actual path at runtime.\n"},{file:`steps/design/README.md`,content:`# Design Gate — Advanced Flow
 Full design gate for new features, API design, and architecture changes. Runs brainstorming, decision protocol, and FORGE classification before specification begins.
@@ -164,16 +164,19 @@ For **Critical** tier tasks, also explore:
 ### 4. Decision Protocol (Standard & Critical tiers)
-When technical decisions need resolution:
+When technical decisions need resolution, follow the **3-phase multi-model decision protocol**:
 1. **Identify decisions** — List each decision point with 2+ viable options
-2. **Parallel research** — Delegate to Researcher agents (2 for Standard, 4 for Critical):
-   - Researcher-Alpha: Deep analysis of primary approach
-   - Researcher-Beta: Trade-offs and edge cases of alternatives
-   - Researcher-Gamma: Cross-domain patterns and precedents
-   - Researcher-Delta: Feasibility and performance implications
-3. **Synthesize** — Combine researcher findings into a recommendation per decision
-4. **ADR** (Critical tier) — Load \`adr-skill\` and create an Architecture Decision Record
+2. **Phase 1 — Independent Research** — Launch ALL 4 Researcher variants in parallel:
+   - Researcher-Alpha (Contrarian): Deep analysis, actively seeks fatal flaws
+   - Researcher-Beta (First Principles): Trade-offs and edge cases, strips assumptions
+   - Researcher-Gamma (Expansionist): Cross-domain patterns, undervalued opportunities
+   - Researcher-Delta (Executor): Feasibility, performance, fastest implementation path
+3. **Phase 2 — Peer Review** — Anonymize outputs as Perspective A/B/C/D, launch 4 reviewers in parallel asking: strongest argument, biggest blind spot, consensus gap, verdict
+4. **Phase 3 — Structured Verdict** — Synthesize into: Where Agrees / Where Clashes / Blind Spots Caught / Recommendation (with confidence) / First Step
+5. **Present & Record** — Render verdict with \`present\`, produce ADR via \`adr-skill\`
+**Floor tier shortcut**: Skip Phase 2 (peer review), go straight from research to verdict.
 ### 5. FORGE Ground (Standard & Critical tiers)
@@ -951,7 +954,7 @@ Before completing this step, persist important findings using \`remember()\`:
 - **Session checkpoint**: Summarize what was accomplished, decisions made, and any remaining work
 **Every step produces knowledge worth preserving.** If you discovered something that would help a future session, call \`remember()\` now.
-`}],"aikit-basic":[{file:`README.md`,content:"# aikit:basic — Quick Development Flow\n\nQuick development flow for **bug fixes, small features, and refactoring**.\n\n## Steps\n\n| # | Step | Skill | Produces | Requires | Agents |\n|---|------|-------|----------|----------|--------|\n| 1 | **Design Gate** | `steps/design/README.md` | `design-decisions.md` | — | Researcher-Alpha/Beta/Gamma/Delta |\n| 2 | **Assessment** | `steps/assess/README.md` | `assessment.md` | `design-decisions.md` | Explorer, Researcher-Alpha |\n| 3 | **Implementation** | `steps/implement/README.md` | `progress.md` | `assessment.md` | Implementer, Frontend |\n| 4 | **Verification** | `steps/verify/README.md` | `verify-report.md` | `progress.md` | Code-Reviewer-Alpha, Security |\n\n## How It Works\n\nEach step has a **README.md** file that contains the detailed instructions for the agent(s) executing that step. The Orchestrator reads the README.md via `flow_read_instruction` and delegates work accordingly.\n\n### Step 1: Design Gate\n- **Auto-skips** for bug fixes and refactors (produces a minimal `design-decisions.md` noting it was skipped)\n- For small features: runs quick brainstorming, FORGE classification, and optional decision protocol\n- Read `steps/design/README.md` for the full decision tree\n\n### Step 2: Assessment\n- Explore the codebase to understand scope and impact\n- Use `search`, `scope_map`, `file_summary`, `compact` to gather context\n- Identify the approach and produce `assessment.md`\n\n### Step 3: Implementation\n- Write code following the assessment plan\n- The Orchestrator dispatches Implementer/Frontend agents with specific file scopes\n- Follow TDD practices where applicable\n\n### Step 4: Verification\n- Code review, test execution, security check\n- Run `check({})` + `test_run({})` + `blast_radius({})`\n- Produce `verify-report.md` with findings\n\n## Using Skills Inside Steps\n\nWhen the Orchestrator activates a step:\n\n1. **Read the instruction first** — `flow_read_instruction` returns the README.md for the current step\n2. **Follow step instructions** — the README.md is the primary guide for what to do\n3. **Delegate to listed agents** — each step lists which agents are appropriate\n4. **Produce the required artifact** — the step's `produces` field specifies what file to create in the artifacts directory\n5. **Check dependencies** — the step's `requires` field lists artifacts from previous steps that must exist\n6. **Report status** — agents report `DONE` | `DONE_WITH_CONCERNS` | `NEEDS_CONTEXT` | `BLOCKED` to the Orchestrator\n\n## Artifacts\n\nAll artifacts are stored in the run directory under `.flows/{topic}/`. The template variable `{{artifacts_path}}` resolves to the actual path at runtime.\n"},{file:`steps/assess/README.md`,content:`---
+`}],"aikit-basic":[{file:`README.md`,content:"# aikit:basic — Quick Development Flow\n\nQuick development flow for **bug fixes, small features, and refactoring**.\n\n## Steps\n\n| # | Step | Skill | Produces | Requires | Agents |\n|---|------|-------|----------|----------|--------|\n| 1 | **Design Gate** | `steps/design/README.md` | `design-decisions.md` | — | Researcher-Alpha/Beta/Gamma/Delta |\n| 2 | **Assessment** | `steps/assess/README.md` | `assessment.md` | `design-decisions.md` | Explorer, Researcher-Alpha |\n| 3 | **Implementation** | `steps/implement/README.md` | `progress.md` | `assessment.md` | Implementer, Frontend |\n| 4 | **Verification** | `steps/verify/README.md` | `verify-report.md` | `progress.md` | Code-Reviewer-Alpha, Security |\n\n## How It Works\n\nEach step has a **README.md** file that contains the detailed instructions for the agent(s) executing that step. The Orchestrator reads the README.md via `flow_read_instruction` and delegates work accordingly.\n\n### Step 1: Design Gate\n- **Auto-skips** for bug fixes and refactors (produces a minimal `design-decisions.md` noting it was skipped)\n- For small features: runs quick brainstorming, FORGE classification, and optional decision protocol (see Orchestrator's inlined Multi-Model Decision Protocol for the full 3-phase process)\n- Read `steps/design/README.md` for the full decision tree\n\n### Step 2: Assessment\n- Explore the codebase to understand scope and impact\n- Use `search`, `scope_map`, `file_summary`, `compact` to gather context\n- Identify the approach and produce `assessment.md`\n\n### Step 3: Implementation\n- Write code following the assessment plan\n- The Orchestrator dispatches Implementer/Frontend agents with specific file scopes\n- Follow TDD practices where applicable\n\n### Step 4: Verification\n- Code review, test execution, security check\n- Run `check({})` + `test_run({})` + `blast_radius({})`\n- Produce `verify-report.md` with findings\n\n## Using Skills Inside Steps\n\nWhen the Orchestrator activates a step:\n\n1. **Read the instruction first** — `flow_read_instruction` returns the README.md for the current step\n2. **Follow step instructions** — the README.md is the primary guide for what to do\n3. **Delegate to listed agents** — each step lists which agents are appropriate\n4. **Produce the required artifact** — the step's `produces` field specifies what file to create in the artifacts directory\n5. **Check dependencies** — the step's `requires` field lists artifacts from previous steps that must exist\n6. **Report status** — agents report `DONE` | `DONE_WITH_CONCERNS` | `NEEDS_CONTEXT` | `BLOCKED` to the Orchestrator\n\n## Artifacts\n\nAll artifacts are stored in the run directory under `.flows/{topic}/`. The template variable `{{artifacts_path}}` resolves to the actual path at runtime.\n"},{file:`steps/assess/README.md`,content:`---
 name: assess
 description: Understand scope, analyze the codebase, and identify the implementation approach.
 ---
@@ -1099,9 +1102,11 @@ For small features that need minimal design:
    - What is the user trying to achieve?
    - What are the constraints?
    - What is the simplest approach?
-3. **Decision Protocol** (if technical decisions exist) — Delegate to 2-4 Researcher agents in parallel:
-   - Each researcher evaluates a different approach
-   - Synthesize findings into a recommendation
+3. **Decision Protocol** (if technical decisions exist) — Follow the full 3-phase multi-model decision protocol:
+   - **Phase 1**: Launch ALL 4 Researcher variants in parallel (Alpha/Beta/Gamma/Delta)
+   - **Phase 2**: Anonymize outputs as A/B/C/D, run peer review round (4 reviewers in parallel)
+   - **Phase 3**: Synthesize into structured verdict (Agrees / Clashes / Blind Spots / Recommendation / First Step)
+   - Present verdict visually using \`present\`, produce ADR for Standard+ tiers
 4. **Write \`{{artifacts_path}}/design-decisions.md\`** to disk:
 \`\`\`markdown

package/scaffold/dist/definitions/prompts.mjs CHANGED Viewed

@@ -31,7 +31,7 @@ Enter Phase 0 (Design Gate) directly — the user is requesting a design session
 1. **Invoke the brainstorming skill** — interactive design dialogue with user
 2. Follow the skill's full process (auto-selects Simple or Advanced mode)
-3. If Advanced Mode, use Decision Protocol for unresolved technical choices
+3. If Advanced Mode, use the full Multi-Model Decision Protocol (3-phase: research → peer review → verdict, defined in Orchestrator instructions) for unresolved technical choices
 4. Terminal state: brainstorming skill invokes writing-plans skill
 **🛑 HARD GATE** — Do NOT skip brainstorming. Do NOT write code. Design first.`},review:{description:`Dual-model code + architecture review pipeline`,agent:`Orchestrator`,tools:[`search`,`blast_radius`,`check`,`test_run`,`analyze_dependencies`,`remember`,`present`],content:`## Review Pipeline

package/scaffold/dist/definitions/protocols.mjs CHANGED Viewed

@@ -669,9 +669,80 @@ or repeated \`neighbors\` calls.
 The Orchestrator uses **multi-model decision analysis** to resolve non-trivial technical choices. This is the autonomous decision-making process — distinct from the interactive brainstorming skill.
-## How It Works
+## How It Works (3 Phases)
-The Orchestrator launches ALL available Researcher variants **in parallel** with the same question. Each returns an independent recommendation. The Orchestrator synthesizes results and presents the agreement/disagreement breakdown to the user.
+### Phase 1 — Independent Research (parallel)
+Launch ALL available Researcher variants **in parallel** with the same question. Each returns an independent recommendation grounded in their thinking style:
+| Variant | Thinking Style | Lens |
+|---------|---------------|------|
+| **Alpha** | Contrarian | Actively seeks flaws, fatal assumptions, hidden risks |
+| **Beta** | First Principles | Strips assumptions, rebuilds reasoning from ground truth |
+| **Gamma** | Expansionist | Finds undervalued opportunities, cross-domain patterns |
+| **Delta** | Executor | Focuses on fastest path, implementation cost, feasibility |
+### Phase 2 — Peer Review (parallel)
+After all researchers return, **anonymize** their responses as Perspective A / B / C / D (strip agent names). Then launch a **second parallel batch** of 4 review sub-agents:
+**Peer Review Prompt Template:**
+\`\`\`
+You are reviewing 4 independent analyses of the same technical decision.
+Each perspective was produced independently — they have NOT seen each other's work.
+[Perspective A]
+{Alpha's full response}
+[Perspective B]
+{Beta's full response}
+[Perspective C]
+{Gamma's full response}
+[Perspective D]
+{Delta's full response}
+Evaluate ALL perspectives. Your review MUST include:
+1. **Strongest argument** — which perspective and why (cite specific evidence)
+2. **Critical blind spot** — what did the STRONGEST perspective miss?
+3. **Consensus gap** — one thing ALL perspectives overlooked or assumed
+4. **Your verdict** — which approach to adopt (may combine elements)
+\`\`\`
+Use the same 4 Researcher variants for peer review — each model reviews from its own thinking style, catching different blind spots.
+### Phase 3 — Synthesis & Verdict
+The Orchestrator synthesizes BOTH layers (original research + peer reviews) into a structured verdict.
+**Verdict Format (MANDATORY):**
+\`\`\`markdown
+## Decision Verdict: {title}
+### Where They Agree
+{Points of consensus across researchers — high confidence items}
+### Where They Clash
+{Key disagreements with the strongest argument for each side}
+### Blind Spots Caught (by peer review)
+{Issues found in Phase 2 that no researcher identified in Phase 1}
+### Recommendation
+{The chosen approach — may combine elements from multiple perspectives}
+**Confidence:** HIGH / MEDIUM / LOW
+**Rationale:** {one paragraph}
+### First Step
+{The single most concrete next action to begin implementation}
+\`\`\`
+Then:
+1. **Present** the verdict using \`present({ format: "html" })\` with comparison blocks
+2. **Produce an ADR** via the \`adr-skill\`
+3. **\`remember\`** the decision for future recall
 ## When to Use (Auto-Trigger Rules)
@@ -688,9 +759,19 @@ Trigger the decision protocol when there is an **unresolved non-trivial technica
 - Always launch in **parallel**, minimum 4 variants
 - Use exact case-sensitive agent names — never rename or alias
+- **Anonymize** researcher outputs before peer review (A/B/C/D, not agent names)
+- Peer review is a SEPARATE parallel batch — never skip it
 - Never make a non-trivial technical decision without multi-model analysis
+- Always present the verdict visually using \`present\`
 - **Produce an ADR** after every decision resolution
 - \`remember\` the decision for future recall
+## Shortcut: Floor-Tier Decisions
+For decisions classified as **Floor tier** (blast_radius ≤ 2, single concern):
+- Skip Phase 2 (peer review) — synthesis directly from Phase 1
+- Verdict format still required but can be abbreviated
+- ADR is optional (use \`remember\` at minimum)
 `,"forge-protocol":`# FORGE Protocol — Quality Overlay
 > Follow the FORGE (Fact-Oriented Reasoning with Graduated Evidence) protocol for all code generation and modification tasks.