agentic-orchestrator 0.1.19 → 0.1.21

@@ -1,563 +1,406 @@
1
- # Feature Spec: Dashboard Advanced UX Second Pass (AOP)
1
+ # Feature Spec: Dashboard Advanced UX - Second Pass (AOP)
2
2
 
3
- > **Purpose of this document**: Define a second wave of dashboard improvements that go beyond structural cleanup (M44) into genuinely advanced capability: surfacing data the backend already produces but never exposes, adding intelligent visualizations, and enabling operations that today require dropping to the CLI. These improvements are designed around the question: _what would the best possible version of this dashboard look like?_
3
+ > Purpose: define the M45 dashboard wave with low-cognitive-load UX, strict implementation contracts, and deterministic behavior for agentic execution.
4
4
 
5
- **Version:** 1.0
5
+ **Version:** 2.0 (rewritten for execution quality)
6
6
  **Date:** 2026-03-05
7
7
  **Status:** Draft
8
8
  **Roadmap Mapping:** M45
9
- **Depends On:** M44 (Dashboard UX Improvements, component architecture refactor must land first)
9
+ **Depends On:** M44 (UX-01..UX-20 complete)
10
10
 
11
11
  ---
12
12
 
13
13
  ## 0. Scope and Standards
14
14
 
15
- ### 0.1 Motivation
15
+ ### 0.1 Primary Users and Jobs
16
16
 
17
- The M44 spec (UX-01 through UX-20) established the structural foundation: component extraction, CSS tokens, rich cards, phase-aware panels, and basic filters. This spec adds the layer above: _insight and intelligence_.
17
+ This spec optimizes for two primary jobs:
18
18
 
19
- The backend currently produces:
19
+ 1. **Reviewer job:** make correct approve/deny/request-changes decisions quickly.
20
+ 2. **Operator job:** detect stalled runs, lock contention, and systemic issues before throughput drops.
20
21
 
21
- - **Per-feature cost data** (`cost.get` tool) — tokens used, estimated USD cost
22
- - **Provider/model performance analytics** (`performance.get_analytics`) — success rates, retry counts, durations, costs by provider/model
23
- - **Agent role-level execution status** (`role_status` in state: planner/builder/qa each with ready/running/blocked/done)
24
- - **Agent session cluster IDs** (`cluster` in state)
25
- - **Full lock lease details** (`lock_leases` in index.json — includes holder, expires_at)
26
- - **Dependency-blocked features** (`dep_blocked` in index.json — with depends_on_unresolved)
27
- - **Plan acceptance criteria** (`plan.acceptance_criteria` — required conditions for completion)
28
- - **Plan risk notes** (`plan.risk` — known risks declared by the planner)
29
- - **Plan file scope** (`plan.files.create/modify/delete` — planned file operations)
30
- - **Plan revision history** (`plan.revision_of`, `plan.revision_reason`)
31
- - **QA test coverage index** (`qa_test_index.json` — per-file test status and coverage)
32
- - **Gate retry count and last retry time** (`gate_retry_count`, `last_retry_at`)
33
- - **Orchestrator run lease** (`runtime_sessions` in index.json — provider, model, heartbeat, expiry)
34
- - **Auto-generated review briefs** (`review_brief.json` from M34-36 PRQ5)
35
- - **Feasibility scores** (`planning_quality.feasibility_score` from M34-36 PRQ2)
36
- - **Collision matrix** (from `collisions.scan` tool)
37
- - **Flaky test quarantine data** (from `gate.flaky_report_get` tool, M34-36 PRQ4)
22
+ Secondary job:
38
23
 
39
- None of this data is currently accessible from the dashboard. This spec surfaces it all.
24
+ 3. **Author job (in-scope for M45):** create a new feature run from dashboard with server-side validation and policy checks.
40
25
 
41
- ### 0.2 Required Standards
26
+ ### 0.2 Critical UX Corrections from v1.0
42
27
 
43
- Same as M44: TypeScript strict, zero lint warnings, test coverage 90% on all metrics, existing API contracts preserved, Next.js build passes.
28
+ The previous draft over-indexed on "more widgets" and under-specified decision flow. This rewrite makes these corrections:
44
29
 
45
- ### 0.3 Implementation Approach
30
+ | v1.0 Issue | Why It Is Problematic | v2.0 Correction |
31
+ | ------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------- |
32
+ | Dense feature packing in one screen | High cognitive load, weak scan hierarchy | Progressive disclosure + clear view boundaries (Board vs Analytics vs Focus route) |
33
+ | Heuristic language like "Likely Met" | Creates false confidence | Rename to **Verification Signals** with explicit confidence labels and caveats |
34
+ | Color-first status encoding | Accessibility failures for color-vision users | Every status includes icon + text + semantic label |
35
+ | Right-click-only action affordances | Not keyboard/touch accessible | Explicit action menu button + full keyboard path |
36
+ | Unbounded client polling/render loops | Performance instability at scale | Explicit budgets, throttling, and update intervals |
37
+ | `POST /api/run` shell ambiguity | Security and reliability risk | Contract requires orchestrator tool path; shell-out disallowed in default path |
38
+ | Missing API error contracts | Agent implementation ambiguity | Uniform response envelope and explicit error codes |
46
39
 
47
- New server-side data is accessed by adding:
40
+ ### 0.3 UX Principles (Mandatory)
48
41
 
49
- - New API routes under `packages/web-dashboard/src/app/api/`
50
- - New `callOrchestratorTool` invocations in `src/lib/orchestrator-tools.ts`
51
- - New fields on existing types or new types in `src/lib/types.ts`
52
- - New `aop-client.ts` file-read functions for artifacts not accessible via MCP tools
42
+ 1. **Task-first hierarchy:** surface what helps decisions first, details second.
43
+ 2. **Progressive disclosure:** default collapsed for deep diagnostics.
44
+ 3. **Truth over optimism:** inferred states must be labeled as inferred.
45
+ 4. **Accessibility by default:** keyboard complete, non-color semantics, ARIA labels.
46
+ 5. **Operational safety:** destructive or high-impact actions require guardrails and confirmation.
47
+ 6. **Determinism for agents:** every feature includes explicit data source, fallback, and testability.
53
48
 
54
- ---
55
-
56
- ## 1. Improvements
57
-
58
- ### UX-21 — Agent Pipeline Stepper
49
+ ### 0.4 Required Quality Standards (Mandatory)
59
50
 
60
- **What**: In the feature detail panel, render the planner → builder → QA pipeline as a visual three-step stepper showing the `role_status` for each agent role (ready / running / blocked / done).
51
+ All M45 implementations MUST satisfy:
61
52
 
62
- **Why it matters**: The most common question a developer has when monitoring a feature is "what is the agent doing right now?" The `role_status` object in `state.md` frontmatter answers this precisely — it has per-role status for planner, builder, and qa. Yet this is completely invisible in the current dashboard. The pipeline stepper turns a three-field object into an instantly readable execution snapshot.
53
+ - TypeScript strict mode passes.
54
+ - ESLint zero warnings.
55
+ - Coverage >= 90% lines/branches/functions/statements.
56
+ - `npm run build` passes for workspace and dashboard build target.
57
+ - Existing API contracts are preserved or versioned.
58
+ - Accessibility: WCAG 2.2 AA baseline for interactive components.
59
+ - No critical interaction depends on pointer-only behavior.
63
60
 
64
- **Specification**:
61
+ ### 0.5 Non-Functional Budgets (Mandatory)
65
62
 
66
- - Read `role_status.planner`, `role_status.builder`, `role_status.qa` from `FeatureSummary` (add these fields to the type).
67
- - Render as a horizontal stepper: `[Planner] [Builder] [QA]`.
68
- - Each step node has a status indicator:
69
- - `ready` grey circle (not started)
70
- - `running` amber animated pulse
71
- - `done` green checkmark
72
- - `blocked` → red X
73
- - The active (running) step label is bold; done steps use muted/strikethrough styling.
74
- - Show alongside the feature name in the detail panel header, above status badges.
75
- - Update live with every SSE snapshot (no additional API calls needed).
63
+ - SSE update handling SHOULD complete in < 50ms per snapshot for 200 features on a dev laptop baseline.
64
+ - No component may run `setInterval(1000)` unless required by human-visible countdown; prefer 5s cadence for low-value timers.
65
+ - Large lists/tables (>150 rows) MUST use virtualization or pagination.
66
+ - API errors MUST render non-blocking inline states (no white-screen failures).
67
+ - Responsive baseline is mandatory: dashboard core flows MUST be fully usable at **360px CSS width** in portrait orientation (phone-class viewport), including triage, feature selection, detail inspection, and review actions.
68
+ - At the phone baseline, no horizontal page-level scrolling is allowed for primary workflows; controls must remain reachable by keyboard and touch.
76
69
 
77
70
  ---
78
71
 
79
- ### UX-22 Per-Feature Cost & Token Tracker
80
-
81
- **What**: Show token usage and estimated USD cost for each feature in the detail panel, sourced from the existing `cost.get` MCP tool.
82
-
83
- **Why it matters**: AI-driven development has a direct monetary cost. Engineers and managers need to see what each feature costs — both to understand ROI and to catch runaway agents that have been retrying for hours. The `cost.get` tool already exposes `tokens_used` and `estimated_cost_usd` per feature. Making this visible closes the most obvious information gap for any team managing budgets.
72
+ ## 1. Experience Architecture
84
73
 
85
- **Specification**:
74
+ ### 1.1 Views and Information Scent
86
75
 
87
- - Add a new API route `GET /api/features/:id/cost` that calls `callOrchestratorTool('cost.get', { feature_id })`.
88
- - Fetch when a feature is selected in the detail panel.
89
- - In the detail panel, add a small "Cost" section: "Tokens: X,XXX | Est. Cost: $0.04".
90
- - Format cost with 2–4 decimal places; use "< $0.01" for very small values.
91
- - If cost data is unavailable (feature never ran), show "No cost data recorded."
92
- - In the summary bar (UX-01), add a "Total Cost Today" tile: sum `estimated_cost_usd` across all features with a `recorded_at` within the last 24 hours.
93
- - Add a `CostSummary` type to `src/lib/types.ts`:
94
- ```typescript
95
- export interface CostSummary {
96
-   feature_id: string;
97
-   tokens_used: number;
98
-   estimated_cost_usd: number;
99
-   recorded_at: string | null;
100
- }
101
- ```
76
+ M45 uses three intentional contexts:
102
77
 
103
- ---
78
+ 1. **Board (`/`)**
79
+ - For triage and workflow control.
80
+ - Shows columns, summary metrics, runtime health, lock/dependency summaries, and selected-feature detail panel.
81
+ 2. **Analytics (`/analytics`)**
82
+ - For trend and provider/collision insights.
83
+ - Keeps heavy tables/charts out of triage view.
84
+ 3. **Focus (`/feature/:id`)**
85
+ - For deep review of one feature.
86
+ - Full-width stacked detail sections.
104
87
 
105
- ### UX-23 — Plan Scope File Tree
88
+ ### 1.2 URL-State Contract
106
89
 
107
- **What**: In the detail panel's plan view, render `plan.files.create`, `plan.files.modify`, and `plan.files.delete` as a visual collapsible directory tree with color-coded operation type instead of flat lists.
90
+ To reduce user disorientation and allow shareable links:
108
91
 
109
- **Why it matters**: When reviewing a feature, understanding the _scope_ of changes matters as much as the content. A flat list of 40 file paths is hard to scan; a tree grouped by directory reveals the blast radius immediately. Color-coding (green = create, blue = modify, red = delete) maps directly to risk: red means deletion, which is the highest-risk operation.
92
+ - Board selection state MUST sync to URL query (`?feature=<id>&filters=...`).
93
+ - View mode MUST be route-driven (not only local state).
94
+ - Browser back/forward MUST restore selected feature + filters.
110
95
 
111
- **Specification**:
96
+ ### 1.3 Progressive Disclosure Rules
112
97
 
113
- - Parse the three arrays from `detail.plan.files` (or `{create: [], modify: [], delete: []}`).
114
- - Build a directory tree from path strings: split on `/`, build a nested object, render recursively.
115
- - Leaf nodes (files) are colored: green for create, blue for modify, red for delete.
116
- - Directory nodes show a count badge: "src/services/ (3 modified, 1 created)".
117
- - Tree is collapsed by default at depth > 2; clicking a directory node expands it.
118
- - `allowed_areas` and `forbidden_areas` from the plan are shown as a separate labeled list below the tree ("Allowed: src/services/**, Forbidden: src/core/**").
119
- - This replaces the simple file list in UX-04's plan viewer.
98
+ - Default expanded: summary, current phase status, review actions.
99
+ - Default collapsed: collision matrix, raw evidence, live feed details, revision metadata.
100
+ - In Focus view, reviewer-critical sections appear before diagnostics.
120
101
 
121
102
  ---
122
103
 
123
- ### UX-24 Acceptance Criteria Live Tracker
104
+ ## 2. Revised Improvements (UX-21 to UX-40)
124
105
 
125
- **What**: Render `plan.acceptance_criteria` as a live-status checklist in the detail panel, with each criterion's status inferred from available gate results and evidence artifacts.
106
+ ### UX-21 - Agent Pipeline Stepper
126
107
 
127
- **Why it matters**: The plan's acceptance criteria are the definition of done. They're declared by the planner agent and represent exactly what must be true for the feature to be complete. Yet they're never shown in the dashboard, so a reviewer has no way to know what the agent was supposed to achieve. Surfacing them with inferred completion status turns the criteria list into a live progress tracker.
108
+ **Intent:** reveal where a feature currently sits in planner -> builder -> qa.
128
109
 
129
- **Specification**:
110
+ **Requirements:**
130
111
 
131
- - Read `plan.acceptance_criteria` (array of strings) from `detail.plan`.
132
- - For each criterion, infer status using a lightweight heuristic:
133
- - If criterion text mentions "test" or "coverage" and the corresponding gate (`fast`/`full`) has `pass` status: mark as ✅ "Likely Met (gate passed)".
134
- - If the gate has `fail` status: mark as ❌ "Gate Failed".
135
- - If gate status is `na` or absent: mark as 🔲 "Unverified (requires review)".
136
- - Render as a checklist below the plan summary in the detail panel.
137
- - Each criterion shows its full text and a colored status badge.
138
- - A summary line: "3 / 5 criteria verifiable from gate results".
139
- - If no acceptance criteria exist, show "No acceptance criteria declared in plan."
112
+ - Source from `feature.role_status`.
113
+ - Render stepper with icon + text state: `Ready`, `Running`, `Blocked`, `Done`, `Unknown`.
114
+ - Running step uses motion + text, never motion-only.
115
+ - If role status is missing, render `Unknown` with neutral styling.
116
+ - Update via SSE snapshots; no extra API call.
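
A minimal sketch of the `Unknown` fallback described in the UX-21 requirements. The names `StepView` and `toStepViews` are hypothetical, and the lowercase raw values (`ready`, `running`, `blocked`, `done`) are taken from the v1.0 text; the real `role_status` shape may differ.

```typescript
// Hypothetical helper: maps raw role_status values to display labels.
type StepLabel = "Ready" | "Running" | "Blocked" | "Done" | "Unknown";

interface StepView {
  role: "planner" | "builder" | "qa";
  label: StepLabel;
}

const ROLES = ["planner", "builder", "qa"] as const;

// A Map avoids accidental hits on inherited object keys such as "toString".
const LABELS = new Map<string, StepLabel>([
  ["ready", "Ready"],
  ["running", "Running"],
  ["blocked", "Blocked"],
  ["done", "Done"],
]);

function toStepViews(roleStatus?: Record<string, unknown> | null): StepView[] {
  return ROLES.map((role) => {
    const raw = roleStatus?.[role];
    // Missing or unrecognized values render as Unknown with neutral styling.
    const label: StepLabel =
      typeof raw === "string" ? LABELS.get(raw) ?? "Unknown" : "Unknown";
    return { role, label };
  });
}
```

Because the function is pure, it can be re-run on every SSE snapshot without an extra API call.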
140
117
 
141
- ---
118
+ ### UX-22 - Per-Feature Cost and Token Tracker
142
119
 
143
- ### UX-25 QA Test Coverage Map
120
+ **Intent:** make cost visibility immediate without noisy calls.
144
121
 
145
- **What**: Read `qa_test_index.json` for the selected feature and render a file-by-file test coverage status grid in the detail panel.
122
+ **Requirements:**
146
123
 
147
- **Why it matters**: The `qa_test_index.json` artifact maps each changed file to its required tests and current test status (`pending / running / passed / failed / waived`). This is precisely the information a developer needs to understand test coverage during QA. Currently it is an invisible file. Surfacing it as a visual grid (file path → status color) gives reviewers confidence that changed code is actually tested.
124
+ - API route: `GET /api/features/:id/cost`.
125
+ - Fetch only when feature detail is open; cache per feature for 60s.
126
+ - Use `AbortController` when switching selected feature.
127
+ - Render `Tokens`, `Estimated Cost`, and `Last Recorded`.
128
+ - Summary tile uses aggregated value from server payload (not N feature calls).
129
+ - Empty state: `No cost data recorded yet`.
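
One way to satisfy the "cache per feature for 60s" requirement is a small TTL cache keyed by feature id, sketched below. `TtlCache` is a hypothetical name, the injected clock exists only to make the expiry testable, and the `CostSummary` fields are copied from the v1.0 type shown earlier in this diff. Request cancellation via `AbortController` is not shown.

```typescript
// Field names follow the v1.0 CostSummary type; the payload may have changed in v2.0.
interface CostSummary {
  feature_id: string;
  tokens_used: number;
  estimated_cost_usd: number;
  recorded_at: string | null;
}

// Hypothetical helper: entries expire ttlMs after they were stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it so the caller refetches
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Usage: one cache for the detail panel, 60 second lifetime.
const costCache = new TtlCache<CostSummary>(60_000);
```

A caller would check `costCache.get(featureId)` before issuing `GET /api/features/:id/cost` and store the response on success.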
148
130
 
149
- **Specification**:
131
+ ### UX-23 - Plan Scope File Tree
150
132
 
151
- - Add a new API route `GET /api/features/:id/test-index` that reads `.aop/features/:id/qa_test_index.json`.
152
- - Return type matches `QaTestIndex` type (add to `src/lib/types.ts`):
153
- ```typescript
154
- export interface QaTestIndexItem {
155
-   path: string;
156
-   status: 'pending' | 'running' | 'passed' | 'failed' | 'waived';
157
-   required_tests: string[];
158
-   last_run_at?: string;
159
- }
160
- export interface QaTestIndex {
161
-   feature_id: string;
162
-   version: number;
163
-   items: QaTestIndexItem[];
164
- }
165
- ```
166
- - In the detail panel (shown for `qa` and `ready_to_merge` phases), add a "Test Coverage" section.
167
- - Render as a compact table: file path (truncated to 40 chars with tooltip), status pill (green = passed, red = failed, amber = pending/running, grey = waived), required test count.
168
- - Summary row: "N files: X passed, Y failed, Z pending."
169
- - If artifact missing: "No test index available."
133
+ **Intent:** communicate blast radius faster than flat lists.
170
134
 
171
- ---
135
+ **Requirements:**
172
136
 
173
- ### UX-26 Lock Resource Map
137
+ - Parse `plan.files.create|modify|delete` into tree model.
138
+ - Render operation badges (`Created`, `Modified`, `Deleted`) with icon + text.
139
+ - Collapse depth > 2 by default.
140
+ - For >200 nodes, use virtualization or chunked rendering.
141
+ - Show `allowed_areas` and `forbidden_areas` as separate policy chips.
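
The tree model in UX-23 could be built roughly as follows. This is a sketch under assumptions: `TreeNode`, `buildTree`, and `countNodes` are hypothetical names, and paths are assumed to be `/`-separated as in the v1.0 text.

```typescript
type FileOp = "Created" | "Modified" | "Deleted";

interface TreeNode {
  name: string;
  op?: FileOp; // set on file (leaf) nodes only
  children: Map<string, TreeNode>;
}

interface PlanFiles {
  create?: string[];
  modify?: string[];
  delete?: string[];
}

function buildTree(files: PlanFiles): TreeNode {
  const root: TreeNode = { name: "", children: new Map() };
  const add = (paths: string[] | undefined, op: FileOp): void => {
    for (const path of paths ?? []) {
      let node = root;
      // Walk (and create) one node per path segment, skipping empty segments.
      for (const segment of path.split("/").filter((s) => s.length > 0)) {
        let child = node.children.get(segment);
        if (child === undefined) {
          child = { name: segment, children: new Map() };
          node.children.set(segment, child);
        }
        node = child;
      }
      if (node !== root) node.op = op;
    }
  };
  add(files.create, "Created");
  add(files.modify, "Modified");
  add(files.delete, "Deleted");
  return root;
}

// Node count (excluding the root) decides when to switch to virtualization.
function countNodes(node: TreeNode): number {
  let total = 0;
  node.children.forEach((child) => {
    total += 1 + countNodes(child);
  });
  return total;
}

// Depth greater than 2 starts collapsed, per the requirement above.
const isCollapsedByDefault = (depth: number): boolean => depth > 2;
```

A renderer could call `countNodes(tree) > 200` once per plan to choose between plain and chunked rendering.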
174
142
 
175
- **What**: Add a "Locks" panel to the dashboard showing all resource locks from `index.lock_leases` with the current holder, time until lease expiry, and a stale indicator.
143
+ ### UX-24 - Acceptance Criteria Verification Signals
176
144
 
177
- **Why it matters**: Lock contention is the primary cause of features entering the blocked queue. Currently, developers have no way to see which features hold which locks without running `aop status` or reading raw JSON files. A live lock map makes contention immediately diagnosable — you can see at a glance that "openapi" is held by `feature-auth`, expires in 4 minutes, and `feature-payments` is waiting for it.
145
+ **Intent:** show progress signals without overstating certainty.
178
146
 
179
- **Specification**:
147
+ **Requirements:**
180
148
 
181
- - Extend `DashboardStatusPayload` (or add a new type) to include `lock_map`:
182
- ```typescript
183
- export interface LockLease {
184
-   resource: string;
185
-   holder: string | null;
186
-   expires_at: string | null;
187
-   is_stale: boolean;
188
- }
189
- ```
190
- - Extend `/api/status` or add `/api/locks` to return lock lease data read from `index.lock_leases`.
191
- - Compute `is_stale` server-side: `new Date(expires_at) < new Date()`.
192
- - Render as a small panel below or alongside the Kanban board (or as a tab in a future multi-panel layout, UX-40).
193
- - Each row: resource name, holder feature_id (as a clickable link to select that feature), expiry countdown ("expires in 4m 22s"), stale badge if stale.
194
- - Stale leases render in red with a "Stale" badge; healthy leases in green/neutral.
195
- - Countdown updates via `setInterval(1000)` from the `expires_at` timestamp (no server polling needed for the countdown).
149
+ - Rename section to **Verification Signals**.
150
+ - Never label inferred criteria as fully met.
151
+ - Status values:
152
+ - `Verified` (direct artifact evidence)
153
+ - `At Risk` (explicit gate/evidence failure)
154
+ - `Unverified` (insufficient evidence)
155
+ - Each item includes `reason` and `evidence_ref` when available.
156
+ - Summary line: `X verified, Y at risk, Z unverified`.
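
As an illustration of the UX-24 status model, the summary line can be derived directly from the item list. The `VerificationSignal` interface and its `criterion` field are assumptions; only the three status values, `reason`, and `evidence_ref` come from the requirements.

```typescript
type SignalStatus = "Verified" | "At Risk" | "Unverified";

// Hypothetical item shape; `criterion` is an assumed field name.
interface VerificationSignal {
  criterion: string;
  status: SignalStatus;
  reason?: string;
  evidence_ref?: string;
}

function summarizeSignals(signals: VerificationSignal[]): string {
  const count = (status: SignalStatus): number =>
    signals.filter((s) => s.status === status).length;
  return `${count("Verified")} verified, ${count("At Risk")} at risk, ${count("Unverified")} unverified`;
}
```

Keeping the status a closed union means an inferred criterion cannot be rendered with any label outside these three.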
196
157
 
197
- ---
158
+ ### UX-25 - QA Test Coverage Map
198
159
 
199
- ### UX-27 Dependency Unblock Chain
160
+ **Intent:** expose test accountability per changed file.
200
161
 
201
- **What**: When features are in `dep_blocked` (waiting for dependency features to merge), visualize the dependency relationships as a linked chain showing which features are blocked and what they're waiting for.
162
+ **Requirements:**
202
163
 
203
- **Why it matters**: Dependency blocking is invisible in the current dashboard. `dep_blocked` entries are in `index.json` but never surfaced. A feature author has no way to see from the dashboard that their feature is waiting for `feature-auth` to merge. The dependency chain makes this explicit and shows the critical path.
164
+ - API route: `GET /api/features/:id/test-index`.
165
+ - Visible in `qa` and `ready_to_merge`.
166
+ - Table columns: file path, status, required test count, last run.
167
+ - Status labels must include icon + text.
168
+ - Filter pills: `failed`, `pending/running`, `waived`, `passed`.
169
+ - Empty state if artifact missing.
204
170
 
205
- **Specification**:
171
+ ### UX-26 - Lock Resource Map
206
172
 
207
- - Extend `FeaturesIndex` to include `dep_blocked`:
208
- ```typescript
209
- dep_blocked?: Array<{
210
-   feature_id: string;
211
-   depends_on_unresolved: string[];
212
- }>;
213
- ```
214
- - Features in `dep_blocked` render with a distinct "Awaiting Dependencies" sub-phase in their Kanban card.
215
- - In the detail panel, if `dep_blocked` contains the selected feature, show a "Dependencies" section listing unresolved dependencies as links.
216
- - At the bottom of the board (or in the collision queue area from UX-12), show a "Dependency Chains" section that renders the dep_blocked list as: `feature-payments → waiting for → [feature-auth]` where `feature-auth` is a clickable link.
217
- - When all dependencies for a feature are merged (list empties), the chain entry disappears on the next SSE snapshot.
173
+ **Intent:** make lock contention diagnosable in one glance.
218
174
 
219
- ---
175
+ **Requirements:**
220
176
 
221
- ### UX-28 Auto-Generated Review Brief Renderer
222
-
223
- **What**: When `.aop/features/<id>/review_brief.json` exists (generated by the M34-36 PRQ5 review brief service), render its structured sections in the detail panel as a rich, formatted review document.
224
-
225
- **Why it matters**: The M34-36 spec defines `review_brief.json` as a structured summary containing: intent summary, scope, contract risk, feasibility score, gate matrix, unresolved questions, and evidence references. This is designed to be the first thing a reviewer reads. Without dashboard rendering, reviewers must find and parse this JSON artifact manually. A rendered brief turns approval into a guided workflow.
226
-
227
- **Specification**:
228
-
229
- - Add API route `GET /api/features/:id/review-brief` that reads `.aop/features/:id/review_brief.json`.
230
- - Extend `FeatureDetail` to include `review_brief?: ReviewBrief | null`.
231
- - Add type:
232
- ```typescript
233
- export interface ReviewBrief {
234
-   intent_summary: string;
235
-   scope_summary: string;
236
-   contract_risk_summary: string;
237
-   feasibility_score?: number;
238
-   feasibility_breakdown?: Record<string, number>;
239
-   gate_matrix: Record<string, string>;
240
-   unresolved_questions: string[];
241
-   evidence_refs: string[];
242
-   generated_at: string;
243
- }
244
- ```
245
- - In the detail panel for `ready_to_merge` features, show "Review Brief" as a prominently styled card at the top.
246
- - Render each section under a labeled header: Intent, Scope, Contract Risk, Feasibility (as a score with colored bar), Gate Results, Open Questions.
247
- - If `review_brief` is null/absent: show "Review brief not yet generated." with a note that it is auto-generated on gate completion.
248
- - When `feasibility_breakdown` is present, render each component (scope_realism, test_sufficiency, etc.) as a labelled progress bar.
177
+ - Extend status payload with normalized lock entries.
178
+ - Row fields: resource, holder, expires in, stale flag.
179
+ - Countdown refresh every 5s.
180
+ - Stale rows have text badge `Stale lease`.
181
+ - Clicking holder selects that feature.
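
A sketch of how a normalized lock row might be derived for UX-26. The `LockLease` fields follow the v1.0 type earlier in this diff; `LockRow` and `toLockRow` are hypothetical, and staleness reuses the v1.0 rule (`expires_at` earlier than now). Passing `nowMs` in keeps the function pure, so a 5s timer only needs to re-invoke it.

```typescript
interface LockLease {
  resource: string;
  holder: string | null;
  expires_at: string | null; // ISO timestamp, as in the v1.0 type
}

// Hypothetical view model for one table row.
interface LockRow {
  resource: string;
  holder: string | null;
  expiresIn: string;
  stale: boolean;
}

function toLockRow(lease: LockLease, nowMs: number): LockRow {
  const base = { resource: lease.resource, holder: lease.holder };
  const expiresMs = lease.expires_at === null ? NaN : Date.parse(lease.expires_at);
  if (Number.isNaN(expiresMs)) {
    // Missing or unparseable expiry: show it as unknown rather than guessing.
    return { ...base, expiresIn: "unknown", stale: false };
  }
  if (expiresMs < nowMs) {
    return { ...base, expiresIn: "expired", stale: true };
  }
  const remainingSec = Math.floor((expiresMs - nowMs) / 1000);
  const expiresIn = `${Math.floor(remainingSec / 60)}m ${remainingSec % 60}s`;
  return { ...base, expiresIn, stale: false };
}
```

Rows with `stale: true` would carry the `Stale lease` text badge.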
249
182
 
250
- ---
183
+ ### UX-27 - Dependency Unblock Chain
251
184
 
252
- ### UX-29 Plan Risk Annotations
185
+ **Intent:** expose blocked critical path clearly.
253
186
 
254
- **What**: Render `plan.risk` (an array of risk notes declared by the planner) as prominent warning callouts in the detail panel.
187
+ **Requirements:**
255
188
 
256
- **Why it matters**: The planner agent is instructed to enumerate known risks and edge cases in the plan. This is valuable information — the agent knows the codebase and has identified "this touches authentication, test the token refresh path carefully." But these notes are buried in `plan.json` and never shown to the human reviewer. Surfacing them as visible amber callout boxes puts the planner's concerns directly in front of the person who needs to act on them.
189
+ - Source from `dep_blocked` in index/status payload.
190
+ - Feature card shows `Awaiting Dependencies` substate when applicable.
191
+ - Detail panel lists unresolved dependencies as navigable links.
192
+ - Board section `Dependency Chains` shows `feature -> depends on -> feature` rows.
193
+ - Rows disappear automatically on next snapshot when resolved.
257
194
 
258
- **Specification**:
195
+ ### UX-28 - Review Brief Renderer
259
196
 
260
- - Read `plan.risk` (array of strings) from `detail.plan`.
261
- - If `plan.risk` is non-empty, render a "Known Risks" section immediately above the review action buttons.
262
- - Each risk renders as an amber/yellow callout card with a ⚠️ prefix and the risk text.
263
- - Section has a header: "⚠ N Risk(s) Declared by Planner Agent".
264
- - If `plan.risk` is empty or absent, omit the section (no "no risks" message — absence implies clean).
265
- - Risk callouts are always visible (not collapsed) in the `ready_to_merge` phase; collapsed by default in earlier phases.
197
+ **Intent:** present generated review context as first-class reviewer input.
266
198
 
267
- ---
199
+ **Requirements:**
268
200
 
269
- ### UX-30 Gate Step Drill-Down
201
+ - API route: `GET /api/features/:id/review-brief`.
202
+ - Render in `ready_to_merge`; optional in `qa`.
203
+ - Sections: intent, scope, contract risk, feasibility, gate matrix, unresolved questions, evidence references.
204
+ - Show `generated_at` + freshness age.
205
+ - Missing brief state: `Review brief not generated yet`.
270
206
 
271
- **What**: When a gate fails, parse the gate evidence artifact (e.g., `.aop/features/<id>/evidence/fast.json`) to show the specific step that failed, its command, exit code, and the last N lines of output.
207
+ ### UX-29 - Plan Risk Annotations
272
208
 
273
- **Why it matters**: Seeing "fast: fail" tells a developer nothing actionable. What step failed? Was it `lint`, `build`, `test`? What was the error? The gate evidence JSON contains this information — it records per-step results. Showing "Step `test` failed: exit code 1. Last output: [...]" makes the failure immediately actionable without navigating to any other tool.
209
+ **Intent:** keep planner-declared risks visible at decision time.
274
210
 
275
- **Specification**:
211
+ **Requirements:**
276
212
 
277
- - Gate evidence artifacts are JSON files at `.aop/features/<id>/evidence/<mode>.json` or `-<mode>.json`.
278
- - Extend the evidence viewer (UX-07) to detect `.json` gate evidence files and render them as structured gate step results rather than raw JSON.
279
- - Parse: `{ steps: [{ name: string, cmd: string[], exit_code: number, stdout_tail: string, duration_ms: number }] }` (adapt to actual artifact structure).
280
- - Failed steps render in red with a collapsed `<details>` containing the `stdout_tail` (last 50 lines).
281
- - Passed steps render in green; skipped/timeout steps in amber.
282
- - Step duration shown as "Xs" beside each step.
283
- - If the artifact format doesn't match the expected gate result shape, fall back to the raw JSON viewer.
213
+ - Read from `plan.risk`.
214
+ - In `ready_to_merge`, section is expanded by default.
215
+ - Support optional severity prefix parsing (`[high]`, `[medium]`, `[low]`); default `medium`.
216
+ - Risks render as callouts with severity text badges.
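
The optional severity prefix in UX-29 can be handled with a single anchored regular expression. This is a sketch: `parseRisk` is a hypothetical name, and case-insensitive matching is an assumption not stated in the spec.

```typescript
type Severity = "high" | "medium" | "low";

interface ParsedRisk {
  severity: Severity;
  text: string;
}

// Matches an optional leading "[high]", "[medium]" or "[low]" tag.
const SEVERITY_PREFIX = /^\s*\[(high|medium|low)\]\s*/i;

function parseRisk(note: string): ParsedRisk {
  const match = SEVERITY_PREFIX.exec(note);
  if (match === null) {
    // No recognized prefix: default severity is medium, per the requirement.
    return { severity: "medium", text: note.trim() };
  }
  const severity = (match[1] ?? "medium").toLowerCase() as Severity;
  return { severity, text: note.slice(match[0].length).trim() };
}
```

Unrecognized tags such as `[urgent]` are left in the text and fall back to `medium`.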
284
217
 
285
- ---
218
+ ### UX-30 - Gate Step Drill-Down
286
219
 
287
- ### UX-31 Orchestrator Run Health Panel
220
+ **Intent:** shorten time-to-diagnosis for gate failures.
288
221
 
289
- **What**: Add a small but prominent "Orchestrator" health card showing the current run's provider, model, uptime, last heartbeat age, and a countdown to lease expiry.
222
+ **Requirements:**
290
223
 
291
- **Why it matters**: The `runtime_sessions` object in `index.json` contains: `provider`, `model`, `started_at`, `last_heartbeat_at`, `lease_expires_at`, `owner_instance_id`. If the orchestrator process dies, `last_heartbeat_at` stops updating and `lease_expires_at` approaches. Without this visibility, developers have no way to know from the dashboard that the orchestrator has crashed — they simply watch features stop making progress. A health card makes this obvious.
224
+ - Parse gate evidence JSON using tolerant parser with shape guards.
225
+ - Render per-step: name, command, status, exit code, duration.
226
+ - First failed step expands by default.
227
+ - Output tail is collapsible, capped to last 50 lines.
228
+ - Unknown artifact shape falls back to raw JSON viewer.
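
A tolerant parser with shape guards for UX-30 might look like the sketch below. The step shape is the one v1.0 proposed (with its own caveat to adapt to the actual artifact structure), so every field name is an assumption; treating a non-zero `exit_code` as "failed" is also an assumption.

```typescript
interface GateStep {
  name: string;
  cmd: string[];
  exit_code: number;
  stdout_tail: string;
  duration_ms: number;
}

type ParsedEvidence =
  | { kind: "steps"; steps: GateStep[]; firstFailedIndex: number }
  | { kind: "raw"; raw: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isGateStep(v: unknown): v is GateStep {
  if (!isRecord(v)) return false;
  const cmd = v["cmd"];
  return (
    typeof v["name"] === "string" &&
    Array.isArray(cmd) &&
    cmd.every((c) => typeof c === "string") &&
    typeof v["exit_code"] === "number" &&
    typeof v["stdout_tail"] === "string" &&
    typeof v["duration_ms"] === "number"
  );
}

// Keep only the last maxLines lines of captured output.
function tailLines(text: string, maxLines: number): string {
  return text.split("\n").slice(-maxLines).join("\n");
}

function parseGateEvidence(json: string): ParsedEvidence {
  const raw: ParsedEvidence = { kind: "raw", raw: json };
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return raw; // not JSON at all: raw viewer
  }
  if (!isRecord(data)) return raw;
  const candidate = data["steps"];
  if (!Array.isArray(candidate)) return raw;
  const steps: GateStep[] = candidate.filter(isGateStep);
  if (steps.length !== candidate.length) return raw; // any malformed step: raw viewer
  return {
    kind: "steps",
    steps: steps.map((s) => ({ ...s, stdout_tail: tailLines(s.stdout_tail, 50) })),
    firstFailedIndex: steps.findIndex((s) => s.exit_code !== 0), // -1 if none failed
  };
}
```

The UI would expand the step at `firstFailedIndex` and render the raw JSON viewer whenever `kind` is `"raw"`.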
292
229
 
293
- **Specification**:
230
+ ### UX-31 - Orchestrator Run Health Panel
294
231
 
295
- - Extend `/api/status` response to include `runtime` field from `index.runtime_sessions`.
296
- - Render in the top-right of the header, replacing or supplementing the connection indicator dot.
297
- - Show: provider name, model name (abbreviated), run uptime ("running Xh Ym"), heartbeat health ("heartbeat: Ns ago").
298
- - Heartbeat indicator: green if `last_heartbeat_at` is < 30s ago; amber if 30s-2min; red if > 2min.
299
- - Lease expiry countdown: "lease expires in Xm Ys" — red if < 60s.
300
- - If `runtime_sessions` is absent (no active run): show "No active orchestrator run" with grey styling.
301
- - Clicking the health card expands a tooltip with full details: run_id, owner_instance_id, full timestamps.
232
+ **Intent:** expose run liveness and lease risk early.
302
233
 
303
- ---
234
+ **Requirements:**
304
235
 
305
- ### UX-32 Throughput Sparklines
236
+ - Include runtime session in status payload.
237
+ - Show provider, model, uptime, heartbeat age, lease countdown.
238
+ - Health states:
239
+ - `Healthy` (<30s heartbeat)
240
+ - `Degraded` (30s-2m)
241
+ - `Critical` (>2m)
242
+ - No active runtime: show `No active orchestrator run`.
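
The three health states in UX-31 reduce to a threshold check on heartbeat age. In this sketch `classifyHeartbeat` is a hypothetical name, the boundaries follow the list above (exactly 30s is `Degraded`, exactly 2m is still `Degraded`), and an unparseable timestamp is treated as `Critical` by assumption.

```typescript
type Health = "Healthy" | "Degraded" | "Critical";

function classifyHeartbeat(lastHeartbeatAt: string, nowMs: number): Health {
  const ageSec = (nowMs - Date.parse(lastHeartbeatAt)) / 1000;
  if (ageSec < 30) return "Healthy";
  if (ageSec <= 120) return "Degraded";
  // Older than 2 minutes, or NaN from an unparseable timestamp.
  return "Critical";
}
```

The caller handles the "no active runtime" case separately, before any classification is attempted.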
306
243
 
307
- **What**: Display a mini sparkline chart in the summary bar (UX-01) or a dedicated analytics section showing features merged per day/week, derived from the merged features list and their `last_updated` timestamps.
244
+ ### UX-32 - Throughput Sparkline
308
245
 
309
- **Why it matters**: Throughput — how many features are completing successfully per unit time — is the primary output metric for any engineering team using AOP. Without it, there's no way to know if the system is working well or if throughput has degraded. A sparkline gives instant trend visibility without requiring a full analytics page.
246
+ **Intent:** provide fast trend check without analytics overload in Board view.
310
247
 
311
- **Specification**:
248
+ **Requirements:**
312
249
 
313
- - No new backend data is needed: use `payload.features` already available, filtered to `phase === 'merged'`, grouped by `last_updated` date.
314
- - Compute a rolling 14-day histogram (date → count of merges on that date).
315
- - Render as an SVG sparkline: 14 bars, one per day, height proportional to count.
316
- - Show below or beside the summary bar with label "Merge throughput (14d)" and the total count "N features merged."
317
- - Bars for today and yesterday use a highlighted color; older bars use muted color.
318
- - Tooltip on hover: "March 4: 3 features merged."
319
- - Implement using pure SVG (no charting library dependency).
250
+ - Compute 14-day merges histogram from merged feature timestamps.
251
+ - Render compact SVG sparkline with tooltips.
252
+ - Board shows tiny sparkline tile + link to full analytics route.
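
One way to compute the 14-day histogram is sketched below. It assumes merged features carry an ISO `last_updated` timestamp (as in the earlier revision) and buckets by UTC calendar day. The `MergedFeature` shape, the function name, and the choice of UTC are illustrative assumptions.

```typescript
// Hypothetical minimal shape; only the timestamp is needed here.
interface MergedFeature {
  feature_id: string;
  last_updated: string; // ISO timestamp of the merge (assumed)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns 14 counts, oldest day first; index 13 is "today" (UTC).
function mergeHistogram14d(features: MergedFeature[], now: Date): number[] {
  const todayUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const buckets: number[] = new Array<number>(14).fill(0);
  for (const feature of features) {
    const parsed = Date.parse(feature.last_updated);
    if (Number.isNaN(parsed)) continue; // skip unparseable timestamps
    const d = new Date(parsed);
    const dayUtc = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
    const daysAgo = Math.round((todayUtc - dayUtc) / DAY_MS);
    if (daysAgo < 0 || daysAgo >= 14) continue; // outside the window
    const idx = 13 - daysAgo;
    buckets[idx] = (buckets[idx] ?? 0) + 1;
  }
  return buckets;
}
```

The resulting array matches the `merge_histogram_14d?: number[]` shape, so it could be computed on either the server or the client.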
320
253
 
321
- ---
254
+ ### UX-33 - Provider Performance Analytics
322
255
 
323
- ### UX-33 Provider Performance Analytics Panel
256
+ **Intent:** support provider/model tradeoff decisions.
324
257
 
325
- **What**: Add a dedicated "Analytics" tab or section that renders `performance.get_analytics` data as a sortable table showing provider/model performance: success rate, avg cost, avg duration, avg retry count.
258
+ **Requirements:**
326
259
 
327
- **Why it matters**: The `performance.get_analytics` tool returns aggregated metrics per provider/model combination: `success_rate`, `avg_cost_usd`, `avg_duration_ms`, `avg_retry_count`, `total_features`. This data is extraordinarily valuable for teams choosing between Claude, Codex, Gemini, or custom providers — it answers "which model gives the best outcomes at the lowest cost?" with real data from your own runs.
260
+ - API route: `GET /api/analytics`.
261
+ - Place in `/analytics` route.
262
+ - Table columns: provider, model, features, success rate, avg cost, avg duration, avg retries.
263
+ - Sortable headers with accessible sort state labels.
264
+ - Include recent outcomes timeline (last 20).
265
+ - Refresh interval 30s, pause when tab hidden.
328
266
 
329
- **Specification**:
267
+ ### UX-34 - Collision Matrix Heatmap
330
268
 
331
- - Add API route `GET /api/analytics` that calls `callOrchestratorTool('performance.get_analytics', {})`.
332
- - Add a navigation tab or toggle: "Board" (default) | "Analytics" | "Locks" (see UX-40).
333
- - Analytics view renders two sections:
334
- **Provider Comparison Table**: one row per provider/model, columns: Provider, Model, Features, Success Rate (as %), Avg Cost, Avg Duration, Avg Retries. Sortable by any column. Success rate rendered as a colored pill (green ≥ 80%, amber 50–79%, red < 50%).
335
- **Recent Outcomes List**: the raw `outcomes` array as a timeline (last 20), showing feature_id, provider, model, status, gate_pass, cost_usd, duration (human-formatted).
336
- - Auto-refresh every 30s (less frequent than the main board polling).
337
- - Empty state: "No performance data recorded yet. Data accumulates as features complete."
269
+ **Intent:** make collision topology understandable.
338
270
 
339
- ---
340
-
341
- ### UX-34 — Collision Matrix Heatmap
342
-
343
- **What**: When multiple features have active plans, surface a collision matrix showing which features conflict with which others, and on which resources (files, areas, contracts).
344
-
345
- **Why it matters**: Collisions are the leading cause of blocked features. Currently the dashboard shows that a feature is blocked but gives no visual sense of the collision topology: which other feature is it conflicting with, on which files? The collision matrix answers this spatially — a quick scan shows if Feature A is blocking both Feature B and Feature C simultaneously, which indicates a high-priority merge situation.
346
-
347
- **Specification**:
271
+ **Requirements:**
348
272
 
349
- - Add API route `GET /api/collisions` that calls `callOrchestratorTool('collisions.scan', {})`.
350
- - The collision scan returns a matrix structure indicating which features conflict.
351
- - Render as a compact grid: features on both axes, cells colored by collision type:
352
- - 🔴 Red: file collision (highest priority)
353
- - 🟠 Orange: area collision
354
- - 🟡 Yellow: contract collision (openapi/events/db)
355
- - ⚪ Grey: no collision
356
- - Clicking a cell opens a tooltip: "Feature A × Feature B: 3 file conflicts in `src/api/`."
357
- - Only show the matrix when ≥ 2 active features exist; show a message otherwise.
358
- - Render in the "Locks" / "Collisions" panel (see UX-40 multi-view).
359
- - Use `dep_blocked` entries as a key to highlight which features are actively queued.
273
+ - API route: `GET /api/collisions`.
274
+ - Render matrix for <=50 active features.
275
+ - For >50 features, degrade to ranked collision pair list.
276
+ - Cell tooltip includes collision type and count.
277
+ - Use text legend + color legend.
360
278
 
361
- ---
279
+ ### UX-35 - Plan Revision History
362
280
 
363
- ### UX-35 Plan Revision History
281
+ **Intent:** provide context for scope churn.
364
282
 
365
- **What**: Show the full revision chain of a feature's plan in the detail panel, derived from `plan.plan_version`, `plan.revision_of`, and `plan.revision_reason`.
283
+ **Requirements:**
366
284
 
367
- **Why it matters**: A feature that is on plan version 4 with reasons like "revised: build failed due to missing dependency" and "revised: QA requested broader test coverage" tells a completely different story than a feature on version 1. Revision history reveals how difficult the work was, whether the agent was on track, and why scope changed. This context is critical for reviews and for understanding systemic patterns.
285
+ - Surface `plan_version`, `revision_of`, `revision_reason`.
286
+ - Display as a compact revision timeline card.
287
+ - If historical artifacts are not available, explicitly label: `Only current plan metadata is retained`.
368
288
 
369
- **Specification**:
289
+ ### UX-36 - Live Cross-Feature Event Feed
370
290
 
371
- - The current `plan.json` contains `plan_version`, and optionally `revision_of` and `revision_reason` for each version.
372
- - Note: previous plan versions are not retained on disk in the current architecture (only current plan.json exists). This improvement should expose what _is_ available: `plan_version` number, `revision_of` (which version it supersedes), and `revision_reason`.
373
- - In the plan viewer section, show: "Plan v{N}" as the header, with `revision_reason` if present.
374
- - If `plan_version > 1`, add a "This plan was revised from v{revision_of}" annotation with the revision reason.
375
- - If `revision_reason` is absent but `plan_version > 1`, show "Plan revised N time(s) — no reason recorded."
376
- - This is a light-touch display of already-available data; no new artifact storage is required.
291
+ **Intent:** provide stream-level observability without noise overload.
377
292
 
378
- ---
293
+ **Requirements:**
379
294
 
380
- ### UX-36 Live Cross-Feature Event Feed
295
+ - Compute feed events by diffing SSE snapshots.
296
+ - Event types: phase, gate, activity, pr, lock.
297
+ - De-duplicate repeated identical events within 10s window.
298
+ - Keep last 100 events.
299
+ - Provide pause/resume and clear actions.
300
+ - Collapsed by default.
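
The de-duplication and retention rules above can be sketched as a single append step. The `FeedEvent` shape follows the interface in the type additions. Treating two events as "identical" when `feature_id`, `type`, and `message` match is an assumption, as are the function name and the newest-first ordering.

```typescript
interface FeedEvent {
  id: string;
  timestamp: string;
  feature_id: string;
  type: 'phase_change' | 'gate_update' | 'activity_change' | 'pr_update' | 'lock_change';
  message: string;
  severity: 'info' | 'success' | 'warning' | 'error';
}

const DEDUPE_WINDOW_MS = 10 * 1000;
const MAX_EVENTS = 100;

// Returns a new newest-first feed, or the same array if the event is a duplicate.
function appendFeedEvent(feed: FeedEvent[], incoming: FeedEvent): FeedEvent[] {
  const incomingMs = Date.parse(incoming.timestamp);
  const isDuplicate = feed.some(
    (e) =>
      e.feature_id === incoming.feature_id &&
      e.type === incoming.type &&
      e.message === incoming.message &&
      Math.abs(incomingMs - Date.parse(e.timestamp)) < DEDUPE_WINDOW_MS,
  );
  if (isDuplicate) return feed;
  // Prepend and cap so memory stays bounded.
  return [incoming, ...feed].slice(0, MAX_EVENTS);
}
```

Returning the original array on a duplicate lets a React state setter skip a re-render, since the reference does not change.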
381
301
 
382
- **What**: Add a real-time event feed panel that shows a rolling log of state-change events across all features as they arrive via SSE — a "tail -f" view of your entire orchestration system.
302
+ ### UX-37 - Quick Launch Panel
383
303
 
384
- **Why it matters**: The Kanban board is a snapshot view. Developers monitoring a multi-feature run often want streaming visibility: "What just happened? Did something just fail? Did the QA agent just finish?" The SSE stream already delivers state snapshots every 2 seconds. Diffing consecutive snapshots surfaces discrete change events that can be rendered as a feed, for example "feature-auth transitioned from building to qa" or "feature-payments gate fast: pass."
304
+ **Intent:** provide in-dashboard feature launch for authorized operators in M45.
385
305
 
386
- **Specification**:
306
+ **Requirements:**
387
307
 
388
- - Client-side only: diff the current `DashboardStatusPayload` against the previous one on each SSE snapshot event.
389
- - Detect changes: phase transitions, gate status changes, activity_state changes, PR status changes.
390
- - Emit `FeedEvent` objects:
391
- ```typescript
392
- interface FeedEvent {
393
- id: string;
394
- timestamp: string;
395
- feature_id: string;
396
- type: 'phase_change' | 'gate_update' | 'activity_change' | 'pr_update' | 'lock_change';
397
- message: string;
398
- severity: 'info' | 'success' | 'warning' | 'error';
399
- }
400
- ```
401
- - Render as a fixed-height (200px) scrollable panel at the bottom of the board, labeled "Live Events."
402
- - New events animate in at the top (prepend); keep last 100 events in memory.
403
- - Color-code by severity: green = success, red = error/fail, amber = warning, grey = info.
404
- - Events are clickable: clicking selects the associated feature for detail view.
405
- - A pause button stops auto-scrolling while reading.
406
- - This panel is collapsed by default; toggled via a small "Live Feed" toggle button.
308
+ - Feature flagged: `dashboard.quick_launch`.
309
+ - API route: `POST /api/run`.
310
+ - Default path MUST call orchestrator tools rather than shelling out to the CLI.
311
+ - Validate `feature_id` pattern and required goal.
312
+ - Require explicit confirmation before launch.
313
+ - If disabled by policy or permission, hide control and return authorization error on direct call.
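
Server-side validation for the launch request might look like the sketch below. The `feature_id` pattern is the one quoted from `plan.schema.json` in the earlier revision. The request shape, the function name, and the error messages are illustrative assumptions; the error code follows the minimum set in section 3.3.

```typescript
// Pattern quoted in the earlier revision of this spec.
const FEATURE_ID_PATTERN = /^[a-z0-9_][a-z0-9_-]*$/;

// Hypothetical request body for POST /api/run.
interface RunRequest {
  feature_id: string;
  goal: string;
}

type RunValidation =
  | { ok: true; value: RunRequest }
  | { ok: false; error: { code: 'invalid_input'; message: string; retryable: false } };

function validateRunRequest(body: unknown): RunValidation {
  const invalid = (message: string): RunValidation => ({
    ok: false,
    error: { code: 'invalid_input', message, retryable: false },
  });
  if (typeof body !== 'object' || body === null) {
    return invalid('Request body must be a JSON object.');
  }
  const { feature_id, goal } = body as Record<string, unknown>;
  if (typeof feature_id !== 'string' || !FEATURE_ID_PATTERN.test(feature_id)) {
    return invalid('feature_id does not match the required pattern.');
  }
  if (typeof goal !== 'string' || goal.trim() === '') {
    return invalid('goal is required.');
  }
  return { ok: true, value: { feature_id, goal: goal.trim() } };
}
```

Because the function accepts `unknown`, it can be applied directly to parsed JSON regardless of what the client already validated.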
407
314
 
408
- ---
315
+ ### UX-38 - Flaky Test Indicator
409
316
 
410
- ### UX-37 Quick Launch Panel
317
+ **Intent:** distinguish probable flaky failures from likely regressions.
411
318
 
412
- **What**: Add a "New Feature" button in the dashboard header that opens a modal form for starting a new feature run (`aop run`) directly from the UI, without requiring CLI access.
319
+ **Requirements:**
413
320
 
414
- **Why it matters**: Every workflow involving AOP today starts at the CLI. A developer monitoring the dashboard must context-switch to a terminal to start the next feature. The quick launch panel makes the dashboard the home base for the full AOP workflow — start, monitor, review, approve — without leaving the browser.
321
+ - API route: `GET /api/flaky`.
322
+ - If data unavailable, omit indicator without error toast.
323
+ - Show indicator only when failing required tests overlap flaky suspects.
324
+ - Label as probabilistic signal, not definitive cause.
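
The overlap rule can be expressed as a small set intersection. This sketch assumes failing tests are identified by the same string keys used in `flaky_suspects` and `required_tests`; both function names are hypothetical.

```typescript
// Returns the failing required tests that are also flaky suspects.
function flakyOverlap(failingRequiredTests: string[], flakySuspects?: string[]): string[] {
  // Data unavailable: no overlap, so the indicator is simply omitted.
  if (!flakySuspects || flakySuspects.length === 0) return [];
  const suspects = new Set(flakySuspects);
  return failingRequiredTests.filter((test) => suspects.has(test));
}

function shouldShowFlakyIndicator(failingRequiredTests: string[], flakySuspects?: string[]): boolean {
  return flakyOverlap(failingRequiredTests, flakySuspects).length > 0;
}
```

Returning the overlapping test keys, not just a boolean, lets the detail view list which tests triggered the signal.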
415
325
 
416
- **Specification**:
326
+ ### UX-39 - Feature Budget Meter
417
327
 
418
- - A "+ New Feature" button in the dashboard header (top-right area).
419
- - Opens a modal form with fields:
420
- - **Feature ID** (text input, validated against `^[a-z0-9_][a-z0-9_-]*$` pattern from `plan.schema.json`)
421
- - **Goal / Description** (textarea, maps to `--goal` flag)
422
- - **Provider** (optional select: codex / claude / gemini / custom / — use default)
423
- - **Gate Profile** (optional select populated from `gates.list` tool response)
424
- - "Start Feature" button calls a new API endpoint `POST /api/run` that invokes the CLI programmatically or calls the appropriate orchestrator tool.
425
- - On success: close modal, show toast "Feature {id} started", feature appears in Planning column on next SSE update.
426
- - On error: show error in modal.
427
- - Implementation note: the `POST /api/run` endpoint shells out to `aop run` or calls `feature.init` + `plan.submit` via `callOrchestratorTool`. The exact mechanism depends on whether a live supervisor is running.
328
+ **Intent:** prevent late surprise budget blocks.
428
329
 
429
- ---
330
+ **Requirements:**
430
331
 
431
- ### UX-38 Flaky Test Indicator
332
+ - API route: `GET /api/policy/budget`.
333
+ - Meter compares `estimated_cost_usd` to per-feature limit when configured.
334
+ - States: `Normal` (<60%), `Warning` (60-85%), `Critical` (>85%), `Exceeded`.
335
+ - `paused_budget` phase/status must show explicit `Budget exhausted` banner.
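
The meter states can be derived from the cost-to-limit ratio, as in this sketch. The spec does not define the `Exceeded` threshold, so treating cost at or above the limit as `Exceeded` is an assumption, as is returning `null` when no per-feature limit is configured. The function name is hypothetical.

```typescript
type BudgetState = 'Normal' | 'Warning' | 'Critical' | 'Exceeded';

// Returns null when no usable limit is configured (no meter to show).
function budgetState(estimatedCostUsd: number, limitUsd: number | null): BudgetState | null {
  if (limitUsd === null || limitUsd <= 0) return null;
  const ratio = estimatedCostUsd / limitUsd;
  if (ratio >= 1) return 'Exceeded'; // assumption: at or above the limit
  if (ratio > 0.85) return 'Critical';
  if (ratio >= 0.6) return 'Warning';
  return 'Normal';
}
```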
432
336
 
433
- **What**: When the flaky gate intelligence system (M34-36 PRQ4) has quarantine data, surface a "known flaky" indicator on QA-phase feature cards and in the detail panel.
337
+ ### UX-40 - Multi-View Layout Toggle (Route-Native)
434
338
 
435
- **Why it matters**: One of the most frustrating experiences in CI/CD is a failed gate where the failure is a known flaky test. Without flaky awareness in the dashboard, a reviewer sees "gate: fail" and spends time investigating what turns out to be a pre-existing flakiness problem, not a regression introduced by the feature. Flaky indicators prevent this wasted investigation cycle.
339
+ **Intent:** support triage and deep review without cramped hybrid layouts.
436
340
 
437
- **Specification**:
341
+ **Requirements:**
438
342
 
439
- - Add API route `GET /api/flaky` that calls `callOrchestratorTool('gate.flaky_report_get', {})` (when M34-36 is implemented) or reads the flaky report artifact from `.aop/flaky-report.json` if available.
440
- - If the flaky tool/artifact is not available, omit the indicator gracefully (no error state shown).
441
- - Add to `DashboardStatusPayload`: `flaky_suspects: string[]` (list of test keys with flaky risk).
442
- - On feature cards in `qa` phase: if the feature's gate results include a failure AND any `required_tests` from its test index overlap with `flaky_suspects`, show a small amber badge "⚠ Known flaky tests may be involved."
443
- - In the detail panel, add a "Flaky Test Risk" section: list affected test names, their flaky probability, quarantine status, and expiry.
444
- - Quarantined tests (actively suppressed from blocking merge) are shown with a distinct "Quarantined" badge.
343
+ - Views are route-native:
344
+ - Board: `/`
345
+ - List: `/?view=list`
346
+ - Focus: `/feature/:id`
347
+ - List view supports sortable columns and keyboard row navigation.
348
+ - Right-click-only interactions are prohibited; action menu button required.
349
+ - Focus view includes full-width review sections with sticky action footer.
445
350
 
446
351
  ---
447
352
 
448
- ### UX-39 Feature Budget Meter
449
-
450
- **What**: For each feature, show a visual budget consumption meter comparing `cost.estimated_cost_usd` against the configured per-feature budget threshold from `policy.yaml`.
451
-
452
- **Why it matters**: Policy-level budget controls (`budget.per_feature_usd_limit` or similar) exist to prevent runaway agent costs. But without a visual indicator, developers don't know how close a feature is to exhausting its budget until it hits `paused_budget` status (which renders as blocked in the current board). A budget meter — like a fuel gauge — shows the consumption trajectory before it becomes a problem.
453
-
454
- **Specification**:
455
-
456
- - Add a new API route `GET /api/policy/budget` that reads the relevant budget fields from the composed policy (`policy.budget.per_feature_usd_limit` or equivalent).
457
- - On the feature detail panel, when cost data is available, show a progress bar: "Budget Usage: $0.04 / $2.00 (2%)".
458
- - Bar color: green < 60%, amber 60–85%, red > 85%.
459
- - If no budget limit is configured in policy, show the raw cost without a comparative bar.
460
- - On feature cards: show a small budget indicator dot (using the same color scheme) when budget usage > 60%.
461
- - Features at `paused_budget` status show "Budget exhausted" instead of a percentage.
462
-
463
- ---
464
-
465
- ### UX-40 — Multi-View Layout Toggle
466
-
467
- **What**: Add a layout toggle that switches the dashboard between three views: **Board** (current Kanban layout), **List** (sortable table of all features), and **Focus** (single-column full-width view of a selected feature with maximum detail).
468
-
469
- **Why it matters**: Kanban is the right view when monitoring many features simultaneously. But it is a poor format for a reviewer who wants to deeply inspect one feature — the 1/3-width detail panel in a 2:1 grid is cramped. A **Focus** view gives a single feature the entire screen, rendering all detail sections (pipeline stepper, plan, diff, QA index, review brief, acceptance criteria, risk notes, cost) in a single scrolling document. A **List** view is better for operations tasks (sorting by cost, filtering by status, bulk review).
353
+ ## 3. API Contracts (Revised)
470
354
 
471
- **Specification**:
355
+ ### 3.1 Uniform Response Envelope (Mandatory)
472
356
 
473
- **Board View** (default): existing Kanban grid from M44.
357
+ All new dashboard routes MUST return:
474
358
 
475
- **List View**:
476
-
477
- - Table with columns: Feature ID, Phase, Activity, Gates (fast/full/merge as icons), PR CI, Tokens, Cost, Last Updated (relative).
478
- - Sortable by any column (click header to sort ascending/descending).
479
- - Row click selects feature and opens a compact detail drawer (not full detail panel).
480
- - Row right-click opens a context menu with quick actions (approve, deny, request changes).
481
-
482
- **Focus View** (single-feature deep dive):
483
-
484
- - Full-page single-feature view.
485
- - Left column (60%): diff viewer (UX-05), plan scope tree (UX-23), QA test coverage map (UX-25).
486
- - Right column (40%): review brief (UX-28), acceptance criteria (UX-24), risk notes (UX-29), gate drill-down (UX-30), cost meter (UX-39), agent pipeline (UX-21).
487
- - Review action buttons fixed at the bottom of the right column.
488
- - "← Back to Board" button returns to the previous view.
489
- - Triggered by: clicking "Open in Focus" in the Kanban detail panel, or clicking a feature ID in the List view while holding Cmd/Ctrl.
490
-
491
- **Implementation**:
492
-
493
- - View state managed as `'board' | 'list' | 'focus'` in React state.
494
- - Toggle rendered as a three-button toggle group in the header bar.
495
- - Each view is a separate component in `src/components/views/`.
496
-
497
- ---
498
-
499
- ## 2. Implementation Priorities
500
-
501
- ### 2.1 Priority Tiers
502
-
503
- **Tier 1 — Immediately Valuable (no spec dependencies):**
504
-
505
- - UX-21: Agent Pipeline Stepper — data already in state frontmatter
506
- - UX-22: Cost & Token Tracker — cost.get tool already exists
507
- - UX-29: Plan Risk Annotations — plan.risk already in plan.json
508
- - UX-31: Orchestrator Run Health Panel — runtime_sessions already in index.json
509
- - UX-35: Plan Revision History — plan_version/revision_of already in plan.json
510
-
511
- **Tier 2 — High Impact, Requires New Data Wiring:**
512
-
513
- - UX-23: Plan Scope File Tree — plan.files already in plan.json, needs parsing
514
- - UX-24: Acceptance Criteria Live Tracker — plan.acceptance_criteria already available
515
- - UX-25: QA Test Coverage Map — requires reading qa_test_index.json
516
- - UX-26: Lock Resource Map — requires reading lock_leases from index.json
517
- - UX-27: Dependency Unblock Chain — dep_blocked in index.json
518
- - UX-32: Throughput Sparklines — derived from existing feature data
519
- - UX-36: Live Cross-Feature Event Feed — client-side diff of SSE data only
520
-
521
- **Tier 3 — Requires New API Routes or Upstream Specs:**
522
-
523
- - UX-28: Review Brief Renderer — depends on M34-36 PRQ5 artifact existing
524
- - UX-30: Gate Step Drill-Down — depends on gate evidence JSON structure
525
- - UX-33: Provider Performance Analytics — requires performance.get_analytics call
526
- - UX-34: Collision Matrix Heatmap — requires collisions.scan call
527
- - UX-37: Quick Launch Panel — requires `POST /api/run` implementation
528
- - UX-38: Flaky Test Indicator — depends on M34-36 PRQ4 flaky data
529
- - UX-39: Feature Budget Meter — requires policy budget config surfacing
530
- - UX-40: Multi-View Layout Toggle — large component, depends on UX-19 completion
531
-
532
- ---
533
-
534
- ## 3. New API Routes Required
359
+ - Success:
360
+ ```json
361
+ { "ok": true, "data": {}, "meta": { "stale": false, "source": "tool|artifact|derived" } }
362
+ ```
363
+ - Error:
364
+ ```json
365
+ { "ok": false, "error": { "code": "string", "message": "string", "retryable": false } }
366
+ ```
535
367
 
536
- | Route | Method | Purpose | Spec |
537
- | -------------------------------- | ------ | --------------------------------------- | ----- |
538
- | `/api/features/:id/cost` | GET | Return cost.get result via MCP tool | UX-22 |
539
- | `/api/features/:id/test-index` | GET | Read `qa_test_index.json` from disk | UX-25 |
540
- | `/api/features/:id/review-brief` | GET | Read `review_brief.json` from disk | UX-28 |
541
- | `/api/analytics` | GET | Return performance.get_analytics result | UX-33 |
542
- | `/api/collisions` | GET | Return collisions.scan result | UX-34 |
543
- | `/api/flaky` | GET | Return gate.flaky_report_get result | UX-38 |
544
- | `/api/policy/budget` | GET | Return budget policy fields | UX-39 |
545
- | `/api/run` | POST | Launch a new feature run | UX-37 |
368
+ ### 3.2 Required Routes
369
+
370
+ | Route | Method | Purpose | UX |
371
+ | -------------------------------- | ------ | ----------------------------------- | ----- |
372
+ | `/api/features/:id/cost` | GET | Per-feature cost and tokens | UX-22 |
373
+ | `/api/features/:id/test-index` | GET | Per-feature QA test index | UX-25 |
374
+ | `/api/features/:id/review-brief` | GET | Renderable review brief | UX-28 |
375
+ | `/api/analytics` | GET | Provider/model analytics | UX-33 |
376
+ | `/api/collisions` | GET | Collision matrix or ranked list | UX-34 |
377
+ | `/api/flaky` | GET | Flaky suspects/quarantine metadata | UX-38 |
378
+ | `/api/policy/budget` | GET | Budget policy thresholds | UX-39 |
379
+ | `/api/run` | POST | Quick launch entry point (in-scope) | UX-37 |
380
+
381
+ ### 3.3 API Error Codes (Minimum Set)
382
+
383
+ - `feature_not_found`
384
+ - `artifact_missing`
385
+ - `tool_unavailable`
386
+ - `tool_timeout`
387
+ - `policy_not_configured`
388
+ - `unauthorized_action`
389
+ - `invalid_input`
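
The envelope and error codes above map naturally onto a discriminated union. The following is a sketch under stated assumptions: the type and helper names (`Envelope`, `okEnvelope`, `failEnvelope`) are hypothetical and are not part of the listed type additions.

```typescript
type EnvelopeSource = 'tool' | 'artifact' | 'derived';

// Minimum error code set from section 3.3; routes may add more.
type KnownErrorCode =
  | 'feature_not_found'
  | 'artifact_missing'
  | 'tool_unavailable'
  | 'tool_timeout'
  | 'policy_not_configured'
  | 'unauthorized_action'
  | 'invalid_input';

interface SuccessEnvelope<T> {
  ok: true;
  data: T;
  meta: { stale: boolean; source: EnvelopeSource };
}

interface ErrorEnvelope {
  ok: false;
  error: { code: string; message: string; retryable: boolean };
}

type Envelope<T> = SuccessEnvelope<T> | ErrorEnvelope;

function okEnvelope<T>(data: T, source: EnvelopeSource, stale = false): SuccessEnvelope<T> {
  return { ok: true, data, meta: { stale, source } };
}

function failEnvelope(code: KnownErrorCode, message: string, retryable = false): ErrorEnvelope {
  return { ok: false, error: { code, message, retryable } };
}
```

A client can then branch on `ok` and let the compiler narrow to `data` or `error`, which keeps the fallback handling for optional sections uniform.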
546
390
 
547
391
  ---
548
392
 
549
- ## 4. New Type Additions to `src/lib/types.ts`
393
+ ## 4. Type Additions (`packages/web-dashboard/src/lib/types.ts`)
550
394
 
551
395
  ```typescript
552
- // UX-21
553
- export type RoleStatus = 'ready' | 'running' | 'blocked' | 'done';
396
+ export type RoleStatus = 'ready' | 'running' | 'blocked' | 'done' | 'unknown';
397
+
554
398
  export interface AgentPipelineStatus {
555
399
  planner: RoleStatus;
556
400
  builder: RoleStatus;
557
401
  qa: RoleStatus;
558
402
  }
559
403
 
560
- // UX-22
561
404
  export interface CostSummary {
562
405
  feature_id: string;
563
406
  tokens_used: number;
@@ -565,20 +408,19 @@ export interface CostSummary {
565
408
  recorded_at: string | null;
566
409
  }
567
410
 
568
- // UX-25
569
411
  export interface QaTestIndexItem {
570
412
  path: string;
571
413
  status: 'pending' | 'running' | 'passed' | 'failed' | 'waived';
572
414
  required_tests: string[];
573
415
  last_run_at?: string;
574
416
  }
417
+
575
418
  export interface QaTestIndex {
576
419
  feature_id: string;
577
420
  version: number;
578
421
  items: QaTestIndexItem[];
579
422
  }
580
423
 
581
- // UX-26
582
424
  export interface LockLease {
583
425
  resource: string;
584
426
  holder: string | null;
@@ -586,7 +428,6 @@ export interface LockLease {
586
428
  is_stale: boolean;
587
429
  }
588
430
 
589
- // UX-28
590
431
  export interface ReviewBrief {
591
432
  intent_summary: string;
592
433
  scope_summary: string;
@@ -599,7 +440,6 @@ export interface ReviewBrief {
599
440
  generated_at: string;
600
441
  }
601
442
 
602
- // UX-33
603
443
  export interface ProviderAnalytics {
604
444
  provider: string;
605
445
  model: string;
@@ -611,7 +451,6 @@ export interface ProviderAnalytics {
611
451
  avg_cost_usd: number;
612
452
  }
613
453
 
614
- // UX-36
615
454
  export interface FeedEvent {
616
455
  id: string;
617
456
  timestamp: string;
@@ -622,39 +461,94 @@ export interface FeedEvent {
622
461
  }
623
462
  ```
624
463
 
625
- Additionally, extend `FeatureSummary` to include:
464
+ `FeatureSummary` additions:
626
465
 
627
- - `role_status?: AgentPipelineStatus` (UX-21)
628
- - `gate_retry_count?: number` (UX-30)
629
- - `last_retry_at?: string | null` (UX-30)
466
+ - `role_status?: AgentPipelineStatus`
467
+ - `gate_retry_count?: number`
468
+ - `last_retry_at?: string | null`
630
469
 
631
- And extend `DashboardStatusPayload` to include:
470
+ `DashboardStatusPayload` additions:
632
471
 
633
- - `runtime?: RuntimeSession` (UX-31)
634
- - `lock_map?: LockLease[]` (UX-26)
635
- - `dep_blocked?: Array<{ feature_id: string; depends_on_unresolved: string[] }>` (UX-27)
636
- - `flaky_suspects?: string[]` (UX-38)
472
+ - `runtime?: RuntimeSession`
473
+ - `lock_map?: LockLease[]`
474
+ - `dep_blocked?: Array<{ feature_id: string; depends_on_unresolved: string[] }>`
475
+ - `flaky_suspects?: string[]`
476
+ - `metrics?: { total_cost_today_usd?: number; merge_histogram_14d?: number[] }`
637
477
 
638
478
  ---
639
479
 
640
- ## 5. Acceptance Criteria
480
+ ## 5. Agentic Implementation Plan
481
+
482
+ ### 5.1 Delivery Slices
483
+
484
+ Execution priority for M45 is explicitly reviewer-first, then operator flow.
485
+
486
+ 1. **Slice A - Reviewer Core**
487
+ - UX-21, UX-23, UX-24, UX-29, UX-30, UX-35
488
+ 2. **Slice B - Operator Core**
489
+ - UX-26, UX-27, UX-31, UX-36, UX-39
490
+ 3. **Slice C - Analytics**
491
+ - UX-32, UX-33, UX-34, UX-38
492
+ 4. **Slice D - Launch + View Refinement**
493
+ - UX-37, UX-40, List/Focus routing polish
494
+
495
+ ### 5.2 Required File Targets
496
+
497
+ - `packages/web-dashboard/src/app/page.tsx`
498
+ - `packages/web-dashboard/src/app/analytics/page.tsx` (new)
499
+ - `packages/web-dashboard/src/app/feature/[id]/page.tsx` (new)
500
+ - `packages/web-dashboard/src/components/*` (new/updated per feature)
501
+ - `packages/web-dashboard/src/lib/types.ts`
502
+ - `packages/web-dashboard/src/lib/dashboard-utils.ts`
503
+ - `packages/web-dashboard/src/lib/aop-client.ts`
504
+ - `packages/web-dashboard/src/lib/orchestrator-tools.ts`
505
+ - `packages/web-dashboard/src/app/api/**/route.ts` (new routes above)
506
+ - `apps/control-plane/test/dashboard-*.spec.ts` (API + utils + interaction tests)
507
+
508
+ ### 5.3 Deterministic Fallback Rules
509
+
510
+ - Missing artifact: return `ok: false`, `error.code = artifact_missing` and render empty state.
511
+ - Missing tool support: `tool_unavailable`; UI renders feature section disabled with explanation.
512
+ - Any optional section failure MUST NOT block board rendering.
513
+
514
+ ### 5.4 Security and Safety Rules
515
+
516
+ - `POST /api/run` MUST validate inputs server-side, regardless of client validation.
517
+ - Quick launch MUST honor policy + permission checks.
518
+ - Role-based authorization enforcement is deferred for M45, but route handlers MUST call a pluggable authorization adapter interface so RBAC can be enabled without route rewrites in a later milestone.
519
+ - Action confirmations required for launch/approve/deny/request changes.
520
+
521
+ ---
522
+
523
+ ## 6. Acceptance Criteria
524
+
525
+ M45 is accepted only when all are true:
526
+
527
+ 1. Functional criteria
528
+    - UX-21..UX-40 implemented according to this spec, including the degradations it defines.
529
+ 2. Accessibility criteria
530
+ - Keyboard-only flow covers triage, feature selection, view switch, and actions.
531
+ - No status semantics rely on color alone.
532
+ 3. Reliability criteria
533
+ - Optional section failures do not break board rendering.
534
+ - API routes return uniform response envelope.
535
+ 4. Performance criteria
536
+ - No continuous high-frequency loops without justification.
537
+ - Event feed de-duplication and bounded memory confirmed.
538
+ 5. Engineering criteria
539
+ - `npm run lint`, `npm run typecheck`, `npm test`, `npm run build`, `npm run validate:mcp-contracts`, `npm run validate:architecture` all pass.
540
+
541
+ ---
641
542
 
642
- Each improvement is accepted when:
543
+ ## 7. Out of Scope
643
544
 
644
- 1. It renders correctly in development mode (`npm run dev`).
645
- 2. `npm run build` succeeds with no TypeScript or lint errors.
646
- 3. Components that can be unit-tested have corresponding test files.
647
- 4. Data that is unavailable (optional fields, features not yet run, specs not yet implemented) degrades gracefully with an informative empty/unavailable state — no crashes, no unhandled promise rejections.
648
- 5. No new third-party runtime dependencies introduced without explicit justification. SVG charts, diff parsing, tree building — all implemented locally.
545
+ - In-dashboard code editing or patch authoring.
546
+ - Full mobile-first redesign beyond the mandatory 360px phone baseline.
547
+ - Websocket migration (SSE remains transport for M45).
548
+ - External alerting integrations (Slack/PagerDuty/webhooks).
649
549
 
650
550
  ---
651
551
 
652
- ## 6. Out of Scope
552
+ ## 8. Open Questions (Need Product/Governance Input)
653
553
 
654
- - Real-time WebSocket upgrade (SSE polling at 2s is sufficient for this spec).
655
- - Server-side rendered pages for individual features (URL routing per feature).
656
- - In-dashboard code editing or patch application.
657
- - Notification webhooks or external alerting integration (belongs in M37 or a separate alerting spec).
658
- - AI-powered natural language queries over dashboard data.
659
- - Dark/light theme switching.
660
- - Mobile layout optimization.
554
+ None for this revision.