@exaudeus/workrail 3.27.0 → 3.29.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77
@@ -0,0 +1,211 @@
# Design Candidates: WorkRail Console Execution-Trace Explainability

*Source: discovery scoping pass on the engine's event log vs. console DTO gaps*
*Full landscape packet: `console-explainability-discovery.md`*

---

## Problem Understanding

### Core Tensions

1. **Completeness vs. UI coherence.** The engine records 23 categories of invisible data across 16 event kinds. Surfacing all of them achieves completeness, but showing 23 new data items without progressive disclosure creates an overwhelming console. The tension: show everything the engine knows vs. show what helps the specific user understand their specific confusion.

2. **DTO stability vs. extensibility.** The console DTOs (`ConsoleDagRun`, `ConsoleNodeDetail`) are in use. Adding assessment, capability, and blocker fields requires extension without breaking consumers. The codebase philosophy says "make illegal states unrepresentable" -- this favors discriminated union shapes over bare nullable fields.

3. **Projection cost vs. information value.** Some projections are already computed (`executionTraceSummary` is in the DTO). Others require new `console-service.ts` calls per session detail request (assessments, capabilities, preferences). Wiring all projections simultaneously increases response latency.

### Likely Seam

The real seam is the `console-service.ts` -> `console-types.ts` boundary -- where projection data is selected and shaped into DTOs. The symptom appears at the UI; the root is the service-to-DTO translation. The DAG topology projection (`projectRunDagV2`) is the structural source of truth; edge cause codes should flow through it, not around it.

### What Makes This Hard

Three different kinds of work are required:
- **Tier 1 (rendering only):** `ConsoleDagRun.executionTraceSummary` is already computed and in the wire format -- the UI panel is simply not implemented.
- **Tier 2 (service wiring):** `projectAssessmentsV2`, `projectAssessmentConsequencesV2`, and `projectCapabilitiesV2` are complete projections, but `console-service.ts` never calls them.
- **Tier 3 (DTO + projection change):** Blocker detail, gap reason detail, edge cause codes, the full context object, and preferences changes all require new DTO fields and projection output changes.

A junior developer would treat all 23 gaps as equivalent work items and try to fix them simultaneously, not recognizing the tier structure.

---

## Philosophy Constraints

From `CLAUDE.md` and confirmed by codebase observation:

- **Make illegal states unrepresentable** -- new DTO fields should use discriminated union shapes (e.g. `{ present: true; data: ... } | { present: false }`) rather than bare `null`, to distinguish "not recorded" from "recorded as empty".
- **Architectural fixes over patches** -- extending `ConsoleDagEdge` with cause codes properly is better than a separate edge-explanation projection.
- **YAGNI with discipline** -- design clear seams; add progressively. This favors Tier 1 first, with clean extension points for Tier 2 and Tier 3.
- **Errors are data** -- all projection calls must return `Result<T, ProjectionError>` with graceful degradation, not exceptions.
- **Immutability by default** -- all new DTO fields must use `readonly`.
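The first and last constraints can be sketched together. This is a minimal illustration, not the real `console-types.ts` shapes: the `Presence` wrapper and `AssessmentSummary` names are hypothetical.

```typescript
// Hypothetical sketch: a discriminated union distinguishes "not recorded"
// from "recorded as empty", which a bare nullable array cannot do.
type Presence<T> =
  | { readonly present: true; readonly data: T }
  | { readonly present: false };

// Illustrative payload type; field names are assumptions, not the real DTO.
interface AssessmentSummary {
  readonly dimensionId: string;
  readonly level: string;
}

// Consumers must narrow on `present` before touching `data`,
// so the "absent" case cannot be silently ignored.
function describeAssessments(
  field: Presence<readonly AssessmentSummary[]>,
): string {
  if (!field.present) return "not recorded";
  if (field.data.length === 0) return "recorded as empty";
  return `${field.data.length} assessment(s)`;
}
```

The three-way outcome ("not recorded" / "recorded as empty" / data) is exactly the distinction a bare `null` field collapses.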
---

## Impact Surface

Beyond the immediate task, changes touch:
- `console-types.ts` -- DTO shape changes affect any consumer reading the session detail API
- `console-service.ts` -- new projection calls affect session detail response latency
- `run-execution-trace.ts` -- extending `CONTEXT_KEYS_TO_ELEVATE` affects which context keys appear in the execution trace
- `run-dag.ts` -> `ConsoleDagEdge` -- adding cause codes requires changes to the DAG projection DTO boundary
- Frontend session detail panel -- every new DTO field needs a rendering path
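The latency point interacts with the "errors are data" constraint: a new projection call that fails should degrade one DTO field, not the whole session-detail response. A sketch, with hypothetical `Result` and `ProjectionError` shapes (not the actual codebase types):

```typescript
// Hypothetical "errors are data" sketch: a failed projection degrades to a
// fallback value instead of throwing and failing the entire response.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Illustrative error shape; the real ProjectionError may differ.
interface ProjectionError {
  readonly code: string;
  readonly message: string;
}

function unwrapOr<T>(result: Result<T, ProjectionError>, fallback: T): T {
  return result.ok ? result.value : fallback;
}

// Example: capability wiring fails, but the response still ships with an
// explicit empty list rather than an exception.
const failed: Result<string[], ProjectionError> = {
  ok: false,
  error: { code: "PROJECTION_FAILED", message: "capability log unreadable" },
};
const capabilities = unwrapOr(failed, []);
```

Each Tier 2 projection call wired this way adds latency but never adds a new failure mode for the endpoint as a whole.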
---

## Candidates

### Candidate A: Tier-Organized Scoping (simplest useful output)

**Summary:** Organize the gap list by implementation effort tier so engineering can estimate scope immediately.

**Tiers:**
- Tier 1 (rendering only, no backend changes): implement the `executionTraceSummary` panel in the UI -- the data is already in `ConsoleDagRun.executionTraceSummary`. Covers: `selected_next_step`, `evaluated_condition`, `entered_loop`, `exited_loop`, `detected_non_tip_advance`, divergence items, and the `taskComplexity` context fact.
- Tier 2 (service wiring + DTO extension): call `projectAssessmentsV2`, `projectAssessmentConsequencesV2`, and `projectCapabilitiesV2` from `console-service.ts`. Add `assessmentSummary` and `capabilityStatus` to `ConsoleNodeDetail`.
- Tier 3 (DTO shape change): add `cause` to `ConsoleDagEdge`, add `blockers` to `ConsoleAdvanceOutcome`, add `reason` + `evidenceRefs` to `ConsoleNodeGap`, extend `ConsoleExecutionTraceFact` with more context keys, and add `ConsolePreferencesChange` to `ConsoleDagRun`.
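One Tier 3 item can be sketched concretely: the `cause` field on `ConsoleDagEdge`. The three cause values are named in the gap list; the interface layout and function are hypothetical.

```typescript
// Hypothetical Tier 3 sketch: a closed-set cause code on DAG edges.
// The three union values come from the gap list; the rest is illustrative.
type EdgeCause = "idempotent_replay" | "intentional_fork" | "non_tip_advance";

interface ConsoleDagEdgeSketch {
  readonly fromNodeId: string;
  readonly toNodeId: string;
  // Optional during rollout so existing DTO consumers are not broken.
  readonly cause?: EdgeCause;
}

// Exhaustive switch: adding a fourth cause value fails compilation here,
// keeping the UI rendering path in sync with the DTO.
function explainCause(cause: EdgeCause): string {
  switch (cause) {
    case "idempotent_replay":
      return "edge replays an already-recorded advance";
    case "intentional_fork":
      return "edge forks deliberately from an earlier node";
    case "non_tip_advance":
      return "edge advances from a non-tip node";
  }
}
```

Making `cause` optional rather than `Presence`-wrapped is a rollout compromise; a stricter version would use the discriminated-union shape from the philosophy constraints.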
**Tensions resolved:** DTO stability (sequential tiers), projection cost (Tier 1 costs nothing).
**Tensions accepted:** User-story coherence (tiers don't map to user jobs-to-be-done).

**Boundary solved at:** `console-service.ts` / `console-types.ts`. This is the correct seam.

**Failure mode:** Teams prioritize Tier 1 (the cheapest) even though the dominant user confusion (`blocked_attempt` nodes, assessment gate results) lives in Tiers 2-3.

**Repo pattern:** Follows the `executionTraceSummary` staging precedent exactly.

**Gains:** Immediately actionable for engineering sprint planning.
**Losses:** The design team gets a backlog, not a user-facing vision for progressive disclosure.

**Scope judgment:** Too narrow for a design initiative kickoff; best-fit for an engineering readiness doc.

**Philosophy fit:** Honors YAGNI and architectural layering. No conflicts.

---

### Candidate B: User-Question-Organized (recommended)

**Summary:** Organize the 23 gaps by the user question they answer, with the tier noted per item, so the design team can build the right progressive-disclosure model.

**Structure:**

*"Why did the run skip phases / take this path?"*
- decision trace entries (`selected_next_step`, `evaluated_condition`) -- Tier 1, already in `executionTraceSummary`
- `taskComplexity` context fact -- Tier 1, already in `executionTraceSummary.contextFacts`
- full run context object (other routing keys) -- Tier 3, requires DTO extension
- edge cause codes (`idempotent_replay`, `intentional_fork`, `non_tip_advance`) -- Tier 3, requires a `ConsoleDagEdge.cause` field

*"Why is this node blocked?"*
- blocker codes (a 10-value enum: `USER_ONLY_DEPENDENCY`, `MISSING_REQUIRED_OUTPUT`, etc.) -- Tier 3, requires `ConsoleAdvanceOutcome.blockers`
- blocker pointer (`context_key`, `capability`, `output_contract`, `workflow_step`) -- Tier 3
- blocker message + `suggestedFix` -- Tier 3
- validation failure linkage (which validation caused the block) -- Tier 2
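The blocker items above could take a shape like the following. The pointer kinds and the two enum values shown come from the gap list; all field names and the render helper are hypothetical.

```typescript
// Hypothetical sketch of ConsoleAdvanceOutcome.blockers entries.
// Pointer kinds are from the gap list; field names are illustrative.
type BlockerPointer =
  | { readonly kind: "context_key"; readonly key: string }
  | { readonly kind: "capability"; readonly capability: string }
  | { readonly kind: "output_contract"; readonly contractId: string }
  | { readonly kind: "workflow_step"; readonly stepId: string };

interface ConsoleBlockerSketch {
  // Two of the ten enum values named in the gap list are shown here.
  readonly code: "USER_ONLY_DEPENDENCY" | "MISSING_REQUIRED_OUTPUT";
  readonly pointer: BlockerPointer;
  readonly message: string;
  readonly suggestedFix?: string;
}

// One-line rendering for a blocked-node panel.
function renderBlocker(b: ConsoleBlockerSketch): string {
  const fix = b.suggestedFix ? ` (fix: ${b.suggestedFix})` : "";
  return `[${b.code}] ${b.message}${fix}`;
}
```

The discriminated `pointer` lets the UI deep-link each blocker to the thing it points at (a context key, a capability, a contract, or a step) without string parsing.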
*"What did the quality gate decide?"*
- assessment dimensions (`dimensionId`, `level`, rationale) -- Tier 2, requires `projectAssessmentsV2` wiring
- assessment summary -- Tier 2
- assessment normalization notes -- Tier 2
- assessment consequence (triggered follow-up, guidance) -- Tier 2, requires `projectAssessmentConsequencesV2` wiring

*"What happened in this loop?"*
- `entered_loop` / `exited_loop` trace entries -- Tier 1, already in `executionTraceSummary`
- loop iteration count -- Tier 3, requires engine-state loop stack data

*"Why did run behavior change mid-execution?"*
- `preferences_changed` events (autonomy mode, `riskPolicy`, who changed it) -- Tier 3, not in any projection DTO

*"Why did the run use a degraded path?"*
- capability probe results (delegation / web_browsing available / unavailable / unknown) -- Tier 2, requires `projectCapabilitiesV2` wiring
- capability failure codes (`tool_missing`, `tool_error`, `policy_blocked`) -- Tier 2

*"What does the gap mean?"*
- gap reason category (`user_only_dependency`, `contract_violation`, `capability_missing`, `unexpected`) -- Tier 3, requires a `ConsoleNodeGap.reason` field
- gap evidence refs -- Tier 3

**Tensions resolved:** UI coherence (maps to user mental models); covers all three stakeholder groups.
**Tensions accepted:** Engineering must read more carefully to extract tier information per item.

**Boundary solved at:** The same seam -- the user-question grouping doesn't change where the fix lives.

**Failure mode:** The design team sees a clean user story per question but underestimates cross-cutting implementation work (e.g., "why blocked?" requires Tier 2 validation linkage AND Tier 3 blocker detail simultaneously).

**Repo pattern:** Adapts the staging precedent -- same tier model, user-question organization.

**Gains:** The design team gets a vision and can design progressive disclosure correctly. Engineering can still extract tier information.
**Losses:** Slightly more reading overhead for pure engineering scope estimation.

**Scope judgment:** Best-fit for a design initiative kickoff.

**Philosophy fit:** Honors all principles. The tier notation per item ensures YAGNI and architectural layering are preserved.

---

### Candidate C: Minimum Viable Explainability

**Summary:** Surface only the items that explain the three specific user confusion patterns named in the problem statement, deferring all others.

The three confusion patterns and their minimum data requirements:
1. **Fast-path phase skips:** execution trace entries (`selected_next_step`, `evaluated_condition`) + the `taskComplexity` context fact. Status: Tier 1 rendering only -- the data is already in the DTO.
2. **blocked_attempt nodes:** blocker codes + messages from the `advance_recorded` outcome. Status: Tier 3 -- requires a `ConsoleAdvanceOutcome.blockers` field addition.
3. **Loop structural jumps:** `entered_loop` and `exited_loop` trace entries with `loop_id` refs. Status: Tier 1 rendering only -- the data is already in the DTO.

Result: only one Tier 3 change is needed (blocker detail on `ConsoleAdvanceOutcome`) plus one Tier 1 UI implementation (the execution trace panel).

**Tensions resolved:** YAGNI; fastest path to user-visible improvement for the named pain points.
**Tensions accepted:** Assessment results, capability degradation, preferences, gap reason detail, and edge cause codes are all deferred.

**Boundary solved at:** The same seam; minimal scope.

**Failure mode:** Assessment and capability gaps are high-value for workflow authors (a primary stakeholder). Deferring them leaves a key user group underserved.

**Repo pattern:** Most conservative; only extends what is explicitly necessary.

**Gains:** Fastest path to user-visible improvement; lowest implementation risk.
**Losses:** Does not address workflow author or platform maintainer needs; defers 20 of the 23 gaps.

**Scope judgment:** Best-fit for a quick-win sprint; too narrow for a full design initiative.

**Philosophy fit:** Honors YAGNI most strongly. No conflicts.

---

## Comparison and Recommendation

| | Candidate A | Candidate B | Candidate C |
|---|---|---|---|
| Completeness vs. UI coherence | Accepts UI gap | Resolves both | Accepts incompleteness |
| DTO stability | Explicit tier ordering | Noted per item | Minimal scope |
| YAGNI | Middle | Middle | Best |
| User story mapping | Weakest | Best | Partial |
| Scope fit (design kickoff) | Too narrow | Best-fit | Too narrow |
| Stakeholder coverage | Engineering only | All three | Operators only |
| Reversibility | High | High | High |
| Philosophy fit | Full | Full | Full |

**Recommendation: Candidate B.**

Candidate B is the best fit for the stated goal (design initiative scoping) because:
1. Design teams need user-question framing to build progressive-disclosure models, not backlog lists.
2. Tier information is preserved per item -- engineering can extract a sprint plan from the same doc.
3. All three stakeholder groups are covered.
4. The current `console-explainability-discovery.md` doc already implements this organization.

---

## Self-Critique

**Strongest counter-argument:** Candidate C is faster and directly solves the three named pain points from the problem statement. If the brief's "three confusion patterns" are the complete acceptance criteria (not just illustrative examples), Candidate C is sufficient and Candidate B is over-scoped.

**Narrower option:** Candidate C lost because it leaves assessment and capability gaps entirely unaddressed. Workflow authors -- a named primary stakeholder -- need assessment visibility to verify their gate logic.

**Broader option:** Adding implementation patterns (DTO shape specifications, code changes) would be a Candidate D. Rejected: out of scope for discovery scoping. The ask is "what should be visible", not "how to implement it".

**Invalidating assumption:** If user research shows that 90% of user confusion is resolved by the execution trace panel alone (Tier 1 rendering), the design initiative scope collapses to a single UI change and Candidates A and C converge.

---

## Open Questions for the Main Agent

1. Are the "three confusion patterns" from the problem statement the full acceptance criteria, or illustrative examples? (Determines the A/B/C choice.)
2. Is there user research on whether users are primarily debugging blocked/confused runs vs. auditing successful ones? (Determines the priority order within Candidate B.)
3. Should the design initiative include a progressive-disclosure model proposal, or only the gap list? (Determines whether this doc is the complete deliverable.)
4. Are there performance constraints on the session detail API that would limit Tier 2 projection wiring? (Affects the feasibility of wiring all three projections simultaneously.)
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
# Design Candidates: Console Execution-Trace Explainability
|
|
2
|
+
|
|
3
|
+
> Temporary workflow artifact for the wr.discovery run. Not canonical state -- all findings live in workflow notes/context.
|
|
4
|
+
|
|
5
|
+
## Problem Understanding
|
|
6
|
+
|
|
7
|
+
### Core Tensions
|
|
8
|
+
|
|
9
|
+
1. **Topology vs causality gap**: The DAG correctly shows *what* ran, but not *why*. A 2-node run for a 10-step workflow is correct behavior (fast path via `runCondition`s) but reads as broken without routing context. The engine records causal explanation events (`decision_trace_appended`) but the console renders only structural events (`node_created`, `edge_created`).
|
|
10
|
+
|
|
11
|
+
2. **Completeness vs cognitive overload**: A fully explained run could have 50+ events. The user needs enough context to understand the run, not a raw event log replay. The right design surfaces explanatory data contextually (collapsed by default, as the design locks already suggest for `decision_trace_appended`).
|
|
12
|
+
|
|
13
|
+
3. **Domain specificity vs generic rendering**: The console must explain concepts (runCondition, assessmentGate, loopIteration) that are workflow-specific but must be rendered generically by the console layer. The event types are already typed and closed-set -- this is workable.
|
|
14
|
+
|
|
15
|
+
### Likely Seam
|
|
16
|
+
|
|
17
|
+
The real seam is the rendering layer's distinction between:
|
|
18
|
+
- **(a) Structural events**: what ran (`node_created`, `edge_created`) -- currently shown
|
|
19
|
+
- **(b) Routing events**: why/why-not (`evaluated_condition`, `selected_next_step` in `decision_trace_appended`) -- not surfaced
|
|
20
|
+
- **(c) Quality events**: how well (`assessments` in `stepContext`) -- not surfaced
|
|
21
|
+
- **(d) Health events**: what went wrong or was skipped (`gap_recorded`, `blocked_attempt`, `capability_observed`) -- partially surfaced (blocked_attempt node exists but is undifferentiated)
|
|
22
|
+
|
|
23
|
+
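The four rendering classes could be sketched as a classifier over event kinds. The kind strings are taken from this document; the mapping function itself is illustrative, not the console's real code.

```typescript
// Hypothetical classifier mapping engine event kinds to the four rendering
// classes (a)-(d). Event kind names come from the discovery notes.
type RenderingClass = "structural" | "routing" | "quality" | "health";

function classifyEventKind(kind: string): RenderingClass | "unclassified" {
  switch (kind) {
    case "node_created":
    case "edge_created":
      return "structural"; // (a) what ran
    case "decision_trace_appended":
      return "routing"; // (b) why / why not
    case "gap_recorded":
    case "capability_observed":
      return "health"; // (d) what went wrong or was skipped
    default:
      // (c) quality data rides inside stepContext.assessments rather than
      // arriving as its own event kind, so it never matches here.
      return "unclassified";
  }
}
```

The asymmetry of class (c) is itself a finding: quality data has no dedicated event kind, which is part of why it is not surfaced today.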
### What Makes This Hard
|
|
24
|
+
|
|
25
|
+
The user's mental model is "what did the workflow do?" but the event log answers "what transitions occurred?" These are different questions. A `runCondition` evaluation that returns false is invisible in the node/edge graph but is the most important fact for understanding why a phase didn't run.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Philosophy Constraints
|
|
30
|
+
|
|
31
|
+
- **Make illegal states unrepresentable**: a user seeing a 2-node DAG and concluding "the run broke" is a representable invalid conclusion in the current design. The console should structurally prevent this misread.
|
|
32
|
+
- **Exhaustiveness everywhere**: the question list must be complete, not a representative sample. Missing a real user question is a failure mode.
|
|
33
|
+
- **Explicit domain types over primitives**: questions reference typed concepts (runCondition, assessmentGate, loopIteration) not generic "data."
|
|
34
|
+
- **Surface information, don't hide it**: if something unexpected is discovered, surface it immediately.
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## Impact Surface
|
|
39
|
+
|
|
40
|
+
Any console surface that renders run state must stay consistent with:
|
|
41
|
+
- `decision_trace_appended` entries: `selected_next_step`, `evaluated_condition`, `entered_loop`, `exited_loop`, `detected_non_tip_advance`
|
|
42
|
+
- Assessment dimension levels and rationale in `stepContext.assessments`
|
|
43
|
+
- `gap_recorded` severity/reason/resolution model
|
|
44
|
+
- `capability_observed` provenance (strong vs weak enforcement grade)
|
|
45
|
+
- `blocked_attempt` nodeKind distinction from `step`
|
|
46
|
+
- Effective preference snapshot per node (autonomy, riskPolicy)
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## Candidates (Grouping Strategies)
|
|
51
|
+
|
|
52
|
+
### Candidate 1: Five-category grouping per the design brief (recommended)
|
|
53
|
+
|
|
54
|
+
**Summary**: Use the five categories from the brief (structural/navigation, decision/routing, quality/assessment, iteration/loop, outcome/result).
|
|
55
|
+
|
|
56
|
+
**Tensions resolved**: Completeness vs cognitive overload -- categories create natural reading order. Directly answers the brief.
|
|
57
|
+
|
|
58
|
+
**Boundary**: Seam is user mental model, not engine internals. Maps naturally to how users investigate a run.
|
|
59
|
+
|
|
60
|
+
**Failure mode**: Questions that span categories (e.g., "did the loop run or was it skipped by a runCondition?" touches both routing and iteration). Mitigated by placing cross-cutting questions in the category where the user would first look.
|
|
61
|
+
|
|
62
|
+
**Scope**: Best-fit. Exactly what the brief asks for.
|
|
63
|
+
|
|
64
|
+
**Philosophy**: Honors exhaustiveness. Clean mapping to explicit domain types.
|
|
65
|
+
|
|
66
|
+
---
|
|
67
|
+
|
|
68
|
+
### Candidate 2: User-journey temporal order
|
|
69
|
+
|
|
70
|
+
**Summary**: Reorder the five categories by when the question arises in a typical console session: structural first, then routing, then iteration, then quality, then outcome.
|
|
71
|
+
|
|
72
|
+
**Tensions resolved**: Maps to user's discovery sequence in a console session.
|
|
73
|
+
|
|
74
|
+
**Failure mode**: Users debugging a specific problem may jump directly to assessment or loop questions.
|
|
75
|
+
|
|
76
|
+
**Scope**: Slightly broad -- adds UX framing the brief doesn't request.
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
### Candidate 3: Data-source anchored grouping
|
|
81
|
+
|
|
82
|
+
**Summary**: Group by event type: `decision_trace` questions, `gap_recorded` questions, assessment questions, loop trace questions, `runCondition` questions.
|
|
83
|
+
|
|
84
|
+
**Tensions resolved**: Makes the data source explicit, most useful for engineers implementing the console.
|
|
85
|
+
|
|
86
|
+
**Failure mode**: Users don't think in terms of event types. Hard to use in a design initiative.
|
|
87
|
+
|
|
88
|
+
**Scope**: Too narrow for the stated goal.
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Comparison and Recommendation
|
|
93
|
+
|
|
94
|
+
**Recommendation: Candidate 1**
|
|
95
|
+
|
|
96
|
+
All three candidates cover the same underlying 30+ questions. Candidate 1 uses the brief's grouping, is most actionable for the console design team, and maps directly to user mental models. Candidate 2 adds temporal ordering the brief doesn't ask for (useful later in a UX design pass). Candidate 3 is for the implementation phase, not the discovery phase.

---

## Self-Critique

**Strongest counter-argument**: Candidate 2's temporal ordering might be more intuitive for a user reading the output. Counter: the brief explicitly specifies the five categories, and temporal order can be derived from the list by the design team.

**Pivot condition**: If the design team finds the list hard to prioritize, Candidate 2's temporal order becomes relevant. But that's a presentation decision, not a content decision.

**What assumption would invalidate this**: If the five categories themselves are wrong. They're grounded in the brief's concrete scenarios, not invented.

---

## Open Questions for the Main Agent

1. Are there question categories beyond the five in the brief? (Cross-run comparison, session identity, export/sharing context -- likely lower priority but real.)
2. Should questions about "what the agent actually did in this step" (step notes/output) be in a sixth category, or does it fit under outcome/result?
# Design Review Findings: Console Execution-Trace Explainability

> Temporary workflow artifact for the wr.discovery run. Not canonical state.

## Tradeoff Review

**Cross-category question placement:** Questions that span categories (e.g., "did the loop run or was it skipped by a runCondition?") are placed in the category where a user would first look. Verified: Q-L6 covers the loop-vs-runCondition distinction explicitly and sits in the iteration/loop category, while Q-R1 in decision/routing also addresses the skip-vs-fast-path distinction. No question is orphaned -- the cross-reference is handled by the data-source citations in each question's answer description.

**Temporal ordering left to the design team:** The five-category order from the brief happens to align with a natural user-discovery sequence (structural → routing → iteration → quality → outcome). No reordering needed. Verified: structural questions come first (what does the DAG show?), routing second (why does it look that way?), which matches how a user investigates.

**Step notes/output placed in outcome/result:** Q-O5 ("What did each step actually produce?") is correctly placed in outcome/result. There is only one step-output-visibility question, which does not warrant a separate category.

---

## Failure Mode Review

**Missing scenario coverage (brief's five scenarios):** Verified against all five brief scenarios:
- 2-node DAG for 10-step workflow: Q-S1, Q-S2, Q-R1, Q-R6 -- covered
- Assessment gate fired and agent redid a step: Q-Q1, Q-Q2, Q-Q3 -- covered
- Loop ran 3 iterations: Q-L1, Q-L2, Q-L3, Q-L5 -- covered
- Workflow used a fast path: Q-R1, Q-R6, Q-S1, Q-R2 -- covered
- `blocked_attempt` nodes appearing alongside regular steps: Q-S3, Q-O3 -- covered

All five scenarios verified. No missing coverage.

**Workflow version pinning gap:** The question "Was this run on the latest workflow definition, or a pinned older version?" is not explicitly enumerated. This relates to `workflowHash` pinning semantics -- a real user question but secondary to execution-trace explainability. Severity: Yellow (acknowledged gap, not blocking).

**Loop 0-iteration scenario:** Q-L4 explicitly covers the "validation loop ran 0 times" scenario, grounded in the design locks' requirement that `entered_loop` + `evaluated_condition` + `exited_loop` must all be recorded even for 0-iteration loops. Covered.
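
That contract lends itself to a small derived check. A sketch in TypeScript, assuming a flat event log: only `entered_loop`, `evaluated_condition`, and `exited_loop` are named by the design locks, while the `loop_iteration` event type, the `TraceEvent` shape, and the `loopIterationCount` helper are hypothetical illustrations.

```typescript
// Hypothetical event shape -- only `type` tags like "entered_loop" come from
// the design locks; the `loopId` field is an illustrative assumption.
interface TraceEvent {
  type: string;
  loopId?: string;
}

// A loop whose entry and exit events bracket zero iteration events ran
// 0 times; the console can render "ran 0 iterations" instead of hiding it.
function loopIterationCount(events: TraceEvent[], loopId: string): number | null {
  const scoped = events.filter((e) => e.loopId === loopId);
  const entered = scoped.some((e) => e.type === "entered_loop");
  const exited = scoped.some((e) => e.type === "exited_loop");
  if (!entered || !exited) return null; // loop never reached (or still running)
  return scoped.filter((e) => e.type === "loop_iteration").length;
}
```

Because the 0-iteration case still emits entry/exit events, `null` ("never reached") stays distinguishable from `0` ("entered, condition false immediately").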
---

## Runner-Up / Simpler Alternative Review

**Candidate 2 (temporal order):** Has one element worth noting -- the brief's own category order is already a good temporal sequence. No merge needed. The output presents categories in brief order.

**Simpler variant (flat unordered list):** Would fail acceptance criterion 2 (must be grouped by user mental model). The grouping is load-bearing for design team usability. Not viable.

**Merging similar questions (e.g., Q-L2 and Q-L3):** Rejected -- the questions are genuinely distinct (iteration count vs exit reason vs max-iterations-hit). Merging would reduce specificity without reducing length meaningfully.

---

## Philosophy Alignment

- **Exhaustiveness everywhere:** Satisfied. 32 questions covering all five brief scenarios including edge cases (0-iteration loops, fast paths, blocked_attempts, degraded gaps, fork/branch topology).
- **Explicit domain types over primitives:** Satisfied. Every question references a typed concept (runCondition, assessmentGate, loopIteration, gap_recorded, blocked_attempt) with specific event types cited.
- **Surface information, don't hide it:** Satisfied. The list is comprehensive by design.
- **Document why, not what:** Satisfied. Each question explains why that specific data is the right answer, not just which data type holds it.
- **YAGNI with discipline:** Under acceptable tension -- 32 questions is comprehensive, but the brief explicitly asks for the FULL set. YAGNI doesn't apply to discovery enumerations.
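
The typed-concept claim above can be illustrated with a discriminated union over the event tags this review cites. Only the `type` tags come from the document; the payload fields and the `ConsoleTraceEvent`/`describe` names are hypothetical.

```typescript
// Sketch of a discriminated union over the trace-event types named in this
// review. Only the `type` tags are from the document; payloads are assumed.
type ConsoleTraceEvent =
  | { type: "decision_trace"; chosenPath: string }
  | { type: "gap_recorded"; reason: string }
  | { type: "blocked_attempt"; stepId: string }
  | { type: "entered_loop"; loopId: string }
  | { type: "evaluated_condition"; loopId: string; result: boolean }
  | { type: "exited_loop"; loopId: string };

// Exhaustive switch: the compiler flags any event type a console view
// forgets to handle -- the practical payoff of typed concepts over strings.
function describe(e: ConsoleTraceEvent): string {
  switch (e.type) {
    case "decision_trace": return `routing chose ${e.chosenPath}`;
    case "gap_recorded": return `gap: ${e.reason}`;
    case "blocked_attempt": return `blocked at ${e.stepId}`;
    case "entered_loop": return `entered loop ${e.loopId}`;
    case "evaluated_condition": return `loop ${e.loopId} condition -> ${e.result}`;
    case "exited_loop": return `exited loop ${e.loopId}`;
  }
}
```

An exhaustive `switch` like this turns a forgotten event type into a compile error rather than a silently blank console panel.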
---

## Findings

**Yellow -- Workflow version pinning question not enumerated:**
The question "Was this run on the latest workflow definition, or on a pinned older version?" is a real user question (relates to `workflowHash` + workflow pinning semantics from the execution contract) but was not included in the five-category enumeration. This is a secondary explainability concern. Recommend adding as Q-S7 in the structural/navigation category.
No Red or Orange findings. All enumerated questions are grounded in real, existing engine events. The design is sound.

---

## Recommended Revisions

1. Add Q-S7 to the structural/navigation category: "Is this run on the current workflow definition, or was the workflow updated on disk since this run started?" -- answered by the `workflowHash` field in `run_started` event and the execution contract's "workflow changes on disk" divergence warning.
2. Strengthen Q-R2 to explicitly cover "where was that context variable value set, and was it set correctly?" -- the provenance question, not just the value question.
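
Revision 1's Q-S7 check could be sketched as follows. The `workflowHash` field on `run_started` comes from the execution contract, but the hash algorithm (sha256 over the raw definition) and the `workflowDiverged` helper are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Q-S7 sketch: compare the hash pinned in the run's `run_started` event with
// a hash of the workflow definition currently on disk. How the hash is
// computed here (sha256 of the raw definition text) is an assumption.
function workflowDiverged(pinnedHash: string, currentDefinition: string): boolean {
  const currentHash = createHash("sha256").update(currentDefinition).digest("hex");
  return currentHash !== pinnedHash;
}
```

When this returns `true`, the console can surface the contract's "workflow changes on disk" divergence warning next to the trace.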
Both are additions to the enumeration, not structural changes.

---

## Residual Concerns

One residual concern: the question list covers single-run execution-trace explainability comprehensively, but multi-run and cross-run comparison questions (fork history, session overview) are not enumerated. This is out of scope per the brief, but acknowledged.
The design is ready for the synthesis/output step.