@exaudeus/workrail 3.27.0 → 3.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +3 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +252 -93
  160. package/workflows/workflow-for-workflows.v2.json +188 -77
+++ b/package/docs/design/console-execution-trace-candidates.md
@@ -0,0 +1,211 @@
# Design Candidates: WorkRail Console Execution-Trace Explainability

*Source: discovery scoping pass on the engine's event log vs. console DTO gaps*
*Full landscape packet: `console-explainability-discovery.md`*

---

## Problem Understanding

### Core Tensions

1. **Completeness vs. UI coherence.** The engine records 23 categories of invisible data across 16 event kinds. Surfacing all of them maximizes completeness, but showing 23 new data items without progressive disclosure creates an overwhelming console. The tension: show everything the engine knows vs. show what helps the specific user understand the specific confusion.

2. **DTO stability vs. extensibility.** The console DTOs (`ConsoleDagRun`, `ConsoleNodeDetail`) are already in use. Adding assessment, capability, and blocker fields requires extension without breaking consumers. The codebase philosophy says "make illegal states unrepresentable" -- this favors discriminated union shapes over bare nullable fields.

3. **Projection cost vs. information value.** Some projections are already computed (`executionTraceSummary` is in the DTO). Others require new `console-service.ts` calls per session detail request (assessments, capabilities, preferences). Wiring all projections simultaneously increases response latency.

### Likely Seam

The real seam is the `console-service.ts` -> `console-types.ts` boundary -- where projection data is selected and shaped into DTOs. The symptom appears at the UI; the root is the service-to-DTO translation. The DAG topology projection (`projectRunDagV2`) is the structural source of truth; edge cause codes should flow through it, not around it.

### What Makes This Hard

Three different kinds of work are required:
- **Tier 1 (rendering only):** `ConsoleDagRun.executionTraceSummary` is already computed and in the wire format -- the UI panel is simply not implemented.
- **Tier 2 (service wiring):** `projectAssessmentsV2`, `projectAssessmentConsequencesV2`, `projectCapabilitiesV2` are complete projections, but `console-service.ts` never calls them.
- **Tier 3 (DTO + projection change):** Blocker detail, gap reason detail, edge cause codes, the full context object, preferences changes -- these require new DTO fields and projection output changes.

A junior developer would treat all 23 gaps as equivalent work items and try to fix them simultaneously, not recognizing the tier structure.

---

## Philosophy Constraints

From `CLAUDE.md` and confirmed by codebase observation:

- **Make illegal states unrepresentable** -- new DTO fields should use discriminated union shapes (e.g. `{ present: true; data: ... } | { present: false }`) rather than bare `null`, to distinguish "not recorded" from "recorded as empty".
- **Architectural fixes over patches** -- extending `ConsoleDagEdge` with cause codes properly is better than a separate edge-explanation projection.
- **YAGNI with discipline** -- design clear seams; add progressively. This favors Tier 1 first, with clean extension points for Tier 2 and Tier 3.
- **Errors are data** -- all projection calls must return `Result<T, ProjectionError>` with graceful degradation, not exceptions.
- **Immutability by default** -- all new DTO fields must use `readonly`.

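The "make illegal states unrepresentable" and "errors are data" constraints above can be sketched in TypeScript. The type and function names below are illustrative assumptions for this document, not the codebase's actual definitions:

```typescript
// Hypothetical sketch: a presence-tagged field distinguishes
// "not recorded" from "recorded as empty", unlike a bare null.
type Recorded<T> =
  | { readonly present: true; readonly data: T }
  | { readonly present: false };

// Errors-as-data: projections return a Result value instead of throwing.
type ProjectionError = { readonly code: string; readonly message: string };
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// A consumer must branch on the tag before touching data, so
// "forgot to check for null" is not a representable program state.
function describeAssessment(field: Recorded<readonly string[]>): string {
  if (!field.present) return "not recorded";
  return field.data.length === 0 ? "recorded as empty" : field.data.join(", ");
}
```

The same tagged-union discipline applies to any of the new DTO fields discussed below; the exact field names would come from `console-types.ts`.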
---

## Impact Surface

Beyond the immediate task, changes touch:
- `console-types.ts` -- DTO shape changes affect any consumer reading the session detail API
- `console-service.ts` -- new projection calls affect session detail response latency
- `run-execution-trace.ts` -- extending `CONTEXT_KEYS_TO_ELEVATE` affects which context keys appear in the execution trace
- `run-dag.ts` -> `ConsoleDagEdge` -- adding cause codes requires changes to the DAG projection DTO boundary
- Frontend session detail panel -- every new DTO field needs a rendering path

---

## Candidates

### Candidate A: Tier-Organized Scoping (simplest useful output)

**Summary:** Organize the gap list by implementation effort tier so engineering can estimate scope immediately.

**Tiers:**
- Tier 1 (rendering only, 0 backend changes): implement the `executionTraceSummary` panel in the UI -- data is already in `ConsoleDagRun.executionTraceSummary`. Covers: selected_next_step, evaluated_condition, entered_loop, exited_loop, detected_non_tip_advance, divergence items, taskComplexity context fact.
- Tier 2 (service wiring + DTO extension): call `projectAssessmentsV2`, `projectAssessmentConsequencesV2`, `projectCapabilitiesV2` from `console-service.ts`. Add `assessmentSummary`, `capabilityStatus` to `ConsoleNodeDetail`.
- Tier 3 (DTO shape change): add `cause` to `ConsoleDagEdge`, add `blockers` to `ConsoleAdvanceOutcome`, add `reason` + `evidenceRefs` to `ConsoleNodeGap`, extend `ConsoleExecutionTraceFact` with more context keys, add `ConsolePreferencesChange` to `ConsoleDagRun`.

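As a rough illustration of what a Tier 3 shape change could look like, a presence-tagged `cause` field on the edge DTO might be sketched as follows. All names here are hypothetical, drawn from the gap list above rather than from the real `console-types.ts`:

```typescript
// Illustrative sketch only: a closed-set union keeps edge cause codes
// typed rather than stringly, per the philosophy constraints.
type EdgeCauseCode = "idempotent_replay" | "intentional_fork" | "non_tip_advance";

interface ConsoleDagEdgeSketch {
  readonly from: string;
  readonly to: string;
  // Presence-tagged so "no cause recorded" is distinct from every cause value.
  readonly cause:
    | { readonly present: true; readonly code: EdgeCauseCode }
    | { readonly present: false };
}

// A rendering helper a console panel could use for edge tooltips.
function edgeLabel(edge: ConsoleDagEdgeSketch): string {
  return edge.cause.present
    ? `${edge.from} -> ${edge.to} (${edge.cause.code})`
    : `${edge.from} -> ${edge.to}`;
}
```

Because the field is additive and tagged, existing consumers that ignore `cause` keep working, which is the DTO-stability property the tiers are designed to protect.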
**Tensions resolved:** DTO stability (sequential tiers), projection cost (Tier 1 costs nothing).
**Tensions accepted:** User story coherence (tiers don't map to user jobs-to-be-done).

**Boundary solved at:** console-service.ts / console-types.ts. Correct seam.

**Failure mode:** Teams prioritize Tier 1 (cheapest) but the dominant user confusion (blocked_attempt nodes, assessment gate results) lives in Tiers 2-3.

**Repo pattern:** Follows the `executionTraceSummary` staging precedent exactly.

**Gains:** Immediately actionable for engineering sprint planning.
**Losses:** Design team gets a backlog, not a user-facing vision for progressive disclosure.

**Scope judgment:** Too narrow for a design initiative kickoff; best-fit for an engineering readiness doc.

**Philosophy fit:** Honors YAGNI and architectural layering. No conflicts.

---

### Candidate B: User-Question-Organized (recommended)

**Summary:** Organize the 23 gaps by the user question they answer, with the tier noted per item, so the design team can build the right progressive-disclosure model.

**Structure:**

*"Why did the run skip phases / take this path?"*
- decision trace entries (selected_next_step, evaluated_condition) -- Tier 1, already in executionTraceSummary
- taskComplexity context fact -- Tier 1, already in executionTraceSummary.contextFacts
- full run context object (other routing keys) -- Tier 3, requires DTO extension
- edge cause codes (idempotent_replay, intentional_fork, non_tip_advance) -- Tier 3, requires ConsoleDagEdge.cause field

*"Why is this node blocked?"*
- blocker codes (10-value enum: USER_ONLY_DEPENDENCY, MISSING_REQUIRED_OUTPUT, etc.) -- Tier 3, requires ConsoleAdvanceOutcome.blockers
- blocker pointer (context_key, capability, output_contract, workflow_step) -- Tier 3
- blocker message + suggestedFix -- Tier 3
- validation failure linkage (which validation caused the block) -- Tier 2

*"What did the quality gate decide?"*
- assessment dimensions (dimensionId, level, rationale) -- Tier 2, requires projectAssessmentsV2 wiring
- assessment summary -- Tier 2
- assessment normalization notes -- Tier 2
- assessment consequence (triggered follow-up, guidance) -- Tier 2, requires projectAssessmentConsequencesV2 wiring

*"What happened in this loop?"*
- entered_loop / exited_loop trace entries -- Tier 1, already in executionTraceSummary
- loop iteration count -- Tier 3, requires engine state loop stack data

*"Why did run behavior change mid-execution?"*
- preferences_changed events (autonomy mode, riskPolicy, who changed it) -- Tier 3, not in any projection DTO

*"Why did the run use a degraded path?"*
- capability probe results (delegation/web_browsing available/unavailable/unknown) -- Tier 2, requires projectCapabilitiesV2 wiring
- capability failure codes (tool_missing, tool_error, policy_blocked) -- Tier 2

*"What does the gap mean?"*
- gap reason category (user_only_dependency, contract_violation, capability_missing, unexpected) -- Tier 3, requires ConsoleNodeGap.reason field
- gap evidence refs -- Tier 3

**Tensions resolved:** UI coherence (maps to user mental models); covers all three stakeholder groups.
**Tensions accepted:** Engineering must read more carefully to extract tier information per item.

**Boundary solved at:** Same seam -- the user-question grouping doesn't change where the fix lives.

**Failure mode:** The design team sees a clean user story per question but underestimates cross-cutting implementation work (e.g., "why blocked?" requires Tier 2 validation linkage AND Tier 3 blocker detail simultaneously).

**Repo pattern:** Adapts the staging precedent -- same tier model, user-question organization.

**Gains:** Design team gets a vision and can design progressive disclosure correctly. Engineering can still extract tier information.
**Losses:** Slightly more reading overhead for pure engineering scope estimation.

**Scope judgment:** Best-fit for a design initiative kickoff.

**Philosophy fit:** Honors all principles. The tier notation per item ensures YAGNI and architectural layering are preserved.

---

### Candidate C: Minimum Viable Explainability

**Summary:** Surface only the items that explain the three specific user confusion patterns named in the problem statement, deferring all others.

The three confusion patterns and their minimum data requirements:
1. **Fast-path phase skips:** execution trace entries (`selected_next_step`, `evaluated_condition`) + the taskComplexity context fact. Status: Tier 1, rendering only -- data is already in the DTO.
2. **blocked_attempt nodes:** blocker codes + messages from the `advance_recorded` outcome. Status: Tier 3 -- requires adding a `ConsoleAdvanceOutcome.blockers` field.
3. **Loop structural jumps:** `entered_loop`, `exited_loop` trace entries with loop_id refs. Status: Tier 1, rendering only -- data is already in the DTO.

Result: only one Tier 3 change is needed (blocker detail on ConsoleAdvanceOutcome) plus one Tier 1 UI implementation (the execution trace panel).

**Tensions resolved:** YAGNI; fastest path to user-visible improvement for the named pain points.
**Tensions accepted:** Assessment results, capability degradation, preferences, gap reason detail, and edge cause codes are all deferred.

**Boundary solved at:** Same seam; minimal scope.

**Failure mode:** Assessment and capability gaps are high-value for workflow authors (a primary stakeholder). Deferring them leaves a key user group underserved.

**Repo pattern:** Most conservative; only extends what is explicitly necessary.

**Gains:** Fastest path to user-visible improvement; lowest implementation risk.
**Losses:** Does not address workflow author or platform maintainer needs; defers 20 of the 23 gaps.

**Scope judgment:** Best-fit for a quick-win sprint; too narrow for a full design initiative.

**Philosophy fit:** Honors YAGNI most strongly. No conflicts.

---

## Comparison and Recommendation

| | Candidate A | Candidate B | Candidate C |
|---|---|---|---|
| Completeness vs. UI coherence | Accepts UI gap | Resolves both | Accepts incompleteness |
| DTO stability | Explicit tier ordering | Noted per item | Minimal scope |
| YAGNI | Middle | Middle | Best |
| User story mapping | Weakest | Best | Partial |
| Scope fit (design kickoff) | Too narrow | Best-fit | Too narrow |
| Stakeholder coverage | Engineering only | All three | Operators only |
| Reversibility | High | High | High |
| Philosophy fit | Full | Full | Full |

**Recommendation: Candidate B.**

Candidate B is the best fit for the stated goal (design initiative scoping) because:
1. Design teams need user-question framing to build progressive-disclosure models, not backlog lists.
2. Tier information is preserved per item -- engineering can extract a sprint plan from the same doc.
3. All three stakeholder groups are covered.
4. The current `console-explainability-discovery.md` doc already implements this organization.

---

## Self-Critique

**Strongest counter-argument:** Candidate C is faster and directly solves the three named pain points from the problem statement. If the brief's "three confusion patterns" are the complete acceptance criteria (not just illustrative examples), Candidate C is sufficient and Candidate B is over-scoped.

**Narrower option:** Candidate C loses because it leaves assessment and capability gaps entirely unaddressed. Workflow authors -- a named primary stakeholder -- need assessment visibility to verify their gate logic.

**Broader option:** Adding implementation patterns (DTO shape specifications, code changes) would be a Candidate D. Rejected: out of scope for discovery scoping. The ask is "what should be visible", not "how to implement it".

**Invalidating assumption:** If user research shows that 90% of user confusion is resolved by the execution trace panel alone (Tier 1 rendering), the design initiative scope collapses to a single UI change and Candidates A and C converge.

---

## Open Questions for the Main Agent

1. Are the "three confusion patterns" from the problem statement the full acceptance criteria, or illustrative examples? (Determines the A/B/C choice.)
2. Is there user research on whether users are primarily debugging blocked/confused runs vs. auditing successful ones? (Determines the priority order within Candidate B.)
3. Should the design initiative include a progressive-disclosure model proposal, or only the gap list? (Determines whether this doc is the complete deliverable.)
4. Are there performance constraints on the session detail API that would limit Tier 2 projection wiring? (Affects the feasibility of wiring all three projections simultaneously.)
+++ b/package/docs/design/console-execution-trace-design-candidates-v2.md
@@ -0,0 +1,113 @@
# Design Candidates: Console Execution-Trace Explainability

> Temporary workflow artifact for the wr.discovery run. Not canonical state -- all findings live in workflow notes/context.

## Problem Understanding

### Core Tensions

1. **Topology vs causality gap**: The DAG correctly shows *what* ran, but not *why*. A 2-node run for a 10-step workflow is correct behavior (fast path via `runCondition`s) but reads as broken without routing context. The engine records causal explanation events (`decision_trace_appended`) but the console renders only structural events (`node_created`, `edge_created`).

2. **Completeness vs cognitive overload**: A fully explained run could have 50+ events. The user needs enough context to understand the run, not a raw event log replay. The right design surfaces explanatory data contextually (collapsed by default, as the design locks already suggest for `decision_trace_appended`).

3. **Domain specificity vs generic rendering**: The console must explain concepts (runCondition, assessmentGate, loopIteration) that are workflow-specific but must be rendered generically by the console layer. The event types are already typed and closed-set -- this is workable.

### Likely Seam

The real seam is the rendering layer's distinction between:
- **(a) Structural events**: what ran (`node_created`, `edge_created`) -- currently shown
- **(b) Routing events**: why/why-not (`evaluated_condition`, `selected_next_step` in `decision_trace_appended`) -- not surfaced
- **(c) Quality events**: how well (`assessments` in `stepContext`) -- not surfaced
- **(d) Health events**: what went wrong or was skipped (`gap_recorded`, `blocked_attempt`, `capability_observed`) -- partially surfaced (blocked_attempt node exists but is undifferentiated)

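This four-way split can be sketched as a rendering-layer classifier. The event-kind strings are taken from the list above where possible; the mapping itself (and the `assessment_recorded` name) is an illustrative assumption, not actual console code:

```typescript
// Illustrative sketch: classify engine event kinds into the four
// rendering categories (a)-(d) described above.
type RenderCategory = "structural" | "routing" | "quality" | "health";

function classifyEvent(eventKind: string): RenderCategory {
  switch (eventKind) {
    case "node_created":
    case "edge_created":
      return "structural"; // (a) what ran
    case "decision_trace_appended":
      return "routing"; // (b) why / why-not
    case "assessment_recorded": // assumed name for stepContext assessments
      return "quality"; // (c) how well
    case "gap_recorded":
    case "blocked_attempt":
    case "capability_observed":
      return "health"; // (d) what went wrong or was skipped
    default:
      return "health"; // surface unknown kinds rather than hide them
  }
}
```

The default branch reflects the "surface information, don't hide it" constraint below: an unknown event kind is treated as a health signal instead of being dropped.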
### What Makes This Hard

The user's mental model is "what did the workflow do?" but the event log answers "what transitions occurred?" These are different questions. A `runCondition` evaluation that returns false is invisible in the node/edge graph but is the most important fact for understanding why a phase didn't run.

---

## Philosophy Constraints

- **Make illegal states unrepresentable**: a user seeing a 2-node DAG and concluding "the run broke" is a representable invalid conclusion in the current design. The console should structurally prevent this misread.
- **Exhaustiveness everywhere**: the question list must be complete, not a representative sample. Missing a real user question is a failure mode.
- **Explicit domain types over primitives**: questions reference typed concepts (runCondition, assessmentGate, loopIteration), not generic "data".
- **Surface information, don't hide it**: if something unexpected is discovered, surface it immediately.

---

## Impact Surface

Any console surface that renders run state must stay consistent with:
- `decision_trace_appended` entries: `selected_next_step`, `evaluated_condition`, `entered_loop`, `exited_loop`, `detected_non_tip_advance`
- Assessment dimension levels and rationale in `stepContext.assessments`
- `gap_recorded` severity/reason/resolution model
- `capability_observed` provenance (strong vs weak enforcement grade)
- `blocked_attempt` nodeKind distinction from `step`
- Effective preference snapshot per node (autonomy, riskPolicy)

---

## Candidates (Grouping Strategies)

### Candidate 1: Five-category grouping per the design brief (recommended)

**Summary**: Use the five categories from the brief (structural/navigation, decision/routing, quality/assessment, iteration/loop, outcome/result).

**Tensions resolved**: Completeness vs cognitive overload -- the categories create a natural reading order. Directly answers the brief.

**Boundary**: The seam is the user's mental model, not engine internals. Maps naturally to how users investigate a run.

**Failure mode**: Questions that span categories (e.g., "did the loop run or was it skipped by a runCondition?" touches both routing and iteration). Mitigated by placing cross-cutting questions in the category where the user would first look.

**Scope**: Best-fit. Exactly what the brief asks for.

**Philosophy**: Honors exhaustiveness. Clean mapping to explicit domain types.

---

### Candidate 2: User-journey temporal order

**Summary**: Reorder the five categories by when the question arises in a typical console session: structural first, then routing, then iteration, then quality, then outcome.

**Tensions resolved**: Maps to the user's discovery sequence in a console session.

**Failure mode**: Users debugging a specific problem may jump directly to assessment or loop questions.

**Scope**: Slightly broad -- adds UX framing the brief doesn't request.

---

### Candidate 3: Data-source anchored grouping

**Summary**: Group by event type: `decision_trace` questions, `gap_recorded` questions, assessment questions, loop trace questions, `runCondition` questions.

**Tensions resolved**: Makes the data source explicit; most useful for engineers implementing the console.

**Failure mode**: Users don't think in terms of event types. Hard to use in a design initiative.

**Scope**: Too narrow for the stated goal.

---

## Comparison and Recommendation

**Recommendation: Candidate 1**

All three candidates cover the same underlying 30+ questions. Candidate 1 uses the brief's grouping, is most actionable for the console design team, and maps directly to user mental models. Candidate 2 adds temporal ordering the brief doesn't ask for (useful later in a UX design pass). Candidate 3 is for the implementation phase, not the discovery phase.

---

## Self-Critique

**Strongest counter-argument**: Candidate 2's temporal ordering might be more intuitive for a user reading the output. Counter: the brief explicitly specifies the five categories, and temporal order can be derived from the list by the design team.

**Pivot condition**: If the design team finds the list hard to prioritize, Candidate 2's temporal order becomes relevant. But that's a presentation decision, not a content decision.

**What assumption would invalidate this**: If the five categories themselves are wrong. They're grounded in the brief's concrete scenarios -- they're not invented.

---

## Open Questions for the Main Agent

1. Are there question categories beyond the five in the brief? (Cross-run comparison, session identity, export/sharing context -- likely lower priority but real.)
2. Should questions about "what the agent actually did in this step" (step notes/output) be a sixth category, or do they fit under outcome/result?
+++ b/package/docs/design/console-execution-trace-design-review.md
@@ -0,0 +1,74 @@
+ # Design Review Findings: Console Execution-Trace Explainability
+
+ > Temporary workflow artifact for the wr.discovery run. Not canonical state.
+
+ ## Tradeoff Review
+
+ **Cross-category question placement:** Questions that span categories (e.g., "did the loop run or was it skipped by a runCondition?") are placed in the category where a user would first look. Verified: Q-L6 covers the loop-vs-runCondition distinction explicitly and is placed in the iteration/loop category, but Q-R1 in decision/routing also addresses the skip-vs-fast-path distinction. No question is orphaned -- the cross-reference is handled by the data-source citations in each question's answer description.
+
+ **Temporal ordering left to the design team:** The five-category order from the brief happens to align with a natural user-discovery sequence (structural → routing → quality → iteration → outcome). No reordering needed. Verified: structural questions come first (what does the DAG show?), routing second (why does it look that way?), which matches how a user investigates.
+
+ **Step notes/output placed in outcome/result:** Q-O5 ("What did each step actually produce?") is correctly in outcome/result. There is only one step-output-visibility question, so a separate category is not warranted.
+
+ ---
+
+ ## Failure Mode Review
+
+ **Missing scenario coverage (brief's five scenarios):** Verified against all five brief scenarios:
+ - 2-node DAG for 10-step workflow: Q-S1, Q-S2, Q-R1, Q-R6 -- covered
+ - Assessment gate fired and agent redid a step: Q-Q1, Q-Q2, Q-Q3 -- covered
+ - Loop ran 3 iterations: Q-L1, Q-L2, Q-L3, Q-L5 -- covered
+ - Workflow used a fast path: Q-R1, Q-R6, Q-S1, Q-R2 -- covered
+ - blocked_attempt nodes appearing alongside regular steps: Q-S3, Q-O3 -- covered
+
+ All five scenarios verified. No missing coverage.
+
+ **Workflow version pinning gap:** The question "Was this run on the latest workflow definition, or a pinned older version?" is not explicitly enumerated. This relates to `workflowHash` pinning semantics -- a real user question but secondary to execution-trace explainability. Severity: Yellow (acknowledged gap, not blocking).
+
+ **Loop 0-iteration scenario:** Q-L4 explicitly covers the "validation loop ran 0 times" scenario, grounded in the design locks' requirement that `entered_loop` + `evaluated_condition` + `exited_loop` must all be recorded even for 0-iteration loops. Covered.
+
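The 0-iteration guarantee above can be sketched in code. This is a minimal illustration only: the event names (`entered_loop`, `evaluated_condition`, `exited_loop`) come from the design locks cited in this review, but the payload shapes (`loopId`, `result`, `reason`) are assumptions for the sketch, not the engine's actual schema.

```typescript
// Hypothetical event payloads -- only the event names are from the design locks.
type ExitedLoop = {
  type: "exited_loop";
  loopId: string;
  reason: "condition_false" | "max_iterations";
};
type LoopEvent =
  | { type: "entered_loop"; loopId: string }
  | { type: "evaluated_condition"; loopId: string; result: boolean }
  | ExitedLoop;

// Answers the Q-L4-style question: how many times did the loop run, and why did it stop?
function summarizeLoop(events: LoopEvent[], loopId: string): string {
  const mine = events.filter((e) => e.loopId === loopId);
  if (!mine.some((e) => e.type === "entered_loop")) return "loop never entered";
  // Each true condition evaluation corresponds to one executed iteration.
  const iterations = mine.filter(
    (e) => e.type === "evaluated_condition" && e.result
  ).length;
  const exit = mine.find((e): e is ExitedLoop => e.type === "exited_loop");
  return `${iterations} iteration(s), exited: ${exit ? exit.reason : "unknown"}`;
}

// A 0-iteration loop still produces a full, explainable trace:
const trace: LoopEvent[] = [
  { type: "entered_loop", loopId: "validate" },
  { type: "evaluated_condition", loopId: "validate", result: false },
  { type: "exited_loop", loopId: "validate", reason: "condition_false" },
];
// summarizeLoop(trace, "validate") → "0 iteration(s), exited: condition_false"
```

Because all three event kinds are recorded even when the body never runs, the console can distinguish "loop ran 0 times" from "loop was never reached" without guessing.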
+ ---
+
+ ## Runner-Up / Simpler Alternative Review
+
+ **Candidate 2 (temporal order):** Has one element worth noting -- the brief's own category order is already a good temporal sequence. No merge needed. The output presents categories in brief order.
+
+ **Simpler variant (flat unordered list):** Would fail acceptance criterion 2 (must be grouped by user mental model). The grouping is load-bearing for design team usability. Not viable.
+
+ **Merging similar questions (e.g., Q-L2 and Q-L3):** Rejected -- the questions are genuinely distinct (iteration count vs exit reason vs max-iterations-hit). Merging would reduce specificity without reducing length meaningfully.
+
+ ---
+
+ ## Philosophy Alignment
+
+ - **Exhaustiveness everywhere:** Satisfied. 32 questions covering all five brief scenarios including edge cases (0-iteration loops, fast paths, blocked_attempts, degraded gaps, fork/branch topology).
+ - **Explicit domain types over primitives:** Satisfied. Every question references a typed concept (runCondition, assessmentGate, loopIteration, gap_recorded, blocked_attempt) with specific event types cited.
+ - **Surface information, don't hide it:** Satisfied. The list is comprehensive by design.
+ - **Document why, not what:** Satisfied. Each question includes why that specific data is the right answer, not just a data type.
+ - **YAGNI with discipline:** Under acceptable tension -- 32 questions is comprehensive, but the brief explicitly asks for the FULL set. YAGNI doesn't apply to discovery enumerations.
+
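"Explicit domain types over primitives" and "exhaustiveness everywhere" pair naturally in TypeScript via discriminated unions. A minimal sketch, assuming hypothetical payload fields -- only the event names (`run_started`, `gap_recorded`, `blocked_attempt`, `entered_loop`) appear in this review; the shapes are illustrative:

```typescript
// Illustrative union of engine event types; payload fields are assumed.
type EngineEvent =
  | { type: "run_started"; workflowHash: string }
  | { type: "gap_recorded"; reason: string }
  | { type: "blocked_attempt"; nodeId: string }
  | { type: "entered_loop"; loopId: string };

// The compiler enforces exhaustiveness: adding a new event type without
// handling it here becomes a build error, not a silent rendering gap.
function label(e: EngineEvent): string {
  switch (e.type) {
    case "run_started":
      return `run started (workflow ${e.workflowHash})`;
    case "gap_recorded":
      return `gap: ${e.reason}`;
    case "blocked_attempt":
      return `blocked at ${e.nodeId}`;
    case "entered_loop":
      return `entered loop ${e.loopId}`;
    default: {
      const unreachable: never = e; // fails to compile if a variant is unhandled
      return unreachable;
    }
  }
}
```

The `never`-typed default branch is what turns the philosophy claim into a mechanical guarantee rather than a review-time checklist.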
+ ---
+
+ ## Findings
+
+ **Yellow -- Workflow version pinning question not enumerated:**
+ The question "Was this run on the latest workflow definition, or on a pinned older version?" is a real user question (relates to `workflowHash` + workflow pinning semantics from the execution contract) but was not included in the five-category enumeration. This is a secondary explainability concern. Recommend adding as Q-S7 in the structural/navigation category.
+
+ No Red or Orange findings. All enumerated questions are grounded in real, existing engine events. The design is sound.
+
+ ---
+
+ ## Recommended Revisions
+
+ 1. Add Q-S7 to the structural/navigation category: "Is this run on the current workflow definition, or was the workflow updated on disk since this run started?" -- answered by the `workflowHash` field in the `run_started` event and the execution contract's "workflow changes on disk" divergence warning.
+ 2. Strengthen Q-R2 to explicitly cover "where was that context variable value set, and was it set correctly?" -- the provenance question, not just the value question.
+
+ Both are additions to the enumeration, not structural changes.
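The Q-S7 divergence check could be answered mechanically. A sketch under stated assumptions: the `workflowHash` field on `run_started` is from this review, but the hashing scheme (SHA-256 of the serialized definition) and both function names are hypothetical, not the engine's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Hypothetical: assumes the engine hashes the serialized workflow definition
// with SHA-256. The real scheme may differ.
function hashWorkflow(definition: string): string {
  return createHash("sha256").update(definition).digest("hex");
}

// Q-S7: did the on-disk workflow change after this run started?
// `runStartedHash` would come from the run's `run_started` event.
function isDiverged(runStartedHash: string, onDiskDefinition: string): boolean {
  return hashWorkflow(onDiskDefinition) !== runStartedHash;
}
```

If the hashes match, the console can state the run reflects the current definition; if not, it can surface the divergence warning the execution contract already defines.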
+
+ ---
+
+ ## Residual Concerns
+
+ One concern: the question list covers single-run execution-trace explainability comprehensively. Multi-run and cross-run comparison questions (fork history, session overview) are not enumerated -- this is out of scope per the brief but acknowledged.
+
+ The design is ready for the synthesis/output step.