@exaudeus/workrail 3.28.0 → 3.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/dist/console/assets/{index-C146q2kN.js → index-Bl5-Ghuu.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +4 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +3 -3
  160. package/workflows/workflow-for-workflows.v2.json +3 -3
@@ -0,0 +1,394 @@
1
+ # WorkRail Console Execution-Trace Explainability -- Discovery Doc
2
+
3
+ **Status:** Complete
4
+ **Goal:** Identify everything the engine records that the console does not yet surface
5
+
6
+ ## Final Summary
7
+
8
+ **Path used:** landscape_first -- gap analysis between what the engine records and what the console surfaces. Direct code reading of event schemas, projections, console-service.ts, and console DTOs.
9
+
10
+ **Problem framing:** The engine has a complete reasoning audit trail across 16 event kinds. The console renders 6-7. The gap is not a data problem -- it is a surfacing problem. All the data exists; three tiers of work are needed to expose it.
11
+
12
+ **Chosen direction:** Candidate B -- User-Question-Organized gap list with Priority Zero hybrid. Organizes 23 gaps by the user question they answer, with implementation tier noted per item. Priority Zero callout identifies the fast-win starting point.
13
+
14
+ **Why it won:** The brief explicitly requests 'the FULL set of things that should be visible.' Candidate B is the only candidate that satisfies it. Candidate C (minimum viable) covers only 3 of 23 items. Tier information is preserved per item so engineering can extract a sprint plan.
15
+
16
+ **Strongest alternative:** Candidate C (Minimum Viable Explainability). The right choice if the initiative is reframed as a quick-win sprint. Two of three named confusion patterns are Tier 1 rendering only (already computed, no backend change).
17
+
18
+ **Confidence band:** Medium-high. Gap list: high confidence (all 23 items traceable to specific code). Priority ordering: medium (user model unvalidated).
19
+
20
+ **Residual risks:**
21
+ 1. User model unvalidated -- priority order within the gap list is assumption-driven. Run a 30-minute user research pulse with 3-5 users on real sessions before committing to full initiative scope.
22
+ 2. Session detail API latency with 3 new projection calls not benchmarked.
23
+ 3. CONTEXT_KEYS_TO_ELEVATE extension requires workflow-wide knowledge of routing-critical keys.
24
+
25
+ **Next actions for the design team:**
26
+ 1. Review the Priority Zero items and ship the execution trace panel render (Tier 1, no backend change).
27
+ 2. Run a user research pulse to validate the user model before proceeding.
28
+ 3. If research confirms users cannot explain confusion after Priority Zero, proceed with the full 23-item design initiative.
29
+ 4. If research confirms Priority Zero is sufficient, scope down to Candidate C.
30
+
31
+ **Artifacts:**
32
+ - `console-explainability-discovery.md` -- this document (primary deliverable)
33
+ - `console-explainability-design-candidates.md` -- three candidates with full reasoning
34
+ - `console-explainability-review-findings.md` -- tradeoffs, failure modes, findings
35
+
36
+ ---
37
+
38
+ ## Context / Ask
39
+
40
+ The WorkRail Console currently renders only `node_created` and `edge_created` events as a DAG.
41
+ Users see nodes and edges but have no visibility into:
42
+ - why the run jumped from phase 0 to phase 5 (fast-path conditions)
43
+ - what drove routing decisions (context variables like `taskComplexity`)
44
+ - why `blocked_attempt` nodes exist alongside regular step nodes
45
+ - what assessment gates concluded and whether they triggered follow-ups
46
+ - what loop iterations looked like and why a loop exited
47
+ - what divergences the engine deliberately recorded
48
+
49
+ The question: what is the full set of things users should be able to see?
50
+ This is a scoping document -- not an implementation plan.
51
+
52
+ ---
53
+
54
+ ## Path Recommendation
55
+
56
+ **landscape_first** -- the dominant need is understanding what data already exists in the
57
+ event log and projections vs. what the console currently exposes. There is no reframing
58
+ risk here; the problem statement is precise and grounded. A full-spectrum path would add
59
+ unnecessary overhead.
60
+
61
+ ---
62
+
63
+ ## Constraints / Anti-goals
64
+
65
+ - **In scope:** what data exists in the engine, what is hidden, what users would understand from seeing it
66
+ - **Out of scope:** how to implement UI panels, API changes, or performance implications
67
+ - **Anti-goal:** do not produce implementation specs or DTO changes in this discovery phase
68
+ - **Anti-goal:** do not invent data that isn't already being recorded
69
+
70
+ ---
71
+
72
+ ## Landscape Packet
73
+
74
+ ### Event Types in the Engine (16 total in `EVENT_KIND`)
75
+
76
+ The engine records 16 distinct event kinds. Only two of them (`node_created`, `edge_created`) are rendered as the DAG itself; the rest are at best partially surfaced, as the table below shows.
77
+
78
+ | Event Kind | Currently Surfaced? | What It Contains |
79
+ |---|---|---|
80
+ | `session_created` | No (implicit) | Session birth marker |
81
+ | `observation_recorded` | Partially (git_branch extracted for display) | git_branch, git_head_sha, repo_root_hash, repo_root -- with confidence levels |
82
+ | `run_started` | Partially (workflowId/hash shown) | workflowId, workflowHash, workflowSourceKind, workflowSourceRef |
83
+ | `node_created` | YES | nodeKind, parentNodeId, workflowHash, snapshotRef |
84
+ | `edge_created` | YES | edgeKind, fromNodeId, toNodeId, cause (kind + eventId) |
85
+ | `advance_recorded` | Partially (outcome.kind on node detail) | attemptId, intent, outcome (blocked/advanced with toNodeId) |
86
+ | `validation_performed` | YES (node detail only) | validationId, attemptId, contractRef, result (valid, issues, suggestions) |
87
+ | `node_output_appended` | YES (recap channel, artifact channel) | channel, payload (notes markdown or artifact ref) |
88
+ | `assessment_recorded` | NO -- projection exists, not wired to console | assessmentId, attemptId, artifactOutputId, summary, normalizationNotes, dimensions (dimensionId, level, rationale, normalization) |
89
+ | `assessment_consequence_applied` | NO -- projection exists, not wired to console | assessmentId, trigger (dimensionId, level), effect (kind: require_followup, guidance) |
90
+ | `preferences_changed` | NO | changeId, source (user/workflow_recommendation/system), delta, effective (autonomy, riskPolicy) |
91
+ | `capability_observed` | NO -- projection exists, not wired to console | capability (delegation/web_browsing), status (unknown/available/unavailable), provenance (probe_step/attempted_use/manual_claim with details) |
92
+ | `gap_recorded` | YES (node detail only, summary flags only) | gapId, severity (info/warning/critical), reason (category + detail), summary, resolution, evidenceRefs |
93
+ | `context_set` | PARTIAL -- only `taskComplexity` is elevated to execution trace contextFacts | contextId, full context object (JsonObject), source (initial/agent_delta) -- all other keys invisible |
94
+ | `divergence_recorded` | YES (surfaced as execution trace item kind='divergence') | divergenceId, reason (enum), summary, relatedStepId |
95
+ | `decision_trace_appended` | YES (surfaced in executionTraceSummary, but panel not yet implemented in UI) | traceId, entries (kind, summary, refs) -- entry kinds: selected_next_step, evaluated_condition, entered_loop, exited_loop, detected_non_tip_advance |
96
+
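+ For orientation, the sixteen kinds can be written as a single TypeScript union. This is a minimal sketch: the string literals are taken from the table above, while the real `EVENT_KIND` definition in the engine may use a different construct (for example a const object).
+
+ ```typescript
+ // Sketch only: literals come from the table above; the engine's actual
+ // EVENT_KIND definition may differ in shape.
+ type EventKind =
+   | 'session_created'
+   | 'observation_recorded'
+   | 'run_started'
+   | 'node_created'
+   | 'edge_created'
+   | 'advance_recorded'
+   | 'validation_performed'
+   | 'node_output_appended'
+   | 'assessment_recorded'
+   | 'assessment_consequence_applied'
+   | 'preferences_changed'
+   | 'capability_observed'
+   | 'gap_recorded'
+   | 'context_set'
+   | 'divergence_recorded'
+   | 'decision_trace_appended';
+ ```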
97
+ ### Projections That Exist But Are Not Wired to Console DTOs
98
+
99
+ | Projection | File | Status |
100
+ |---|---|---|
101
+ | `projectAssessmentsV2` | `projections/assessments.ts` | Full projection, not called in console-service.ts |
102
+ | `projectAssessmentConsequencesV2` | `projections/assessment-consequences.ts` | Full projection, not called in console-service.ts |
103
+ | `projectCapabilitiesV2` | `projections/capabilities.ts` | Full projection, not called in console-service.ts |
104
+ | `projectRunContextV2` | `projections/run-context.ts` | Full projection, only used for session title derivation -- full context object not exposed |
105
+ | `projectRunExecutionTraceV2` | `projections/run-execution-trace.ts` | Computed and placed in `ConsoleDagRun.executionTraceSummary`, but the UI panel comment says "not yet implemented" |
106
+
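+ To make the wiring gap concrete, here is a hypothetical sketch of what calling these projections from `console-service.ts` could look like. Only the projection names and file paths come from the table above; the import paths, call signatures, `loadRunEvents` helper, and returned field names are illustrative assumptions, not the actual service API.
+
+ ```typescript
+ // Hypothetical wiring sketch -- everything except the projection names is assumed.
+ import { projectAssessmentsV2 } from './projections/assessments';
+ import { projectAssessmentConsequencesV2 } from './projections/assessment-consequences';
+ import { projectCapabilitiesV2 } from './projections/capabilities';
+
+ // Assumed accessor for the append-only event log; loosely typed for this sketch.
+ declare function loadRunEvents(runId: string): Promise<any[]>;
+
+ async function buildRunExplainability(runId: string) {
+   const events = await loadRunEvents(runId);
+   return {
+     // Each projection folds the event log into a read model (Tier 2 wiring + DTO fields).
+     assessments: projectAssessmentsV2(events),
+     assessmentConsequences: projectAssessmentConsequencesV2(events),
+     capabilities: projectCapabilitiesV2(events),
+   };
+ }
+ ```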
107
+ ### What `ConsoleDagRun.executionTraceSummary` Contains (Already Computed, Not Rendered)
108
+
109
+ The `executionTraceSummary` field on `ConsoleDagRun` is populated today via
110
+ `projectRunExecutionTraceV2`. It contains:
111
+
112
+ **Items** (from `decision_trace_appended` and `divergence_recorded`):
113
+ - `selected_next_step` -- engine chose a step; summary explains why
114
+ - `evaluated_condition` -- engine evaluated a routing condition; summary + condition_id ref
115
+ - `entered_loop` -- loop entry; summary + loop_id ref
116
+ - `exited_loop` -- loop exit (condition false OR max iterations); summary + loop_id ref
117
+ - `detected_non_tip_advance` -- advance from a non-tip node (DAG fork); summary
118
+ - `divergence` -- deliberate divergence; reason enum + summary + optional step_id ref
119
+
120
+ **Context Facts** (from `context_set`):
121
+ - Only `taskComplexity` is extracted (hardcoded `CONTEXT_KEYS_TO_ELEVATE`)
122
+ - All other context keys are ignored
123
+
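+ Put together, the summary has roughly the following shape. This is a sketch: the `items`, `contextFacts`, and `refs` field names appear elsewhere in these docs, but the exact DTO definition (optionality, full set of ref kinds) is not reproduced here.
+
+ ```typescript
+ // Sketch of the already-computed summary; treat exact field shapes as assumptions.
+ interface ConsoleExecutionTraceRef {
+   kind: 'node_id' | 'condition_id' | 'loop_id' | 'step_id';
+   value: string;
+ }
+
+ interface ConsoleExecutionTraceItem {
+   kind:
+     | 'selected_next_step'
+     | 'evaluated_condition'
+     | 'entered_loop'
+     | 'exited_loop'
+     | 'detected_non_tip_advance'
+     | 'divergence';
+   summary: string;
+   refs: ConsoleExecutionTraceRef[];
+ }
+
+ interface ConsoleExecutionTraceSummary {
+   items: ConsoleExecutionTraceItem[];
+   contextFacts: Array<{ key: 'taskComplexity'; value: string }>; // only taskComplexity today
+ }
+ ```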
124
+ ### Edge Cause Codes (Invisible Today)
125
+
126
+ Each `edge_created` event carries a `cause` field with one of four codes:
127
+ - `idempotent_replay` -- the same advance was replayed (checkpoint recovery)
128
+ - `intentional_fork` -- user or engine deliberately branched
129
+ - `non_tip_advance` -- agent advanced from a non-tip node (deliberate branch)
130
+ - `checkpoint_created` -- edge created by checkpoint operation
131
+
132
+ These cause codes are stored in the DAG projection (`RunDagEdgeV2.cause`) but are not
133
+ included in the `ConsoleDagEdge` DTO -- so edges appear as undifferentiated lines.
134
+
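+ Surfacing the cause would be a small DTO addition. A sketch, assuming the existing `ConsoleDagEdge` fields mirror the `edge_created` payload listed in the event table; the real DTO may differ.
+
+ ```typescript
+ // Cause codes come from the four values above; existing edge fields are inferred
+ // from the edge_created payload and may not match the real DTO exactly.
+ type ConsoleDagEdgeCause =
+   | 'idempotent_replay'
+   | 'intentional_fork'
+   | 'non_tip_advance'
+   | 'checkpoint_created';
+
+ interface ConsoleDagEdge {
+   edgeKind: string;
+   fromNodeId: string;
+   toNodeId: string;
+   cause?: ConsoleDagEdgeCause; // proposed: copied through from RunDagEdgeV2.cause
+ }
+ ```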
135
+ ### `blocked_attempt` Node Kind (Confusing Today)
136
+
137
+ Nodes with `nodeKind === 'blocked_attempt'` appear in the DAG alongside `step` and
138
+ `checkpoint` nodes. The console surfaces this kind label on `ConsoleDagNode.nodeKind`,
139
+ but there is no contextual explanation of:
140
+ - what blocked the attempt (the blockers from the `advance_recorded` event's `outcome.blockers`)
141
+ - how many re-attempts were made
142
+ - what validation failures triggered the block (linked `validation_performed` events)
143
+
144
+ ### Assessment Gate Results (Completely Invisible)
145
+
146
+ The `assessment_recorded` and `assessment_consequence_applied` events form a complete
147
+ quality-gate audit trail:
148
+
149
+ **`assessment_recorded`** contains:
150
+ - `dimensions[]` -- each with dimensionId, level, optional rationale, normalization type
151
+ - `summary` -- overall assessment text
152
+ - `normalizationNotes[]` -- notes about how values were normalized
153
+
154
+ **`assessment_consequence_applied`** contains:
155
+ - Which dimension triggered the consequence
156
+ - The consequence effect (currently always `require_followup` with guidance text)
157
+
158
+ None of this is exposed in any console DTO today. The `projectAssessmentsV2` and
159
+ `projectAssessmentConsequencesV2` projections exist but are not called from
160
+ `console-service.ts`.
161
+
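+ For reference, the two payloads described above can be typed roughly as follows. The field names come from the event descriptions in this doc; the value types (for example the `level` scale) are assumptions.
+
+ ```typescript
+ // Field names from the event descriptions above; value types assumed for illustration.
+ interface AssessmentRecorded {
+   assessmentId: string;
+   attemptId: string;
+   artifactOutputId?: string;
+   summary: string;
+   normalizationNotes: string[];
+   dimensions: Array<{
+     dimensionId: string;
+     level: string;          // e.g. 'acceptable' or 'borderline' -- exact scale not confirmed here
+     rationale?: string;
+     normalization?: string;
+   }>;
+ }
+
+ interface AssessmentConsequenceApplied {
+   assessmentId: string;
+   trigger: { dimensionId: string; level: string };
+   effect: { kind: 'require_followup'; guidance: string };
+ }
+ ```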
162
+ ### Capability Probing Results (Completely Invisible)
163
+
164
+ `capability_observed` events record:
165
+ - Whether `delegation` or `web_browsing` is available/unavailable/unknown
166
+ - How it was determined: probe_step, attempted_use, or manual_claim
167
+ - Failure codes (tool_missing, tool_error, policy_blocked, unknown) for attempted_use failures
168
+
169
+ This explains why a run took a degraded path when a capability was unavailable.
170
+ The `projectCapabilitiesV2` projection exists but is not called from `console-service.ts`.
171
+
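+ As a discriminated-union sketch (the capabilities, statuses, provenance kinds, and failure codes all come from the description above; the field layout is an assumption):
+
+ ```typescript
+ // Values from the capability_observed description above; structure assumed.
+ type CapabilityProvenance =
+   | { kind: 'probe_step'; stepId?: string }
+   | { kind: 'manual_claim' }
+   | {
+       kind: 'attempted_use';
+       failureCode?: 'tool_missing' | 'tool_error' | 'policy_blocked' | 'unknown';
+     };
+
+ interface CapabilityObserved {
+   capability: 'delegation' | 'web_browsing';
+   status: 'unknown' | 'available' | 'unavailable';
+   provenance: CapabilityProvenance;
+ }
+ ```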
172
+ ### Preferences Changes (Invisible)
173
+
174
+ `preferences_changed` events record:
175
+ - Who changed preferences: user, workflow_recommendation, or system
176
+ - What changed: autonomy mode, riskPolicy
177
+ - The effective state after the change
178
+
179
+ This explains why a run's behavior changed mid-execution (e.g. switched from guided to full_auto).
180
+
181
+ ### Run Context -- Full Object (Mostly Invisible)
182
+
183
+ `context_set` records the full run context as a JsonObject. The console only elevates
184
+ `taskComplexity`. Other common context keys used for routing include:
185
+ - `goal`, `taskDescription` (already used for title derivation, not exposed as facts)
186
+ - `mrTitle`, `prTitle`, `ticketTitle`, `problem` (same)
187
+ - Any custom keys the workflow uses for routing decisions
188
+
189
+ The `projectRunContextV2` projection returns the full context object but only
190
+ `taskComplexity` is surfaced in the execution trace.
191
+
192
+ ---
193
+
194
+ ## Priority Zero: Fast-Win Starting Point
195
+
196
+ Before proceeding to the full initiative scope, implement these two items. They cover the three named confusion patterns from the brief and require the least backend change:
197
+
198
+ 1. **Render the existing `executionTraceSummary` panel** (Tier 1 -- no backend change). The data is already computed and in `ConsoleDagRun.executionTraceSummary`. This explains fast-path phase skips (`selected_next_step`, `evaluated_condition` trace entries), loop structural jumps (`entered_loop`, `exited_loop`), and the `taskComplexity` routing driver. Zero backend work required.
199
+
200
+ 2. **Add blocker detail to `ConsoleAdvanceOutcome`** (Tier 3 -- DTO extension). This explains why `blocked_attempt` nodes exist. Each blocker has a typed code (10-value enum), a typed pointer (context_key/capability/output_contract/workflow_step), a message, and optional suggestedFix. Currently only `outcome.kind = 'blocked'` is exposed.
201
+
202
+ **Scope-reduction trigger:** If user testing after shipping Priority Zero items shows users can explain all named confusion patterns, scope down to Candidate C and defer items 3-23.
203
+
204
+ ---
205
+
206
+ ## Comprehensive Gap List: Engine Records But Console Doesn't Surface
207
+
208
+ ### (a) Workflow Structure Decisions
209
+
210
+ 1. **Edge cause codes** -- why each edge exists (idempotent_replay, intentional_fork, non_tip_advance, checkpoint_created). Invisible on `ConsoleDagEdge` today.
211
+ 2. **Condition evaluation results** -- `decision_trace_appended` entries with `kind='evaluated_condition'` include a condition_id ref and a summary, but the UI panel for executionTraceSummary is marked "not yet implemented."
212
+ 3. **Step selection rationale** -- `selected_next_step` entries in the decision trace explain why the engine chose a particular next step (e.g. fast-path skip). Same panel gap.
213
+ 4. **Non-tip advance detection** -- `detected_non_tip_advance` entries explain DAG forks. Same panel gap.
214
+
215
+ ### (b) Run Context and Routing
216
+
217
+ 5. **Full run context object** -- all keys beyond `taskComplexity` from `context_set` events are invisible. Any key the workflow uses for routing decisions (e.g. complexity tier, feature flags, user preferences) is hidden.
218
+ 6. **Context source** -- whether context was set at `initial` startup vs. updated via `agent_delta` is not surfaced.
219
+ 7. **`taskComplexity` value** -- technically in `executionTraceSummary.contextFacts`, but the UI panel isn't implemented yet.
220
+ 8. **Preferences at time of run** -- `preferences_changed` events record autonomy mode and riskPolicy transitions, explaining mid-run behavior changes. Completely invisible.
221
+
222
+ ### (c) Assessment Results
223
+
224
+ 9. **Assessment dimensions** -- each dimension's level and rationale from `assessment_recorded` is fully invisible. Users don't know whether a quality gate passed or was borderline.
225
+ 10. **Assessment summary** -- the overall summary text from `assessment_recorded` is invisible.
226
+ 11. **Assessment normalization notes** -- how input values were normalized before assessment is invisible.
227
+ 12. **Assessment consequences** -- when an assessment dimension triggered a `require_followup` consequence, the triggering dimension, level, and guidance text are invisible.
228
+
229
+ ### (d) Loop and Iteration State
230
+
231
+ 13. **Loop entry events** -- `entered_loop` decision trace entries with loop_id refs are in the execution trace but the UI panel isn't implemented.
232
+ 14. **Loop exit reasons** -- `exited_loop` entries explain whether a loop exited due to condition=false or maxIterations reached. Invisible.
233
+ 15. **Loop iteration counts** -- the engine tracks 0-based iteration on the loop stack; this is not surfaced anywhere in the console.
234
+
235
+ ### (e) Blocked Attempt Context
236
+
237
+ Note: "Why is this node blocked?" requires BOTH Tier 2 (validation failure linkage) AND Tier 3 (blocker detail) simultaneously -- these are two distinct workstreams that must be coordinated.
238
+
239
+ 16. **Blocker codes and pointers** -- each blocker in a `BlockerReport` has a typed `code` (one of 10 enum values: USER_ONLY_DEPENDENCY, MISSING_REQUIRED_OUTPUT, INVALID_REQUIRED_OUTPUT, MISSING_REQUIRED_NOTES, MISSING_CONTEXT_KEY, CONTEXT_BUDGET_EXCEEDED, REQUIRED_CAPABILITY_UNKNOWN, REQUIRED_CAPABILITY_UNAVAILABLE, INVARIANT_VIOLATION, STORAGE_CORRUPTION_DETECTED) and a typed `pointer` (discriminated union pointing at a specific context_key, capability, output_contract, or workflow_step). The console only shows `outcome.kind = 'blocked'`. A type sketch follows this list.
240
+ 17. **Blocker messages and suggestedFix** -- each blocker carries a human-readable `message` and optional `suggestedFix`. Completely invisible.
241
+ 18. **Validation failure linkage** -- `validation_performed` events are surfaced on node detail but not linked back to the blocked_attempt node that caused the block. The causal chain (validation fail -> blocked outcome -> blocked_attempt node) is not visualized.
242
+
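+ The blocker structure described in items 16-17 can be sketched as follows. The codes and pointer kinds are taken from the items above; the exact `BlockerReport` shape in the engine is not reproduced here, so treat the field names as assumptions.
+
+ ```typescript
+ // Codes and pointer kinds from items 16-17; overall shape assumed for illustration.
+ type BlockerCode =
+   | 'USER_ONLY_DEPENDENCY'
+   | 'MISSING_REQUIRED_OUTPUT'
+   | 'INVALID_REQUIRED_OUTPUT'
+   | 'MISSING_REQUIRED_NOTES'
+   | 'MISSING_CONTEXT_KEY'
+   | 'CONTEXT_BUDGET_EXCEEDED'
+   | 'REQUIRED_CAPABILITY_UNKNOWN'
+   | 'REQUIRED_CAPABILITY_UNAVAILABLE'
+   | 'INVARIANT_VIOLATION'
+   | 'STORAGE_CORRUPTION_DETECTED';
+
+ type BlockerPointer =
+   | { kind: 'context_key'; key: string }
+   | { kind: 'capability'; capability: string }
+   | { kind: 'output_contract'; contractRef: string }
+   | { kind: 'workflow_step'; stepId: string };
+
+ interface Blocker {
+   code: BlockerCode;
+   pointer: BlockerPointer;
+   message: string;
+   suggestedFix?: string;
+ }
+ ```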
243
+ ### (f) Gap Reason Detail
244
+
245
+ 19. **Gap reason category and detail** -- `gap_recorded` events carry a discriminated union `reason` field with category (`user_only_dependency`, `contract_violation`, `capability_missing`, `unexpected`) and a typed detail string. The `ConsoleNodeGap` DTO drops this entirely -- only `severity`, `summary`, and `isResolved` are exposed. See the sketch after this list.
246
+ 20. **Gap evidence refs** -- `gap_recorded` events carry optional `evidenceRefs` (event or output pointers) explaining what produced the gap. Completely invisible.
247
+
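+ A sketch of the proposed `ConsoleNodeGap` extension follows. The existing fields and reason categories come from items 19-20; the `detail` and `evidenceRefs` shapes are assumptions.
+
+ ```typescript
+ // Existing ConsoleNodeGap fields and reason categories from items 19-20; the rest is assumed.
+ type GapReason =
+   | { category: 'user_only_dependency'; detail: string }
+   | { category: 'contract_violation'; detail: string }
+   | { category: 'capability_missing'; detail: string }
+   | { category: 'unexpected'; detail: string };
+
+ interface ConsoleNodeGap {
+   severity: 'info' | 'warning' | 'critical';
+   summary: string;
+   isResolved: boolean;
+   reason?: GapReason;                                              // proposed addition
+   evidenceRefs?: Array<{ kind: 'event' | 'output'; id: string }>;  // proposed; shape assumed
+ }
+ ```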
248
+ ### (g) Capability and Environment
249
+
250
+ 21. **Capability probe results** -- whether delegation/web_browsing was available and how it was determined. Explains why a run took a degraded path. Completely invisible.
251
+ 22. **Capability failure codes** -- when a capability was attempted and failed, the failure code (tool_missing, tool_error, policy_blocked) is recorded but invisible.
252
+ 23. **Observation confidence** -- `observation_recorded` events include a confidence field (low/med/high). Git branch is shown but confidence is dropped.
253
+
254
+ ---
255
+
256
+ ## Problem Frame Packet
257
+
258
+ ### Primary Stakeholders
259
+
260
+ **WorkRail console users (agent operators):**
261
+ - Job: understand why their session executed the way it did
262
+ - Pain: DAG looks "broken" (phases skipped, unexpected forks, blocked_attempt nodes) with no explanation
263
+ - Success: can read a run and answer "why did the agent skip phase 3?" without opening raw JSON event logs
264
+
265
+ **WorkRail workflow authors:**
266
+ - Job: verify their routing logic, conditions, and assessment gates are working as designed
267
+ - Pain: no visibility into condition evaluation results, taskComplexity routing, or assessment dimensions
268
+ - Success: can see that `taskComplexity=simple` caused the fast-path and the assessment gate produced level=acceptable
269
+
270
+ **WorkRail maintainers / platform team:**
271
+ - Job: debug stuck or confusing runs reported by users
272
+ - Pain: must correlate across raw event log, projection code, and console UI manually
273
+ - Success: console surfaces enough that most run explanations don't require console->event-log context switching
274
+
275
+ ### Core Tensions
276
+
277
+ 1. **Information density vs. clarity** -- showing 23 hidden data items simultaneously would overwhelm users. The tension is deciding what to surface by default vs. on-demand drill-down.
278
+
279
+ 2. **"Why" is scattered across event kinds** -- routing decisions live in `decision_trace_appended`, but the context that drove those decisions lives in `context_set`, and the consequences live in `assessment_consequence_applied`. These must be composed coherently, not dumped as raw events.
280
+
281
+ 3. **Projection gap vs. DTO gap vs. rendering gap** -- the three tiers of gaps require different amounts of work:
282
+ - Tier 1 (rendering only): `executionTraceSummary` data is already in the DTO
283
+ - Tier 2 (service wiring): assessments/capabilities need console-service.ts calls + DTO fields
284
+ - Tier 3 (implicit data): blocker detail, gap reason, edge cause codes need DTO shape changes
285
+
286
+ ### Success Criteria
287
+
288
+ - A user can explain a fast-path skip ("jumped phase 0 to phase 5") by reading the console alone
289
+ - A user can explain a blocked_attempt node ("what blocked it and why")
290
+ - A user can see what drove routing (taskComplexity or other context variables)
291
+ - Assessment gate outcomes (passed/borderline dimensions, triggered follow-ups) are readable
292
+ - Loop iteration entry/exit reasons are visible
293
+
294
+ ### Framing Risks (What Could Make This Wrong)
295
+
296
+ 1. **Volume problem misframed as visibility problem** -- if users are overwhelmed today by the existing UI, adding 23 more data items might make it worse. The real ask may be "hide the complexity better" rather than "show more." Counter-evidence: the ask is explicitly for scoping "what should be visible" -- the design phase can address filtering/progressive disclosure.
297
+
298
+ 2. **Execution trace panel is sufficient** -- if the `executionTraceSummary` panel (already computed, just unrendered) covers 80% of user confusion, the other 22 items may be low-priority noise. Counter-evidence: assessment results and blocker detail are clearly outside the execution trace and are independently high-value.
299
+
300
+ 3. **Wrong user model** -- if primary users are "curious about the run" rather than "debugging a failure," the priority order changes (assessment results matter more than blocked_attempt blocker codes). No user research data available to validate.
301
+
302
+ ### How Might We Questions
303
+
304
+ - HMW: make the DAG self-explanatory so users never need to wonder why a node exists?
305
+ - HMW: surface "why this path" as a first-class affordance tied to each edge, not as a separate panel?
306
+
307
+ ### The Central Framing
308
+
309
+ The engine's event log is a complete trace of "what happened and why."
310
+ The console currently shows only "what happened" (the DAG topology).
311
+ The "why" layer exists in 14+ event kinds that are either completely invisible or
312
+ surfaced in an unimplemented UI panel.
313
+
314
+ ### Priority Signal from the Code
315
+
316
+ The `executionTraceSummary` field is already computed and placed in the DTO with a
317
+ comment "not yet implemented" on the UI side. This is the highest-priority gap: the
318
+ data is already there, the projection is already running, only the rendering is missing.
319
+
320
+ The assessment projections (`projectAssessmentsV2`, `projectAssessmentConsequencesV2`)
321
+ are complete but not called from the service layer at all -- they require both service
322
+ wiring and DTO additions.
323
+
324
+ ---
325
+
326
+ ## Candidate Directions
327
+
328
+ Three candidate framings were evaluated for how to present this gap list to the design team.
329
+ All three cover the same 23 items -- they differ in organization and recommended priority.
330
+
331
+ ### Candidate A: Tier-Organized (simplest, engineering-first)
332
+
333
+ Organize by implementation effort tier:
334
+ - Tier 1 (rendering only): render the `executionTraceSummary` panel -- data already in DTO
335
+ - Tier 2 (service wiring + DTO extension): wire assessment, capability projections to service
336
+ - Tier 3 (DTO shape change): blocker detail, gap reason detail, edge cause codes, full context, preferences
337
+
338
+ **Resolves:** DTO stability, sequential scoping.
339
+ **Accepts:** Design team gets a backlog, not a user-facing vision.
340
+ **Scope:** Too narrow for a design initiative kickoff; best for engineering sprint planning.
341
+
342
+ ### Candidate B: User-Question-Organized (recommended)
343
+
344
+ Organize by the user question each item answers, with tier noted per item:
345
+ - "Why did the run skip phases?" -- decision trace + context (Tier 1 + Tier 3)
346
+ - "Why is this node blocked?" -- blocker detail (Tier 3)
347
+ - "What did the quality gate decide?" -- assessment dimensions + consequences (Tier 2)
348
+ - "What happened in this loop?" -- loop entry/exit events (Tier 1), iteration count (Tier 3)
349
+ - "Why did behavior change mid-run?" -- preferences changes (Tier 3)
350
+ - "Why did the run take a degraded path?" -- capability probing results (Tier 2)
351
+
352
+ **Resolves:** UI coherence, design team gets a vision.
353
+ **Accepts:** Engineering must read more carefully to extract tier information.
354
+ **Scope:** Best-fit for design initiative kickoff. Covers all stakeholder groups.
355
+
356
+ ### Candidate C: Minimum Viable Explainability (narrowest scope)
357
+
358
+ Surface only the items that explain the three specific confusion patterns named in the problem statement:
359
+ 1. Fast-path phase skips: execution trace entries -- Tier 1 rendering only
360
+ 2. blocked_attempt nodes: blocker codes + messages -- Tier 3 DTO extension
361
+ 3. Loop structural jumps: entered_loop/exited_loop trace entries -- Tier 1 rendering only
362
+
363
+ **Resolves:** YAGNI, fastest path to user-visible improvement.
364
+ **Accepts:** Assessment results, capability degradation, preferences deferred.
365
+ **Scope:** Best-fit for a quick-win sprint; too narrow for a full design initiative.
366
+
367
+ ### Recommendation: Candidate B
368
+
369
+ The stated goal is a design initiative scoping document. Candidate B is the best fit because:
370
+ - Design kickoffs need user-question framing to build progressive-disclosure models
371
+ - Tier information is preserved per item so engineering can extract a backlog
372
+ - All three stakeholder groups (operators, workflow authors, platform maintainers) are covered
373
+ - The current structure of this doc already implements Candidate B
374
+
375
+ **Pivot condition:** If this is reframed as an engineering sprint plan, use Candidate A. If it is a quick-win sprint, use Candidate C.
376
+
377
+ **Strongest counter-argument:** Candidate C is faster and directly solves the named pain points. Counter: it leaves assessment and capability gaps unaddressed, which matters for workflow authors.
378
+
379
+ ---
380
+
381
+ ## Decision Log
382
+
383
+ - Chose `landscape_first` path: the problem is clearly framed (what does the engine record
384
+ vs. what does the console show). No reframing is needed.
385
+ - Read source files directly rather than delegating: the codebase is small and targeted,
386
+ delegation would add latency without quality improvement.
387
+ - Organized gap list by user question (Candidate B) rather than by tier (Candidate A) or
388
+ minimum viable scope (Candidate C) -- best fit for a design initiative kickoff.
389
+ - Identified 23 gaps across 7 categories, all traceable to specific event kinds or projections.
390
+ - Added Priority Zero callout (hybrid from Candidate C) to provide a fast-win starting point.
391
+ - Added two-tier dependency note for 'why blocked?' section (Tier 2 + Tier 3 required simultaneously).
392
+ - Runner-up: Candidate C -- the right choice if the initiative is reframed as a quick-win sprint.
393
+ - Residual risks: (1) user model unvalidated -- priority order is assumption-driven; (2) session detail API latency with 3 new projection calls not benchmarked; (3) CONTEXT_KEYS_TO_ELEVATE extension requires workflow-wide knowledge.
394
+ - Confidence band: medium-high. Gap list is fully grounded in code; priority ordering within it has a known user-model assumption.
@@ -0,0 +1,77 @@
1
+ # Design Review: WorkRail Console Three-Layer Execution Trace
2
+
3
+ ## Purpose
4
+ This document records the design review of the proposed three-layer execution trace feature
5
+ for the WorkRail Console. It is a human-readable companion to the WorkRail workflow execution
6
+ notes. Execution truth lives in the workflow notes and context variables, not here.
7
+
8
+ ## Context / Ask
9
+ Review the proposed three-layer execution trace design holistically before implementation.
10
+ Determine whether the combined interaction model is coherent or needs restructuring.
11
+
12
+ ## Path Recommendation
13
+ `full_spectrum` -- the design is already proposed; risk is coherence of the combined model,
14
+ not ignorance of options. Both landscape grounding (codebase constraints) and reframing
15
+ pressure (do the layers conflict?) are required.
16
+
17
+ ## Constraints / Anti-goals
18
+ See workflow context variables for the full list.
19
+
20
+ ---
21
+
22
+ ## Landscape Packet
23
+ *(filled during landscape phase)*
24
+
25
+ ## Problem Frame Packet
26
+ *(filled during problem framing phase)*
27
+
28
+ ## Candidate Directions
29
+
30
+ ### Generation expectations (for synthesis quality check)
31
+ - `full_spectrum` path: candidates must reflect both landscape constraints and reframing pressure
32
+ - Must include at least one direction that extends the existing NodeDetailSection rather than adding a new surface (tests the riskiest assumption)
33
+ - Must include at least one direction that preserves the proposed floating overlay with precise conflict resolutions
34
+ - Must include a bidirectional linking direction as an alternative to Layer 2 overlay
35
+ - THOROUGH: if the first set feels clustered, push for divergence on the ambient data duplication issue
36
+
37
+ *(filled during candidate generation step)*
38
+
39
+ ## Challenge Notes
40
+ *(filled during adversarial challenge phase)*
41
+
42
+ ## Resolution Notes
43
+ *(filled during resolution phase)*
44
+
45
+ ## Decision Log
46
+ *(key decisions recorded here)*
47
+
48
+ ## Final Summary
49
+
50
+ ### Verdict: The three-layer concept is coherent. Layer 2 needs restructuring before implementation.
51
+
52
+ **Confidence band:** HIGH
53
+
54
+ ### Selected direction: Hybrid B+C
55
+
56
+ **Layer 1 (TRACE tab):** Implement exactly as proposed. Zero backend cost; the tab CSS infrastructure already exists in `index.css`. No changes to the proposal are needed.
57
+
58
+ **Layer 2 (routing context for selected node):** Replace the floating overlay with two new entries prepended to the `SECTION_REGISTRY` array in `NodeDetailSection.tsx`:
59
+ - `routing` section (first entry): filters `executionTraceSummary.items` by `refs.some(r => r.kind === 'node_id' && r.value === nodeId)`, renders `[ WHY SELECTED ]` / `[ CONDITIONS EVALUATED ]` / `[ LOOP ]` / `[ DIVERGENCE ]` groupings with the proposed badge vocabulary
60
+ - `run_routing` section (second entry, collapsed by default): shows ambient items with no node_id ref
61
+ - contextFact chip strip in `RunLineageDag` DAG header (below SummaryChips row), conditional on `run.executionTraceSummary?.contextFacts.length > 0`
62
+ - `routingEventCount` badge on DAG tab header when node is selected (label: `[ N routing decisions ]`, not just a count)
63
+ - TRACE tab label counter for live runs (visible in DAG mode when new ambient items arrive)
64
+
65
+ **Layer 3 (DAG annotations):** Implement edge cause diamonds, loop brackets, CAUSE button on blocked_attempt nodes. Ghost nodes explicitly gated on backend confirmation of `skipped_step` kind in `ConsoleExecutionTraceItemKind`.
66
+
67
+ ### Runner-up: Proposed floating overlay (Candidate A)
68
+ Revisit if user research shows that (a) spatial anchoring reduces debugging time on complex DAGs, or (b) users are confused about which node the right panel refers to.
69
+
70
+ ### Residual risks
71
+ 1. **SECTION_REGISTRY ordering fragility:** A code comment must mark the routing section as "must remain first" to survive future contributions.
72
+ 2. **Post-run vs live debugging use case:** If live debugging is the primary use case, ambient routing context in DAG mode may need a stronger signal than the TRACE tab label counter.
73
+
74
+ ### Decision log
75
+ - Floating overlay rejected because: (a) a single click opens two simultaneous surfaces (overlay + right panel); (b) overlay-in-TRACE is an illegal state with no structural prevention; (c) overlay positioning in scroll-container coordinates is non-trivial and of ambiguous value.
76
+ - NodeDetailSection SECTION_REGISTRY chosen because: one-entry addition at the explicit extension point, zero new surface logic, satisfies all five acceptance criteria, honors all philosophy principles.
77
+ - contextFact chip strip borrowed from proposed Layer 2 to preserve ambient DAG-mode context visibility.
@@ -0,0 +1,92 @@
1
+ # Design Review Findings: WorkRail Console Execution-Trace Explainability
2
+
3
+ *Reviews Candidate B: User-Question-Organized gap list*
4
+ *Source discovery doc: `console-explainability-discovery.md`*
5
+ *Source candidates doc: `console-explainability-design-candidates.md`*
6
+
7
+ ---
8
+
9
+ ## Tradeoff Review
10
+
11
+ **Tradeoff 1: Engineering reads user-question organization to extract tier info**
12
+ - Acceptable under the stated scope (design initiative, not sprint planning).
13
+ - Unacceptable if: primary consumer is an engineering sprint meeting. Mitigation: tier notation is explicit per item.
14
+ - Hidden assumption: design and implementation planning are separate phases.
15
+
16
+ **Tradeoff 2: Cross-cutting complexity hidden inside user questions**
17
+ - 'Why blocked?' requires both Tier 2 (validation linkage) and Tier 3 (blocker detail) simultaneously. The user question framing obscures this.
18
+ - Acceptable under 'what should be visible, not how to implement.'
19
+ - Unacceptable if: design initiative owner must produce implementation estimates at kickoff.
20
+
21
+ **Tradeoff 3: Design team may underestimate implementation cost**
22
+ - Acceptable: cost estimation is explicitly out of scope in the brief.
23
+ - Unacceptable if: the initiative requires a go/no-go based on cost at kickoff stage.
24
+
25
+ ---
26
+
27
+ ## Failure Mode Review
28
+
29
+ **Failure Mode 1 (highest risk): Execution trace panel covers 80% of confusion**
30
+ - The executionTraceSummary data (6 decision-trace entry kinds + taskComplexity context fact) is already computed and in the DTO. If this single UI change resolves the dominant user confusion, the full 23-item initiative scope is unnecessary.
31
+ - Handling: framing risk is named in the discovery doc. Tier notation allows teams to stop at Tier 1.
32
+ - Missing mitigation: no recommendation to validate with users before proceeding to full design phase.
33
+
34
+ **Failure Mode 2: No user research before design**
35
+ - 4 open questions are surfaced in the candidates doc. The priority order within Candidate B is assumption-driven.
36
+ - Handling: questions are explicit.
37
+ - Missing mitigation: no specific lightweight research method recommended.
38
+
39
+ ---
40
+
41
+ ## Runner-Up / Simpler Alternative Review
42
+
43
+ **Runner-up (Candidate C):** Has a real strength -- explicitly names the 3-item minimum that addresses the named confusion patterns. Candidate B should borrow this.
44
+
45
+ **Hybrid adopted:** Add a 'Priority Zero' callout to the discovery doc that highlights the minimum viable items (execution trace panel + blocker detail) as the recommended fast-win starting point, before the full initiative scope.
46
+
47
+ **Simplification explored:** Limiting to Tier 1+2 only (cutting all Tier 3 items). Rejected: blocker detail (Tier 3) is the most user-critical item for the 'why blocked?' user question. Cutting it would leave blocked_attempt nodes unexplained.
48
+
49
+ ---
50
+
51
+ ## Philosophy Alignment
52
+
53
+ **Clearly satisfied:** make illegal states unrepresentable (DTO extension guidance), document 'why' not 'what' (entire doc structure), validate at boundaries (projection architecture respected), YAGNI with discipline (staged tier structure + Priority Zero callout).
54
+
55
+ **Under acceptable tension:** YAGNI vs. completeness. Resolved by the tier structure and Priority Zero callout that give teams permission to stop at any tier.
56
+
57
+ **No risky tensions.** Philosophy alignment is solid.
58
+
59
+ ---
60
+
61
+ ## Findings
62
+
63
+ ### Orange: No user research validation before design
64
+ **Impact:** The priority order within the 23-item gap list is assumption-driven. If execution trace panel alone resolves dominant confusion, the design initiative scope is over-stated.
65
+ **What to watch:** If a 3-session user research pulse shows users can orient once the execution trace panel is rendered, immediately scope down to Candidate C.
66
+
67
+ ### Yellow: Cross-cutting complexity obscured by user-question framing
68
+ **Impact:** The 'why blocked?' user question requires both Tier 2 and Tier 3 work. A design team reading only the user question might plan a single sprint for it when it actually needs two distinct work items.
69
+ **Mitigation:** Add a note to the 'why blocked?' section flagging the two-tier dependency.
70
+
71
+ ### Yellow: 'Priority Zero' callout not yet in the discovery doc
72
+ **Impact:** Without an explicit fast-win callout, the design team sees a 23-item list with no obvious starting point.
73
+ **Mitigation:** Add the Priority Zero section (hybrid from runner-up analysis) before presenting the doc.
74
+
75
+ ---
76
+
77
+ ## Recommended Revisions
78
+
79
+ 1. **Add Priority Zero callout** to `console-explainability-discovery.md` that identifies the minimum needed for the three named confusion patterns: the execution trace panel render (Tier 1) + blocker detail (Tier 3). Serves as the fast-win path.
80
+
81
+ 2. **Add a user research recommendation** before proceeding to full design phase. Suggest a 30-minute session with 3-5 users watching a confusing run to validate that execution trace panel alone is insufficient.
82
+
83
+ 3. **Add a two-tier dependency note** under the 'why blocked?' section to make cross-cutting complexity visible.
84
+
85
+ 4. **Add research-pulse trigger:** "If users can explain phase-skip confusion after execution trace panel is rendered, switch to Candidate C scope and defer assessment/capability items."
86
+
87
+ ---
88
+
89
+ ## Residual Concerns
90
+
91
+ - No performance benchmarks exist for session detail API response time with 3 additional projection calls (assessments, capabilities, preferences). If the API is on a latency-sensitive path, Tier 2 projection wiring may need profiling before committing.
92
+ - The `CONTEXT_KEYS_TO_ELEVATE` constant in `run-execution-trace.ts` is hardcoded to `['taskComplexity']`. Extending this to surface additional context keys is Tier 1 work but requires knowing which keys are routing-critical across all workflows -- not a codebase-only question.