@exaudeus/workrail 3.67.0 → 3.68.1

Files changed (144)
  1. package/dist/application/services/compiler/template-registry.js +10 -1
  2. package/dist/cli/commands/worktrain-init.js +1 -1
  3. package/dist/console-ui/assets/{index-tOl8Vowf.js → index-DPdRJHMX.js} +1 -1
  4. package/dist/console-ui/index.html +1 -1
  5. package/dist/coordinators/modes/full-pipeline.js +4 -4
  6. package/dist/coordinators/modes/implement-shared.js +5 -5
  7. package/dist/coordinators/modes/implement.js +4 -4
  8. package/dist/coordinators/pr-review.js +4 -4
  9. package/dist/daemon/workflow-runner.d.ts +1 -0
  10. package/dist/daemon/workflow-runner.js +1 -0
  11. package/dist/manifest.json +31 -31
  12. package/dist/mcp/handlers/v2-context-budget.js +18 -0
  13. package/dist/mcp/handlers/v2-workflow.js +1 -1
  14. package/dist/mcp/workflow-protocol-contracts.js +2 -2
  15. package/dist/v2/durable-core/constants.d.ts +2 -0
  16. package/dist/v2/durable-core/constants.js +2 -1
  17. package/dist/v2/projections/session-metrics.js +1 -1
  18. package/docs/authoring-v2.md +4 -4
  19. package/docs/changelog-recent.md +3 -3
  20. package/docs/configuration.md +1 -1
  21. package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
  22. package/docs/design/adaptive-coordinator-context.md +1 -1
  23. package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
  24. package/docs/design/adaptive-coordinator-routing-review.md +1 -1
  25. package/docs/design/adaptive-coordinator-routing.md +34 -34
  26. package/docs/design/agent-cascade-protocol.md +2 -2
  27. package/docs/design/console-daemon-separation-discovery.md +323 -0
  28. package/docs/design/context-assembly-design-candidates.md +1 -1
  29. package/docs/design/context-assembly-implementation-plan.md +1 -1
  30. package/docs/design/context-assembly-layer.md +2 -2
  31. package/docs/design/context-assembly-review-findings.md +1 -1
  32. package/docs/design/coordinator-access-audit.md +293 -0
  33. package/docs/design/coordinator-architecture-audit.md +62 -0
  34. package/docs/design/coordinator-error-handling-audit.md +240 -0
  35. package/docs/design/coordinator-testability-audit.md +426 -0
  36. package/docs/design/daemon-architecture-discovery.md +1 -1
  37. package/docs/design/daemon-console-separation-discovery.md +242 -0
  38. package/docs/design/daemon-memory-audit.md +203 -0
  39. package/docs/design/design-candidates-console-daemon-separation.md +256 -0
  40. package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
  41. package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
  42. package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
  43. package/docs/design/discovery-loop-fix-candidates.md +161 -0
  44. package/docs/design/discovery-loop-fix-design-review.md +106 -0
  45. package/docs/design/discovery-loop-fix-validation.md +258 -0
  46. package/docs/design/discovery-loop-investigation-A.md +188 -0
  47. package/docs/design/discovery-loop-investigation-B.md +287 -0
  48. package/docs/design/exploration-workflow-candidates.md +205 -0
  49. package/docs/design/exploration-workflow-design-review.md +166 -0
  50. package/docs/design/exploration-workflow-discovery.md +443 -0
  51. package/docs/design/ide-context-files-candidates.md +231 -0
  52. package/docs/design/ide-context-files-design-review.md +85 -0
  53. package/docs/design/ide-context-files.md +615 -0
  54. package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
  55. package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
  56. package/docs/design/in-process-http-audit.md +190 -0
  57. package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
  58. package/docs/design/loadSessionNotes-candidates.md +108 -0
  59. package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
  60. package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
  61. package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
  62. package/docs/design/probe-session-design-candidates.md +261 -0
  63. package/docs/design/probe-session-phase0.md +490 -0
  64. package/docs/design/routines-guide.md +7 -7
  65. package/docs/design/session-metrics-attribution-candidates.md +250 -0
  66. package/docs/design/session-metrics-attribution-design-review.md +115 -0
  67. package/docs/design/session-metrics-attribution-discovery.md +319 -0
  68. package/docs/design/session-metrics-candidates.md +227 -0
  69. package/docs/design/session-metrics-design-review.md +104 -0
  70. package/docs/design/session-metrics-discovery.md +454 -0
  71. package/docs/design/spawn-session-debug.md +202 -0
  72. package/docs/design/trigger-validator-candidates.md +214 -0
  73. package/docs/design/trigger-validator-review.md +109 -0
  74. package/docs/design/trigger-validator-shaping-phase0.md +239 -0
  75. package/docs/design/trigger-validator.md +454 -0
  76. package/docs/design/v2-core-design-locks.md +2 -2
  77. package/docs/design/workflow-extension-points.md +15 -15
  78. package/docs/design/workflow-id-validation-at-startup.md +1 -1
  79. package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
  80. package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
  81. package/docs/design/worktrain-task-queue-candidates.md +5 -5
  82. package/docs/design/worktrain-task-queue.md +4 -4
  83. package/docs/discovery/coordinator-script-design.md +1 -1
  84. package/docs/discovery/coordinator-ux-discovery.md +3 -3
  85. package/docs/discovery/simulation-report.md +1 -1
  86. package/docs/discovery/workflow-modernization-discovery.md +326 -0
  87. package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
  88. package/docs/discovery/worktrain-status-briefing.md +1 -1
  89. package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
  90. package/docs/docker.md +1 -1
  91. package/docs/ideas/backlog.md +227 -0
  92. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
  93. package/docs/integrations/claude-code.md +5 -5
  94. package/docs/integrations/firebender.md +1 -1
  95. package/docs/plans/agentic-orchestration-roadmap.md +2 -2
  96. package/docs/plans/mr-review-workflow-redesign.md +9 -9
  97. package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
  98. package/docs/plans/ui-ux-workflow-discovery.md +2 -2
  99. package/docs/plans/workflow-categories-candidates.md +8 -8
  100. package/docs/plans/workflow-categories-discovery.md +4 -4
  101. package/docs/plans/workflow-modernization-design.md +430 -0
  102. package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
  103. package/docs/plans/workflow-staleness-detection-review.md +4 -4
  104. package/docs/plans/workflow-staleness-detection.md +9 -9
  105. package/docs/plans/workrail-platform-vision.md +3 -3
  106. package/docs/reference/agent-context-cleaner-snippet.md +1 -1
  107. package/docs/reference/agent-context-guidance.md +4 -4
  108. package/docs/reference/context-optimization.md +2 -2
  109. package/docs/roadmap/now-next-later.md +2 -2
  110. package/docs/roadmap/open-work-inventory.md +16 -16
  111. package/docs/workflows.md +31 -31
  112. package/package.json +1 -1
  113. package/spec/workflow-tags.json +47 -47
  114. package/workflows/adaptive-ticket-creation.json +16 -16
  115. package/workflows/architecture-scalability-audit.json +22 -22
  116. package/workflows/bug-investigation.agentic.v2.json +3 -3
  117. package/workflows/classify-task-workflow.json +1 -1
  118. package/workflows/coding-task-workflow-agentic.json +6 -6
  119. package/workflows/cross-platform-code-conversion.v2.json +8 -8
  120. package/workflows/document-creation-workflow.json +8 -8
  121. package/workflows/documentation-update-workflow.json +8 -8
  122. package/workflows/intelligent-test-case-generation.json +2 -2
  123. package/workflows/learner-centered-course-workflow.json +2 -2
  124. package/workflows/mr-review-workflow.agentic.v2.json +4 -4
  125. package/workflows/personal-learning-materials-creation-branched.json +8 -8
  126. package/workflows/presentation-creation.json +5 -5
  127. package/workflows/production-readiness-audit.json +1 -1
  128. package/workflows/relocation-workflow-us.json +31 -31
  129. package/workflows/routines/context-gathering.json +1 -1
  130. package/workflows/routines/design-review.json +1 -1
  131. package/workflows/routines/execution-simulation.json +1 -1
  132. package/workflows/routines/feature-implementation.json +3 -3
  133. package/workflows/routines/final-verification.json +1 -1
  134. package/workflows/routines/hypothesis-challenge.json +1 -1
  135. package/workflows/routines/ideation.json +1 -1
  136. package/workflows/routines/parallel-work-partitioning.json +3 -3
  137. package/workflows/routines/philosophy-alignment.json +2 -2
  138. package/workflows/routines/plan-analysis.json +1 -1
  139. package/workflows/routines/plan-generation.json +1 -1
  140. package/workflows/routines/tension-driven-design.json +6 -6
  141. package/workflows/scoped-documentation-workflow.json +26 -26
  142. package/workflows/ui-ux-design-workflow.json +14 -14
  143. package/workflows/workflow-diagnose-environment.json +1 -1
  144. package/workflows/workflow-for-workflows.json +1 -1
@@ -0,0 +1,319 @@
1
+ # Session Metrics Attribution -- Hybrid Architecture Discovery
2
+
3
+ **Date:** 2026-04-21
4
+ **Status:** Discovery complete -- recommendation ready
5
+
6
+ ---
7
+
8
+ ## About This Document
9
+
10
+ Human-readable artifact for sharing findings and recommendations. NOT execution truth -- workflow session notes and context variables are the durable record. Read this for understanding, not for resuming the workflow.
11
+
12
+ ---
13
+
14
+ ## Context / Ask
15
+
16
+ **Stated goal (original):** Design the right hybrid architecture for attributing git metrics (LOC, files changed, commits) to a specific WorkRail session, accounting for concurrent sessions on the same branch.
17
+
18
+ **Goal classification:** `solution_statement` -- prescribes a specific hybrid architecture with confidence levels and agent SHA reporting rather than describing the outcome.
19
+
20
+ **Reframed problem:** Given that git attribution via branch or time-window is unreliable when sessions overlap on the same branch, what is the minimal durable record a WorkRail session needs to carry so consumers can compute accurate git metrics even under adversarial conditions?
21
+
22
+ **Prior discoveries:**
23
+ - Discovery 1: rejected agent self-reporting of LOC/files (unreliable; agents would be grading their own homework)
24
+ - Discovery 2: found `buildSuccessOutcome()` + `extraEventsToAppend` as the hook point for a `run_completed` event; `observation_recorded` keys are locked; no event timestamps today
25
+
26
+ ---
27
+
28
+ ## Path Recommendation
29
+
30
+ **Selected path:** `design_first`
31
+
32
+ **Rationale:** The goal is already a well-formed solution statement, and the codebase landscape was thoroughly covered in Discoveries 1 and 2. The dominant risk is NOT "we don't know the landscape" -- it's "we're designing the wrong concept or solving a problem that's rarer than we think." `design_first` is correct here because:
33
+ - The specific concurrent same-branch attribution problem has not been validated as common
34
+ - The proposed solution (agent SHA self-reporting) has the same reliability failure mode as what Discovery 1 rejected
35
+ - The accumulation problem for commit SHAs across steps is architecturally non-trivial and hasn't been resolved
36
+
37
+ ---
38
+
39
+ ## Constraints / Anti-goals
40
+
41
+ **Core constraints:**
42
+ - `observation_recorded` has a CLOSED discriminated union key set -- extending it requires schema + handler updates
43
+ - `mergeContext` uses SHALLOW merge semantics (locked at §18.2) -- arrays/objects are REPLACED, not merged
44
+ - `context_set` projection uses LATEST-WINS semantics per run -- a new `context_set` with `metrics_commit_shas: ["def"]` will silently replace an earlier `["abc"]`
45
+ - No wall-clock timestamps on events today -- `durationSeconds` is blocked on this
46
+ - Token usage is NOT accessible at the MCP layer
47
+ - The `buildSuccessOutcome()` function has access to `lockedIndex: SessionIndex`, which is the current session's index only -- no cross-session queries at that point
48
+
49
+ **Anti-goals:**
50
+ - Do not design for the general case (all possible attribution problems) -- focus on the specific concurrent same-branch case
51
+ - Do not make the high-confidence path depend on agent compliance if there's an engine-side alternative
52
+ - Do not add a new event kind just to hold data that fits in an existing mechanism with minor conventions
53
+
54
+ ---
55
+
56
+ ## Landscape Findings (Code Investigation)
57
+
58
+ ### Where `git_head_sha` is captured at session start
59
+
60
+ `src/mcp/handlers/v2-execution/start.ts` calls `resolveWorkspaceAnchors()` before `buildInitialEvents()`. The result is passed to `buildInitialEvents()` which emits one `observation_recorded` event per observation at the end of the initial event batch. The `git_head_sha` observation uses `type: 'git_sha1'` and regex-validates the 40-char hex format.
61
+
62
+ **Initial pattern considered for `run_completed`:** Mirror this exactly -- capture the end HEAD SHA via `resolveWorkspaceAnchors()` at the completion advance and emit an `observation_recorded` event (re-using the existing mechanism) rather than a new event kind. The finding below shows why this does not work as-is.
63
+
64
+ **CRITICAL FINDING:** `observation_recorded` events are session-scoped (no scope field -- unique among all event kinds). The dedupeKey pattern is `observation_recorded:{sessionId}:{key}`. This means a second `git_head_sha` observation would collide with the first (same dedupeKey). To capture end HEAD SHA, either:
65
+ (a) Use a different key (e.g., `git_head_sha_end`) -- requires schema extension
66
+ (b) Emit the new `run_completed` event kind with the end SHA embedded in its data
67
+ (c) Change the dedupeKey to include the runId so start and end can coexist
68
+
69
+ Option (b) is cleanest: a `run_completed` event with `endGitSha` in its data avoids the dedupeKey collision and clearly expresses the semantic (session finished at this SHA).
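
The collision described above can be illustrated with a small sketch. Only the key pattern `observation_recorded:{sessionId}:{key}` comes from the finding; the `dedupeKeyFor` and `appendDeduped` helpers, the `Map`-based log, and the "second append is dropped" behavior are hypothetical stand-ins, not WorkRail's actual implementation.

```typescript
// Hypothetical illustration of the dedupeKey collision; not WorkRail's real code.
type Observation = { sessionId: string; key: string; value: string };

// Key pattern taken from the finding above: observation_recorded:{sessionId}:{key}
function dedupeKeyFor(obs: Observation): string {
  return `observation_recorded:${obs.sessionId}:${obs.key}`;
}

// Assumed behavior: an append that reuses an existing dedupeKey is not recorded.
function appendDeduped(log: Map<string, Observation>, obs: Observation): boolean {
  const k = dedupeKeyFor(obs);
  if (log.has(k)) return false; // collision: same session, same key
  log.set(k, obs);
  return true;
}

const log = new Map<string, Observation>();
// Start-of-session SHA is recorded.
const startRecorded = appendDeduped(log, { sessionId: 's1', key: 'git_head_sha', value: 'a'.repeat(40) });
// End-of-session SHA under the same key collides with the start observation.
const endRecorded = appendDeduped(log, { sessionId: 's1', key: 'git_head_sha', value: 'b'.repeat(40) });
```

Under these assumptions the end SHA never reaches the log, which is why options (a) through (c) each change either the key, the event kind, or the dedupeKey itself.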
70
+
71
+ ### What the `SessionIndex` provides at completion time
72
+
73
+ `buildSessionIndex()` in `src/v2/durable-core/session-index.ts` builds a single-session index from the current session's sorted event log. It provides:
74
+ - `runStartedByRunId`: workflow identity per run
75
+ - `runContextByRunId`: latest context per run
76
+ - `sortedEvents`: the full sorted event log
77
+
78
+ It does NOT provide cross-session data. At the point where `buildSuccessOutcome()` runs, only the current session's truth is loaded.
79
+
80
+ **Implication for concurrent session detection:** Detecting other active sessions on the same branch at completion time would require calling `LocalSessionSummaryProviderV2.loadHealthySummaries()` which is:
81
+ 1. An async, I/O-heavy operation (scans up to 200 sessions from disk)
82
+ 2. Not available as a port in `AdvanceCorePorts` today
83
+ 3. Sequential by design (to avoid hammering the file system)
84
+
85
+ This operation CANNOT be done synchronously at completion time without architectural changes to make it injectable. It would add significant latency to every final advance.
86
+
87
+ ### The `context_set` accumulation problem -- concrete analysis
88
+
89
+ `mergeContext` contract (locked at §18.2): arrays are REPLACED, not merged.
90
+
91
+ ```
92
+ step 5: context_set { metrics_commit_shas: ["abc123"] }
93
+ step 9: context_set { metrics_commit_shas: ["def456"] }
94
+ ```
95
+
96
+ Result: `metrics_commit_shas` = `["def456"]` -- "abc123" is permanently lost.
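
A minimal sketch of why the earlier SHA disappears, assuming the shallow merge behaves like an object spread over top-level keys. `shallowMerge` is a hypothetical stand-in for `mergeContext`, whose real implementation is not shown in this document.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [k: string]: JsonValue };
type Context = { [k: string]: JsonValue };

// Stand-in for a shallow merge: each top-level key in the delta replaces the base value.
// Arrays and nested objects are replaced wholesale, never concatenated or deep-merged.
function shallowMerge(base: Context, delta: Context): Context {
  return { ...base, ...delta };
}

const afterStep5 = shallowMerge({}, { metrics_commit_shas: ['abc123'] });
const afterStep9 = shallowMerge(afterStep5, { metrics_commit_shas: ['def456'] });
// afterStep9.metrics_commit_shas now holds only 'def456'; 'abc123' is gone.

// Full-accumulated-list convention: the delta carries every SHA reported so far.
const afterStep9Full = shallowMerge(afterStep5, { metrics_commit_shas: ['abc123', 'def456'] });
```

The second call shows the only way to preserve history under this contract: the writer, not the merge, is responsible for accumulation.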
97
+
98
+ Fix options:
99
+ (a) Agent always sends full accumulated list: `["abc123", "def456"]` -- relies on agent memory, fragile
100
+ (b) New append-style event type: architecturally sound but requires new event kind, schema change, new projection
101
+ (c) Agent accumulates in a single context key by always including previous SHAs -- workable if the step prompt explicitly instructs this pattern
102
+
103
+ Option (c) is workable but brittle. Option (b) is architecturally correct but has ceremony cost.
104
+
105
+ ### What `run_completed` can and cannot contain (given current infrastructure)
106
+
107
+ **Can contain (authoritative, engine-computed):**
108
+ - `endGitSha` -- available at advance time via `resolveWorkspaceAnchors()`
109
+ - `runId` -- already in scope
110
+ - `captureConfidence` -- partially (see below)
111
+
112
+ **Cannot contain (blocked on infrastructure):**
113
+ - `durationSeconds` -- no event timestamps today; the session start time is not recorded anywhere in the event log
114
+ - Concurrent session detection -- cross-session queries not available at completion time
115
+
116
+ **Confidence level determination at completion time:**
117
+ The only confidence test available at completion time (without new infrastructure) is:
118
+ - Did the agent report `metrics_commit_shas`? -- check `lockedIndex.runContextByRunId`
119
+ - Is there a git_head_sha recorded? -- check sortedEvents for `observation_recorded` with key `git_head_sha`
120
+
121
+ Testing for concurrent sessions (to set `low` confidence) requires the session summary provider, which is not injectable at the advance handler level today.
122
+
123
+ ---
124
+
125
+ ## Design Candidates
126
+
127
+ ### Candidate A: Minimal `run_completed` event (Phase 1, shippable now)
128
+
129
+ Emit a `run_completed` event in `buildSuccessOutcome()` when `newEngineState.kind === 'complete'`.
130
+
131
+ **Event schema:**
132
+ ```typescript
133
+ {
134
+ kind: 'run_completed',
135
+ scope: { runId: string },
136
+ data: {
137
+ endGitSha: string | null, // engine-authoritative (from resolveWorkspaceAnchors)
138
+ startGitSha: string | null, // extracted from observation_recorded events in lockedIndex
139
+ agentCommitShas: readonly string[], // copied from context metrics_commit_shas if present (may be empty)
140
+ captureConfidence: 'high' | 'medium' | 'none',
141
+ // 'high' = agentCommitShas non-empty
142
+ // 'medium' = start+end SHA available (branch diff usable, concurrent risk unknown)
143
+ // 'none' = no git context
144
+ }
145
+ }
146
+ ```
147
+
148
+ **Confidence logic (no cross-session query):**
149
+ ```
150
+ if agentCommitShas.length > 0 → 'high'
151
+ else if startGitSha && endGitSha → 'medium'
152
+ else → 'none'
153
+ ```
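
The confidence logic above can be written as a pure function. This is a sketch only: the name `computeCaptureConfidence` and its parameter order are assumptions, and the truthiness check mirrors the pseudocode (so an empty string counts as "no SHA").

```typescript
type CaptureConfidence = 'high' | 'medium' | 'none';

// Mirrors the three-way rule above; performs no cross-session query.
function computeCaptureConfidence(
  agentCommitShas: readonly string[],
  startGitSha: string | null,
  endGitSha: string | null,
): CaptureConfidence {
  if (agentCommitShas.length > 0) return 'high'; // agent reported at least one SHA
  if (startGitSha && endGitSha) return 'medium'; // branch diff usable, concurrency risk unknown
  return 'none'; // no git context
}

const withAgentShas = computeCaptureConfidence(['a'.repeat(40)], null, null);
const withRangeOnly = computeCaptureConfidence([], 'a'.repeat(40), 'b'.repeat(40));
const withNothing = computeCaptureConfidence([], 'a'.repeat(40), null);
```

Note that agent-reported SHAs take precedence even when no start or end SHA is available, exactly as in the pseudocode.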
154
+
155
+ Note: 'low' (concurrent same-branch sessions detected) is NOT computable at advance time without new infrastructure. Drop it from Phase 1.
156
+
157
+ **Accumulation problem:** Agent must send full accumulated list in each context_set. Step prompt must explicitly say "include all commit SHAs you have made so far" not "include the commit SHAs from this step."
158
+
159
+ **Schema change cost:**
160
+ - New event kind `run_completed` in `DomainEventV1Schema` discriminated union
161
+ - New Zod schema for data
162
+ - Update all exhaustive handlers (grep for exhaustive switches on DomainEventV1)
163
+ - New projection `projectRunCompletedV2` to surface this event to consumers
164
+
165
+ **What ships:** A `run_completed` event in the session log when a workflow completes. Consumers can compute LOC from `git diff --stat startGitSha..endGitSha`. When agent reported SHAs, use those for precise per-session attribution. When not, use the full diff with a 'medium' confidence label.
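
On the consumer side, the summary line that `git diff --stat` prints last (and `git diff --shortstat` prints alone) can be parsed into numbers. The sketch below is an illustration, not part of WorkRail: `parseDiffSummary` and `DiffSummary` are hypothetical names, and running git itself is left to the caller.

```typescript
interface DiffSummary {
  filesChanged: number;
  insertions: number;
  deletions: number;
}

// Parses a line such as " 3 files changed, 10 insertions(+), 2 deletions(-)".
// Git omits the insertions or deletions clause when its count is zero,
// and prints nothing at all for an empty diff (handled here by returning null).
function parseDiffSummary(line: string): DiffSummary | null {
  const m = /(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?/.exec(line);
  if (m === null) return null;
  return {
    filesChanged: parseInt(m[1] ?? '0', 10),
    insertions: parseInt(m[2] ?? '0', 10),
    deletions: parseInt(m[3] ?? '0', 10),
  };
}

const summary = parseDiffSummary(' 3 files changed, 10 insertions(+), 2 deletions(-)');
```

A consumer would feed this the output of the `startGitSha..endGitSha` diff and attach the event's `captureConfidence` label to the result.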
166
+
167
+ **What does NOT ship:** Duration, concurrent session detection, 'low' confidence tier.
168
+
169
+ ---
170
+
171
+ ### Candidate B: `run_completed` deferred -- use context_set convention only (skip Phase 2 entirely)
172
+
173
+ The Phase 1 design (Discoveries 1+2) already recommended `metrics_commit_shas` as a flat context key. The question "when does the accumulation problem matter?" deserves scrutiny:
174
+
175
+ Concurrent same-branch sessions are ONLY a problem when:
176
+ 1. Two sessions are running on the SAME branch (not just concurrently in time)
177
+ 2. Both sessions are making commits
178
+
179
+ In a healthy WorkRail usage pattern (worktrees, feature branches per session), concurrent sessions use different branches. The same-branch concurrency problem is a degraded operating mode, not the common case.
180
+
181
+ **Recommendation:** If same-branch concurrency is rare (< 5% of sessions based on actual usage data), a `run_completed` event adds schema complexity, maintenance cost, and exhaustive-switch update burden for marginal attribution improvement. The right move is:
182
+ 1. Require actual usage data before building Phase 2 infrastructure
183
+ 2. Phase 1 (`metrics_commit_shas` flat context key) already provides the high-confidence path when agents comply
184
+ 3. Consumers can use start/end SHAs from `observation_recorded` events for the 'medium' confidence case -- the start SHA already exists; the end SHA can be added via the existing `observation_recorded` mechanism under a new key such as `git_head_sha_end` (and therefore a new dedupeKey), which requires the schema extension noted above
185
+
186
+ **What ships:** Nothing new. Phase 1 convention is sufficient.
187
+
188
+ **Risk:** Attribution accuracy suffers for teams using same-branch workflows. But until we have data showing this is a real usage pattern, we don't know the actual impact.
189
+
190
+ ---
191
+
192
+ ### Candidate C: Append-style event for commit SHA accumulation (Phase 2, full solution)
193
+
194
+ Introduce a new event kind `commit_sha_appended` with `scope: { runId }` and `data: { sha: string, ref: string | null }`.
195
+
196
+ **Semantics:** Each time the agent makes a commit and reports it, it either calls `continue_workflow` with an artifact or emits a `commit_sha_appended` event (via a new MCP tool or a convention on `continue_workflow`). The projection folds all `commit_sha_appended` events into an ordered list.
197
+
198
+ **Advantage:** Solves the accumulation problem cleanly without relying on agent memory or the full-list convention.
199
+
200
+ **Cost:** New event kind, new schema, new MCP tool or convention, new projection. High ceremony.
201
+
202
+ **Verdict:** This is architecturally correct but over-engineered for the current problem. Worth pursuing if `commit_sha_appended` semantics turn out to be useful for more than just attribution (e.g., audit trail, PR linking). Defer until there's a concrete second use case.
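
For reference, the fold that Candidate C's projection would perform could look like the sketch below. The event shape follows the description above; the function name `projectCommitShas` and the de-duplication of repeated SHAs are assumptions added for illustration.

```typescript
// Event shape as described for Candidate C; not an existing WorkRail event kind.
type CommitShaAppended = {
  kind: 'commit_sha_appended';
  scope: { runId: string };
  data: { sha: string; ref: string | null };
};

// Folds append-style events into an ordered SHA list for one run.
// Assumption: a SHA reported twice is kept once, at its first position.
function projectCommitShas(events: readonly CommitShaAppended[], runId: string): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];
  for (const e of events) {
    if (e.scope.runId !== runId || seen.has(e.data.sha)) continue;
    seen.add(e.data.sha);
    ordered.push(e.data.sha);
  }
  return ordered;
}

const events: CommitShaAppended[] = [
  { kind: 'commit_sha_appended', scope: { runId: 'r1' }, data: { sha: 'abc123', ref: null } },
  { kind: 'commit_sha_appended', scope: { runId: 'r2' }, data: { sha: 'fff999', ref: null } },
  { kind: 'commit_sha_appended', scope: { runId: 'r1' }, data: { sha: 'def456', ref: 'main' } },
  { kind: 'commit_sha_appended', scope: { runId: 'r1' }, data: { sha: 'abc123', ref: null } },
];
const shasForR1 = projectCommitShas(events, 'r1');
```

Because each event appends rather than replaces, no step needs to remember what earlier steps reported, which is the property the shallow-merge contract cannot provide.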
203
+
204
+ ---
205
+
206
+ ## Recommended Resolution
207
+
208
+ **Phase 1 (ship now):** Candidate A -- minimal `run_completed` event. Rationale:
209
+ - Provides engine-authoritative `endGitSha` which is not available via any existing mechanism (the dedupeKey collision blocks using `observation_recorded` for end SHA)
210
+ - The `captureConfidence` field gives consumers a clear signal about attribution reliability
211
+ - `agentCommitShas` copied from context at completion time creates a reliable snapshot even if context is overwritten later
212
+ - Cost is bounded: one new event kind, one new schema, exhaustive switch updates (bounded set)
213
+
214
+ **Drop from Phase 1:**
215
+ - Concurrent session detection -- needs injectable `SessionSummaryProvider` in `AdvanceCorePorts`, adds latency, unvalidated use case
216
+ - `durationSeconds` -- blocked on event timestamps
217
+ - 'low' confidence tier -- requires cross-session query infrastructure
218
+
219
+ **Phase 2 (only if data supports it):**
220
+ - Inject `SessionSummaryProvider` into the advance handler ports to enable concurrent session detection
221
+ - Emit 'low' confidence when concurrent same-branch sessions are detected
222
+ - Requires measuring actual same-branch concurrency rate first
223
+
224
+ **Accumulation fix (required alongside Candidate A):**
225
+ The step prompt convention in `docs/authoring-v2.md` MUST say: "When reporting commit SHAs, always include the full list of all commits made during this session, not just commits from the current step." This is the only safe fix given the shallow-merge contract.
226
+
227
+ ---
228
+
229
+ ## Open Questions
230
+
231
+ 1. **What is the actual same-branch concurrency rate?** The decision to build cross-session detection infrastructure depends on this. 539 time-overlapping pairs does not equal 539 branch-overlapping pairs. This data is needed before committing to Phase 2.
232
+
233
+ 2. **Who are the consumers of `run_completed`?** The design assumes "the console dashboard" and "external analytics." Are there internal consumers (e.g., the daemon trigger system) that would benefit from a `run_completed` event? If so, this strengthens the case for Candidate A.
234
+
235
+ 3. **Does the `AdvanceCorePorts` interface need to be extended for workspace anchor resolution at completion time?** No -- confirmed via code investigation. The `workspacePath` is available on `V2ContinueWorkflowInput` (the input to `handleAdvanceIntent`). Pre-resolve `endGitSha` from `input.workspacePath` before calling `executeAdvanceCore`, then pass `endGitSha: string | null` as a parameter through the call chain to `buildSuccessOutcome`. No port refactor needed.
236
+
237
+ ---
238
+
239
+ ## Decision Log
240
+
241
+ ### Winner: Candidate A (run_completed with agentCommitShas snapshot) -- with gitBranch addition
242
+
243
+ **Why it wins:**
244
+ - Provides engine-authoritative `endGitSha` and `startGitSha` in a single self-contained event
245
+ - `agentCommitShas` snapshot from context is durable (survives future context overwrites, unlike leaving SHAs only in context_set)
246
+ - Follows `assessment_recorded` precedent (snapshots context data into event log at completion time)
247
+ - Fully reversible: if agent compliance is poor, consumers can ignore `agentCommitShas` and fall back to 'medium' -- no schema change needed
248
+ - Incremental cost over the endGitSha-only variant is trivial (one field + superRefine + sha1 validation)
249
+
250
+ **Why the runner-up (endGitSha-only) lost:**
251
+ - Would make 'high' confidence tier permanently unachievable, defeating the stated goal (concurrent same-branch disambiguation)
252
+ - Cost difference vs. winner is negligible
253
+
254
+ **Why "defer entirely" lost:**
255
+ - No mechanism for capturing `endGitSha` without a new event kind (dedupeKey collision blocks reusing `observation_recorded`)
256
+ - Without `endGitSha`, retrospective attribution queries break (current HEAD != session end SHA for old sessions)
257
+
258
+ ### Final schema (confirmed after adversarial challenge and review)
259
+
260
+ ```typescript
261
+ {
262
+ kind: 'run_completed',
263
+ scope: { runId: string },
264
+ data: {
265
+ startGitSha: string | null, // from observation_recorded events (git_head_sha key)
266
+ endGitSha: string | null, // resolved at completion via input.workspacePath
267
+ gitBranch: string | null, // from observation_recorded events (git_branch key)
268
+ agentCommitShas: readonly string[], // sha1 validated, from context.metrics_commit_shas
269
+ captureConfidence: 'high' | 'medium' | 'none',
270
+ // 'high' = agentCommitShas non-empty (requires superRefine cross-field invariant)
271
+ // 'medium' = startGitSha and endGitSha available
272
+ // 'none' = no git context
273
+ }
274
+ }
275
+ // Zod superRefine: captureConfidence === 'high' requires agentCommitShas.length > 0
276
+ // Zod: agentCommitShas entries validated as /^[0-9a-f]{40}$/
277
+ ```
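
The two invariants noted in the schema comments can be checked without any library, as in the dependency-free sketch below. It is a stand-in for the Zod `superRefine` and regex rules, not the actual schema code; `validateRunCompletedData` is a hypothetical name.

```typescript
type RunCompletedData = {
  startGitSha: string | null;
  endGitSha: string | null;
  gitBranch: string | null;
  agentCommitShas: readonly string[];
  captureConfidence: 'high' | 'medium' | 'none';
};

const SHA1_RE = /^[0-9a-f]{40}$/;

// Returns a list of human-readable issues; an empty list means the data is valid.
function validateRunCompletedData(data: RunCompletedData): string[] {
  const issues: string[] = [];
  // Per-entry rule: every agent-reported SHA is 40 lowercase hex characters.
  data.agentCommitShas.forEach((sha, i) => {
    if (!SHA1_RE.test(sha)) issues.push(`agentCommitShas[${i}] is not a 40-char hex SHA-1`);
  });
  // Cross-field rule: 'high' confidence requires at least one agent-reported SHA.
  if (data.captureConfidence === 'high' && data.agentCommitShas.length === 0) {
    issues.push("captureConfidence 'high' requires a non-empty agentCommitShas");
  }
  return issues;
}

const validIssues = validateRunCompletedData({
  startGitSha: 'a'.repeat(40),
  endGitSha: 'b'.repeat(40),
  gitBranch: 'main',
  agentCommitShas: ['c'.repeat(40)],
  captureConfidence: 'high',
});
const invalidIssues = validateRunCompletedData({
  startGitSha: null,
  endGitSha: null,
  gitBranch: null,
  agentCommitShas: [],
  captureConfidence: 'high',
});
```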
278
+
279
+ ### Implementation pattern (threading fix)
280
+
281
+ In `src/mcp/handlers/v2-execution/continue-advance.ts`, before calling `executeAdvanceCore`:
282
+ ```typescript
283
+ // Eager resolution: the exact completion guard lives inside buildSuccessOutcome,
284
+ // so endGitSha is resolved at the advance site on every advance, not only on completion.
285
+ const endGitSha: string | null = input.workspacePath
286
+ ? await resolveEndGitSha(ctx.v2, input.workspacePath)
287
+ : null;
288
+ ```
289
+
290
+ Pass `endGitSha` through `executeAdvanceCore` args to `buildSuccessOutcome`. Guard emission with `newEngineState.kind === 'complete'`.
291
+
292
+ ---
293
+
294
+ ## Final Summary
295
+
296
+ **Confidence band:** HIGH
297
+
298
+ **Residual risks:**
299
+ 1. Agent compliance with full-accumulated-list convention is the riskiest assumption -- mitigation is authoring docs + step prompt, not architecture
300
+ 2. No specific consumer of `run_completed` identified yet -- validate before Phase 2 investment
301
+ 3. Same-branch concurrency rate unknown -- Phase 2 (concurrent detection) is data-gated on this
302
+
303
+ **Phase 1 (ship without gates):**
304
+ - `run_completed` event kind with final schema above
305
+ - `projectSessionGitAttributionV2` pure projection
306
+ - `ConsoleSessionSummary` extension (optional `gitAttribution` field, null-safe)
307
+ - `docs/authoring-v2.md` update: full-accumulated-list step prompt template (REQUIRED -- feature is dead without this)
308
+ - Consumer cross-check convention documented
309
+
310
+ **Phase 2 (data-gated):**
311
+ - Inject `SessionSummaryProvider` into `AdvanceCorePorts` for concurrent session detection
312
+ - Add 'low' confidence tier when concurrent same-branch sessions detected
313
+ - Gate on: same-branch concurrency rate > 5% OR measured production misattribution incidents
314
+
315
+ **Phase 3 (if compliance measurement shows problem):**
316
+ - `commit_sha_appended` append-style event type for clean SHA accumulation
317
+ - Gate on: full-list compliance rate < 50% for multi-commit sessions
318
+
319
+ **What cannot ship regardless of phase:** `durationSeconds` -- blocked on event envelope timestamps, which require a separate infrastructure change.
@@ -0,0 +1,227 @@
1
+ # Session Metrics -- Design Candidates
2
+
3
+ *Raw investigative material for main agent synthesis. Not a final decision.*
4
+
5
+ ---
6
+
7
+ ## Problem Understanding
8
+
9
+ **Tensions (real tradeoffs):**
10
+
11
+ 1. **Type safety vs. ergonomics.** Typed artifact contracts enforce schema at runtime (valuable for machine-queryable data) but require `outputContract` declarations on specific workflow steps. `context_set` is untyped but universally writeable at any step with no workflow changes.
12
+
13
+ 2. **Session-level vs. step-level.** Metrics (PR numbers, session outcome, files changed) are session-level outcomes. Artifacts are node-scoped by design. `context_set` is run-scoped (correct match). This is a real architectural mismatch for Candidate B.
14
+
15
+ 3. **Agent self-reporting reliability vs. external observation.** Git stats (LOC, files changed) computed by the agent are advisory only. Capturing the HEAD SHA at start and end and running `git diff --stat` post-hoc gives authoritative numbers -- but requires new lifecycle infrastructure.
16
+
17
+ 4. **Convention vs. enforcement.** A convention requires agents to follow it voluntarily. Without a schema or validation signal, agents will be inconsistent. Enforcement (artifact contract validation, auto-injected step prompts) adds ceremony but ensures consistency.
18
+
19
+ **What makes this hard:**
20
+
21
+ - `mergeContext` uses SHALLOW merge semantics -- nested object values are REPLACED, not merged. Using `context.metrics = {...}` as the carrier is a footgun: agents who set partial updates at different steps will silently lose earlier keys. Must use FLAT top-level keys instead.
22
+ - The `observation_recorded` event has a CLOSED discriminated union key set. Adding new keys is not a data change -- it requires updating the TypeScript schema union, all exhaustive handlers, and documentation.
23
+ - No event timestamps in the event envelope -- session duration is fundamentally uncomputable from the event log today, regardless of what metrics mechanism is chosen.
24
+ - The `context_set` projection (`projectRunContextV2` and `SessionIndex.runContextByRunId`) uses LATEST-WINS semantics per run. Each agent delta completely replaces the previous context snapshot. This is fine for flat top-level keys (each key is independently addressable) but breaks nested metrics accumulation.
25
+
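The shallow-merge footgun described above can be illustrated with a minimal sketch. `shallowMerge` here is a hypothetical stand-in that assumes `mergeContext` behaves like a top-level object spread; it is not the actual WorkRail implementation, and the `JsonValue` type is redeclared locally for self-containment.

```typescript
// Local redeclaration for the sketch; the real JsonValue type lives in the codebase.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

type Context = { [key: string]: JsonValue };

// Hypothetical stand-in for shallow merge semantics:
// each top-level key in `delta` REPLACES the same key in `base`.
function shallowMerge(base: Context, delta: Context): Context {
  return { ...base, ...delta };
}

// Nested carrier: the second partial update replaces the whole `metrics` object,
// so the earlier `outcome` key is silently lost.
const nested = shallowMerge(
  { metrics: { outcome: "success" } },
  { metrics: { pr_numbers: [42] } },
);

// Flat carrier: each key is independently addressable, so both survive.
const flat = shallowMerge(
  { metrics_outcome: "success" },
  { metrics_pr_numbers: [42] },
);
```

After these two merges, `nested.metrics` holds only `pr_numbers`, while `flat` holds both `metrics_outcome` and `metrics_pr_numbers`.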
26
+ **Likely seam:** The projection layer (`src/v2/projections/`) and the console session summary DTO (`src/v2/usecases/console-types.ts`). The capture boundary can stay unchanged; the gap is extraction and display.
27
+
28
+ ---
29
+
30
+ ## Philosophy Constraints
31
+
32
+ **Principles that matter most here:**
33
+
34
+ - **Prefer explicit domain types over primitives** -- `context_set` carries `JsonValue`, which is as primitive as it gets. A typed projection output (`SessionMetricsV2` interface) can add type discipline at the extraction boundary without changing the storage format.
35
+ - **Validate at boundaries, trust inside** -- the `projectSessionMetricsV2` projection should validate and coerce context keys into typed output at the projection boundary.
36
+ - **Exhaustiveness everywhere** -- any new event kind or `observation_recorded` key extension requires updating the discriminated union. This is a real cost, not just bureaucracy.
37
+ - **YAGNI with discipline** -- token cost metrics are not achievable today; do not design for them. Event timestamps are infrastructure worth doing but orthogonal to this proposal.
38
+ - **Architectural fixes over patches** -- auto-injected final step is a patch. The right architectural answer is to define a queryable projection over existing data.
39
+
40
+ **Philosophy conflicts:**
41
+
42
+ - Using `context_set` as a metrics carrier conflicts with "prefer explicit domain types." The conflict is manageable if the projection output is typed.
43
+ - The `wr.contracts.metrics` approach honors "prefer explicit domain types" but conflicts with YAGNI (high ceremony for advisory data) and the session-vs-node scope mismatch makes it architecturally wrong.
44
+
45
+ ---
46
+
47
+ ## Impact Surface
48
+
49
+ **Files that must stay consistent if we change the projection layer:**
50
+
51
+ - `src/v2/projections/session-metrics.ts` (new) -- must consume `SortedEventLog` like other projections
52
+ - `src/v2/usecases/console-service.ts` -- calls `projectSessionMetricsV2`, populates `ConsoleSessionSummary`
53
+ - `src/v2/usecases/console-types.ts` -- extends `ConsoleSessionSummary` with metrics field
54
+ - `console/src/api/types.ts` -- mirrors `ConsoleSessionSummary`; must be kept in sync
55
+ - `docs/authoring-v2.md` -- convention must be documented for workflow authors
56
+ - `src/v2/infra/local/session-summary-provider/index.ts` -- may need to call the new projection
57
+
58
+ **Contracts that must remain consistent:**
59
+
60
+ - `ConsoleSessionSummary` is a published DTO consumed by the console frontend; any new field must be backward-compatible (optional or null-safe).
61
+ - `SortedEventLog` is the canonical input type for projections; new projection must accept it.
62
+
63
+ ---
64
+
65
+ ## Candidates
66
+
67
+ ### Candidate A: `context_set` convention with flat metrics keys (simplest sufficient)
68
+
69
+ **Summary:** Define a documented `metrics_*` key convention at the top level of `context_set` events. Implement `projectSessionMetricsV2` to extract these keys. Add `metrics: SessionMetricsV2 | null` to `ConsoleSessionSummary`.
70
+
71
+ **Tension resolution:**
72
+ - Resolves: minimal author burden, backward-compatible, queryable with new projection, no schema change
73
+ - Accepts: weak type safety (JsonValue storage), agent reliability (no enforcement), agents must self-compute git stats
74
+
75
+ **Boundary solved at:** Projection layer -- the existing `context_set` mechanism is unchanged; only the extraction and display layers are new.
76
+
77
+ **Why this boundary is best-fit:** The `isAutonomous` and `parentSessionId` precedents show that `context_set` is already the approved mechanism for session-level advisory metadata. Adding a `metrics_*` convention follows the exact same pattern. This is "validate at boundaries" -- the projection validates/coerces the advisory data into a typed interface.
78
+
79
+ **Specific convention:**
80
+ - `metrics_outcome: 'success' | 'partial' | 'abandoned' | 'error'`
81
+ - `metrics_pr_numbers: number[]`
82
+ - `metrics_files_changed: number` (advisory)
83
+ - `metrics_lines_added: number` (advisory)
84
+ - `metrics_lines_removed: number` (advisory)
85
+ - `metrics_git_head_end: string` (advisory HEAD SHA at end; superseded by Phase 2)
86
+
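As a rough sketch of how the convention above might be extracted into a typed output, the following coerces each `metrics_*` key and falls back to `null` on absent or malformed values. The field names on `SessionMetricsV2` and the helper `extractSessionMetrics` are assumptions for illustration; the real `projectSessionMetricsV2` is expected to consume a `SortedEventLog`, whereas this sketch takes an already-merged context object.

```typescript
type MetricsOutcome = "success" | "partial" | "abandoned" | "error";

// Assumed shape; the proposal names the interface but does not fix its fields.
interface SessionMetricsV2 {
  readonly outcome: MetricsOutcome | null;
  readonly prNumbers: readonly number[] | null;
  readonly filesChanged: number | null;
  readonly linesAdded: number | null;
  readonly linesRemoved: number | null;
  readonly gitHeadEnd: string | null;
}

const OUTCOMES: readonly string[] = ["success", "partial", "abandoned", "error"];

function isNonNegativeInt(v: unknown): v is number {
  return typeof v === "number" && isFinite(v) && Math.floor(v) === v && v >= 0;
}

function asOutcome(v: unknown): MetricsOutcome | null {
  return typeof v === "string" && OUTCOMES.indexOf(v) !== -1 ? (v as MetricsOutcome) : null;
}

function asCount(v: unknown): number | null {
  return isNonNegativeInt(v) ? v : null;
}

function asPrNumbers(v: unknown): readonly number[] | null {
  return Array.isArray(v) && v.every(isNonNegativeInt) ? (v as number[]) : null;
}

function asNonEmptyString(v: unknown): string | null {
  return typeof v === "string" && v.length > 0 ? v : null;
}

// Hypothetical helper: returns null when no metrics_* key is present at all,
// otherwise a fully populated object whose malformed fields are null.
function extractSessionMetrics(context: { readonly [key: string]: unknown }): SessionMetricsV2 | null {
  const hasAny = Object.keys(context).some((k) => k.indexOf("metrics_") === 0);
  if (!hasAny) return null;
  return {
    outcome: asOutcome(context["metrics_outcome"]),
    prNumbers: asPrNumbers(context["metrics_pr_numbers"]),
    filesChanged: asCount(context["metrics_files_changed"]),
    linesAdded: asCount(context["metrics_lines_added"]),
    linesRemoved: asCount(context["metrics_lines_removed"]),
    gitHeadEnd: asNonEmptyString(context["metrics_git_head_end"]),
  };
}
```

The coercers never throw: a string where an array is expected simply yields `null` for that field, which keeps one bad key from hiding the others.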
87
+ **Failure mode:** Agents use inconsistent key names or forget to report. Console shows null metrics for most sessions. Enforcement can be added incrementally (prompted via workflow step guidance).
88
+
89
+ **Repo-pattern relationship:** Follows `projectRunContextV2` and `ConsoleSessionSummary.isAutonomous` exactly.
90
+
91
+ **Gains:** Zero schema change, works for any workflow today, reversible (just remove the projection and console field).
92
+
93
+ **Losses:** Weak type safety, no enforcement, agent-computed git stats are advisory.
94
+
95
+ **Scope:** Best-fit for Phase 1.
96
+
97
+ **Philosophy fit:** Honors YAGNI. Conflicts with "prefer explicit domain types" (JsonValue storage, mitigated by typed projection output).
98
+
99
+ ---
100
+
101
+ ### Candidate B: `wr.contracts.metrics` artifact contract (typed, validated)
102
+
103
+ **Summary:** Define a new Zod-validated artifact contract `wr.contracts.metrics`, declared on a workflow's final/summary step `outputContract`. Machine-verifiable at `continue_workflow` boundary.
104
+
105
+ **Tension resolution:**
106
+ - Resolves: type safety, validation at boundary, machine-queryable via existing artifact projection
107
+ - Accepts: author burden (outputContract required on final step), node-scoped (architectural mismatch for session-level metrics), not universal for existing workflows
108
+
109
+ **Boundary solved at:** The artifact contract system (`src/v2/durable-core/schemas/artifacts/`).
110
+
111
+ **Why this boundary is WRONG for this problem:** Artifacts are node-scoped (`node_output_appended` with `scope: {runId, nodeId}`). Session-level metrics don't belong to any single node. The closest existing analogs (`wr.contracts.review_verdict`, `wr.contracts.assessment`) are also node-scoped but are semantically correct there (they assess what a node did). Metrics are session-level: they describe the session outcome, not what a node did.
112
+
113
+ **Failure mode:** Most workflows don't have a dedicated final/summary step. All existing workflows would need to be updated to add a final step with `outputContract: wr.contracts.metrics`. This is a high-friction migration.
114
+
115
+ **Repo-pattern relationship:** Adapts the existing artifact contract pattern (correct for node-level structured outputs).
116
+
117
+ **Gains:** Type safety, enforcement at boundary, reuses existing validation infrastructure.
118
+
119
+ **Losses:** High author burden, wrong granularity (node vs. session), not backward-compatible.
120
+
121
+ **Scope:** Too broad for Phase 1. Potentially valid as a Phase 2 opt-in (not mandatory) for workflows that want formal metrics reporting.
122
+
123
+ **Philosophy fit:** Honors "prefer explicit domain types", "validate at boundaries." Conflicts with YAGNI and "minimal author burden."
124
+
125
+ ---
126
+
127
+ ### Candidate C: End-of-session HEAD SHA observation + post-hoc diff (reframe)
128
+
129
+ **Summary:** Emit an additional `observation_recorded` event with key `git_head_sha_end` when the session reaches `complete` state. Add a console endpoint `GET /api/v2/sessions/:id/diff-summary` that runs `git diff --stat <start_sha>..<end_sha>` for authoritative LOC and files-changed.
130
+
131
+ **Tension resolution:**
132
+ - Resolves: agent reliability (git stats from git, not agent), no false precision, zero author burden for git metrics
133
+ - Accepts: schema change (closed enum extension), requires git access from WorkRail at query time, complex lifecycle hook at session completion
134
+
135
+ **Boundary solved at:** Two boundaries -- (1) session lifecycle (end-of-session observation emission), (2) console query layer (new endpoint).
136
+
137
+ **Why this boundary is right for Phase 2 but wrong for Phase 1:** Authoritative git stats are more valuable than advisory agent-reported stats. But extending the `observation_recorded` closed enum requires a schema union update, and WorkRail currently has no git access at the MCP handler level. This is a significant but well-bounded engineering investment, appropriate as a follow-up after Phase 1 validates demand.
138
+
139
+ **Specific shape:**
140
+ - New `observation_recorded` key: `git_head_sha_end` (requires extending `DomainEventV1Schema`)
141
+ - Emission hook: when `advanceAndRecord` produces `outcome.kind = 'advanced'` and the next snapshot state is `complete`, emit the end observation
142
+ - Console endpoint: reads start SHA (first `observation_recorded.git_head_sha`) + end SHA (last `observation_recorded.git_head_sha_end`), runs `git diff --stat start..end`, returns JSON
143
+
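To show what "returns JSON" could mean for the endpoint above, here is a minimal sketch that parses the summary line git prints at the end of `git diff --stat` output. The `DiffSummary` shape and the function name `parseDiffStatSummary` are assumptions; actually running git and resolving the SHAs are left out.

```typescript
// Assumed response shape for the hypothetical diff-summary endpoint.
interface DiffSummary {
  readonly filesChanged: number;
  readonly linesAdded: number;
  readonly linesRemoved: number;
}

// Parses the trailing summary line of `git diff --stat`, e.g.
// " 2 files changed, 8 insertions(+), 5 deletions(-)".
// Returns null for empty output or when no summary line is found.
function parseDiffStatSummary(statOutput: string): DiffSummary | null {
  const lines = statOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const summary = lines[lines.length - 1] ?? "";

  const files = /(\d+) files? changed/.exec(summary);
  if (files === null) return null;

  // git omits the insertions or deletions clause when its count is zero.
  const added = /(\d+) insertions?\(\+\)/.exec(summary);
  const removed = /(\d+) deletions?\(-\)/.exec(summary);

  return {
    filesChanged: Number(files[1]),
    linesAdded: added === null ? 0 : Number(added[1]),
    linesRemoved: removed === null ? 0 : Number(removed[1]),
  };
}
```

An empty diff produces no output at all, which this sketch maps to `null` rather than a zeroed summary so the caller can distinguish "no changes" handling explicitly.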
144
+ **Failure mode:** Extending the closed `observation_recorded` enum requires updating all exhaustive handlers. WorkRail has no git access today -- the console server would need to run the diff (it has access to the repo path via the `repo_root` observation). The diff could be incomplete or misleading if the agent rebased between start and end.
145
+
146
+ **Repo-pattern relationship:** Extends the existing `observation_recorded` start-of-session pattern.
147
+
148
+ **Gains:** Authoritative git metrics, zero agent participation.
149
+
150
+ **Losses:** Schema blast radius (closed enum extension), new git access boundary, complex lifecycle hook.
151
+
152
+ **Scope:** Best-fit for Phase 2 (separate GitHub issue).
153
+
154
+ **Philosophy fit:** Honors "architectural fixes over patches", "determinism." Conflicts with YAGNI (Phase 1 perspective).
155
+
156
+ ---
157
+
158
+ ### Candidate D: Add `timestampMs` to event envelope (foundational duration infrastructure)
159
+
160
+ **Summary:** Add optional `timestampMs: number` to `DomainEventEnvelopeV1Schema`. Session duration = `last_event.timestampMs - first_event.timestampMs`.
161
+
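The duration formula above can be sketched as follows, assuming events arrive in log order and that `timestampMs` stays optional so older events simply lack it. `TimestampedEvent` and `sessionDurationMs` are illustrative names, not existing types.

```typescript
// Minimal stand-in for an event envelope with the proposed optional field.
interface TimestampedEvent {
  readonly timestampMs?: number;
}

// Duration = last event timestamp minus first event timestamp.
// Returns null when the log is empty or either endpoint has no timestamp,
// which is the case for sessions recorded before the field existed.
function sessionDurationMs(events: readonly TimestampedEvent[]): number | null {
  if (events.length === 0) return null;
  const first = events[0]?.timestampMs;
  const last = events[events.length - 1]?.timestampMs;
  if (first === undefined || last === undefined) return null;
  return last - first;
}
```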
162
+ **Tension resolution:**
163
+ - Resolves: session duration computable from event log without agent participation
164
+ - Accepts: significant schema change, no duration for sessions recorded before the change (their events carry no timestamps), adds size to every event
165
+
166
+ **Boundary solved at:** `DomainEventEnvelopeV1Schema` -- the lowest-level shared schema in the event log.
167
+
168
+ **Why this boundary is too broad:** This is foundational infrastructure that happens to enable one metric (duration). The blast radius affects every event builder, the session index, all parsers, and the design locks doc. This is a separate engineering investment that should be proposed and decided independently.
169
+
170
+ **Failure mode:** Existing sessions parse fine (optional field) but show null duration. Requires clock injection into all event builders. The design locks doc (`docs/design/v2-core-design-locks.md`) would need a new lock entry.
171
+
172
+ **Repo-pattern relationship:** Departs from existing pattern (events have no timestamps today).
173
+
174
+ **Gains:** Duration metrics without agent participation; future event-timing analytics.
175
+
176
+ **Losses:** Significant blast radius, orthogonal scope from "metrics capture."
177
+
178
+ **Scope:** Too broad for this proposal. Separate proposal: `feat(engine): add event envelope timestamps`.
179
+
180
+ ---
181
+
182
+ ## Comparison and Recommendation
183
+
184
+ **Primary recommendation: Candidate A (flat keys) for Phase 1. Candidate C for Phase 2.**
185
+
186
+ | Criterion | A (context_set) | B (artifact) | C (end SHA) | D (timestamps) |
187
+ |-----------|----------------|--------------|-------------|----------------|
188
+ | Backward compatible | ✅ | ❌ | ⚠️ | ⚠️ |
189
+ | Author burden | minimal | high | zero | zero |
190
+ | Type safety | projection-level | strong | git-authoritative | N/A |
191
+ | Scope | best-fit (P1) | too broad | best-fit (P2) | separate |
192
+ | Schema change | none | new files only | enum extension | all events |
193
+
194
+ **Candidate B loses:** Session-vs-node scope mismatch is architecturally wrong. High ceremony for advisory data.
195
+
196
+ **Candidate C is Phase 2:** Right solution for authoritative git stats, wrong phase (too complex for immediate need).
197
+
198
+ **Candidate D is a separate proposal:** Foundational infrastructure, orthogonal to metrics capture.
199
+
200
+ ---
201
+
202
+ ## Self-Critique
203
+
204
+ **Strongest counter-argument against Candidate A:**
205
+
206
+ Flat `metrics_*` keys pollute the global context namespace. A workflow author using `context.metrics_outcome` for their own purpose would silently conflict with the convention. The counter: the `metrics_` prefix is a documented namespace convention; conflicts are the author's error, not a framework deficiency.
207
+
208
+ **What would tip toward Candidate B:**
209
+
210
+ If the use case requires formal audit-trail compliance (e.g., billing based on LOC metrics), advisory `context_set` data is insufficient. Typed artifact contracts with validation would be necessary. No such requirement exists currently.
211
+
212
+ **What would tip toward doing nothing:**
213
+
214
+ If inspection of real session event logs reveals that agents ALREADY consistently self-report outcomes in step notes (plain markdown), the right solution is better console extraction of markdown content, not new capture infrastructure. This should be validated before Phase 1 implementation.
215
+
216
+ **Pivoting away from flat keys:**
217
+
218
+ If the context budget (`MAX_CONTEXT_BYTES = 256KB`) becomes a constraint (unlikely), or if namespace pollution proves problematic, a Phase 1.5 could introduce a lightweight `context.metrics` object with explicit full-replacement semantics (agents always send the complete object).
219
+
220
+ ---
221
+
222
+ ## Open Questions for Main Agent
223
+
224
+ 1. Are agents already self-reporting outcomes in step notes? (Validate primary framing risk before implementing.)
225
+ 2. Should Phase 1 include any enforcement signal (e.g., a final step prompt that asks for `metrics_*` keys), or is the convention sufficient initially?
226
+ 3. Should `metrics_git_head_end` (advisory) be included in Phase 1, or deferred to Phase 2?
227
+ 4. Is the context namespace pollution risk (flat `metrics_*` keys) acceptable, or should the convention use a namespacing strategy?
@@ -0,0 +1,104 @@
1
+ # Session Metrics -- Design Review Findings
2
+
3
+ *Review findings for the selected direction: Candidate A (flat context_set keys + projectSessionMetricsV2 projection)*
4
+
5
+ ---
6
+
7
+ ## Tradeoff Review
8
+
9
+ | Tradeoff | Assessment | Condition for Reversal |
10
+ |----------|-----------|----------------------|
11
+ | JsonValue storage, typed projection output | Acceptable. Consistent with isAutonomous/parentSessionId precedent. | If metrics_outcome drives workflow branching → migrate to typed artifact contract |
12
+ | No enforcement, relies on step prompting | Acceptable IF prompting guidance is included. Without it, feature is functionally dead. | N/A -- enforcement is a Phase 2 enhancement |
13
+ | Advisory git stats (LOC, files) | Acceptable with clear "agent-reported" UI labeling. | Phase 2 (Candidate C) removes this tradeoff |
14
+ | Flat key namespace (metrics_ prefix) | Acceptable. mergeContext semantics are design-locked at §18.2. | If 20+ metrics keys make context debugging painful → introduce sub-object with explicit full-replacement semantics |
15
+
16
+ ---
17
+
18
+ ## Failure Mode Review
19
+
20
+ | Failure Mode | Covered | Missing Mitigation | Risk |
21
+ |-------------|---------|-------------------|------|
22
+ | Convention adoption near-zero without step prompting | Partially -- docs required, but template not specified | Concrete reusable step prompt template in docs/authoring-v2.md | HIGH |
23
+ | Inconsistent key names (metrics_pr_number vs metrics_pr_numbers) | Silently returns null (acceptable) | None required for Phase 1 | LOW |
24
+ | Nested metrics object shallow-merge footgun | NOT covered by design | Explicit WARNING in authoring docs against using nested context.metrics = {...} | MEDIUM |
25
+ | Malformed projection values | Requires defensive coercion implementation | Projection must coerce/null, not throw, on malformed types | MEDIUM |
26
+
27
+ ---
28
+
29
+ ## Runner-Up / Simpler Alternative Review
30
+
31
+ **From runner-up (Candidate C):** `metrics_git_head_end` should be included in Phase 1 convention as an advisory key with explicit Phase 2 upgrade path (same key, automatic source in Phase 2). No design change needed -- this is additive.
32
+
33
+ **Simpler variant:** `metrics_outcome` only (single key, closed enum). Valid as a scope reduction if Phase 1 must ship quickly. Full spec is already scope-appropriate.
34
+
35
+ **Hybrid:** `metrics_outcome` as the primary/required recommendation; all others optional. Already how the design is structured.
36
+
37
+ **Conclusion:** No material design changes from comparison. Design is well-scoped.
38
+
39
+ ---
40
+
41
+ ## Philosophy Alignment
42
+
43
+ | Principle | Status |
44
+ |-----------|--------|
45
+ | YAGNI | ✅ Satisfied |
46
+ | Immutability | ✅ Satisfied |
47
+ | Validate at boundaries | ✅ Satisfied (projection coercion) |
48
+ | Functional/declarative | ✅ Satisfied |
49
+ | Determinism | ✅ Satisfied |
50
+ | Prefer explicit domain types | ⚠️ Tension -- JsonValue storage (acceptable, consistent with existing precedent) |
51
+ | Type safety as first line of defense | ⚠️ Tension -- no compile-time guarantee on agent output (acceptable for advisory data) |
52
+ | Make illegal states unrepresentable | ⚠️ Tension -- malformed values coerced to null (acceptable for advisory data) |
53
+ | Architectural fixes over patches | ✅ Satisfied -- projection layer, not a special case |
54
+
55
+ ---
56
+
57
+ ## Findings
58
+
59
+ ### RED (Blocking or Critical)
60
+
61
+ None identified. Design does not violate any hard invariants.
62
+
63
+ ### ORANGE (Important, Must Address Before Ship)
64
+
65
+ **O1: Convention adoption requires concrete step prompt template**
66
+ The design as specified requires a step prompt template in `docs/authoring-v2.md`. Without a copy-paste ready prompt that workflow authors can add to their final step, adoption will be near-zero and the feature will appear broken. This is a REQUIRED deliverable alongside the code changes.
67
+
68
+ **O2: Explicit warning against nested metrics objects**
69
+ The shallow merge footgun (`context.metrics = {...}` gets fully replaced on partial updates) must be explicitly warned against in the authoring docs. Without this, workflow authors who follow intuition will silently lose metric data.
70
+
71
+ ### YELLOW (Advisories, Can Be Addressed in Follow-Up)
72
+
73
+ **Y1: Defensive projection coercion must be implemented**
74
+ The `projectSessionMetricsV2` projection must coerce malformed values to null rather than propagating or throwing. E.g., if `metrics_pr_numbers` is the string `"123"` instead of `[123]`, return `null` for that field. Implement this defensively at the projection boundary.
75
+
76
+ **Y2: Console UI must label advisory metrics**
77
+ The console display must show "agent-reported" or equivalent label for all Phase 1 metrics to avoid false precision. Do not present advisory LOC counts as authoritative. This is a UX requirement, not a code requirement.
78
+
79
+ **Y3: Document escalation trigger for type safety**
80
+ If `metrics_outcome` is ever used to drive automated workflow behavior (conditional branching, auto-dispatch), the type-safety tension becomes critical. Document this escalation condition in the design notes so future feature work knows when to migrate to a typed artifact contract.
81
+
82
+ ---
83
+
84
+ ## Recommended Revisions
85
+
86
+ 1. **Add to Phase 1 scope:** A reusable final-step prompt template in `docs/authoring-v2.md` -- exact wording for a "report outcomes" step that instructs the agent to set `metrics_outcome`, `metrics_pr_numbers`, and other relevant keys. (O1)
87
+
88
+ 2. **Add to authoring docs:** A warning section "Do not use context.metrics = {...}" explaining the shallow merge footgun and why flat `metrics_*` keys are required. (O2)
89
+
90
+ 3. **Add to projection implementation spec:** Defensive coercion: all metrics fields return `null` if absent or malformed; projection never throws on bad metric values. (Y1)
91
+
92
+ 4. **Add to console UI spec:** Advisory labeling for all Phase 1 metrics values. "Agent-reported" tag or equivalent. (Y2)
93
+
94
+ 5. **Include `metrics_git_head_end` in Phase 1 convention** with documentation that this is advisory and will be superseded by automatic capture in Phase 2. (from runner-up borrowing)
95
+
96
+ ---
97
+
98
+ ## Residual Concerns
99
+
100
+ 1. **Primary framing risk unverified:** We have not verified whether agents are already self-reporting outcomes informally in step notes. If they are, Phase 1 may duplicate effort or conflict with existing informal conventions. **Recommendation:** Before shipping, sample 10-20 recent session event logs and check whether step notes or `context_set` events already contain outcome-like data.
101
+
102
+ 2. **`metrics_git_head_end` reliability:** Agents may forget to run `git rev-parse HEAD` or may report the wrong commit (e.g., the HEAD before their final commit). Phase 1 should document this as advisory with low confidence, and Phase 2 should be prioritized to make it authoritative.
103
+
104
+ 3. **Console projection callsite:** `console-service.ts` calls `projectRunContextV2` today; adding a call to `projectSessionMetricsV2` is additive but adds latency to session summary loading. If metrics projection is expensive (it shouldn't be for a simple key-read), it should be lazy-loaded or cached alongside the existing summary cache.