@exaudeus/workrail 3.67.0 → 3.68.1
- package/dist/application/services/compiler/template-registry.js +10 -1
- package/dist/cli/commands/worktrain-init.js +1 -1
- package/dist/console-ui/assets/{index-tOl8Vowf.js → index-DPdRJHMX.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/modes/full-pipeline.js +4 -4
- package/dist/coordinators/modes/implement-shared.js +5 -5
- package/dist/coordinators/modes/implement.js +4 -4
- package/dist/coordinators/pr-review.js +4 -4
- package/dist/daemon/workflow-runner.d.ts +1 -0
- package/dist/daemon/workflow-runner.js +1 -0
- package/dist/manifest.json +31 -31
- package/dist/mcp/handlers/v2-context-budget.js +18 -0
- package/dist/mcp/handlers/v2-workflow.js +1 -1
- package/dist/mcp/workflow-protocol-contracts.js +2 -2
- package/dist/v2/durable-core/constants.d.ts +2 -0
- package/dist/v2/durable-core/constants.js +2 -1
- package/dist/v2/projections/session-metrics.js +1 -1
- package/docs/authoring-v2.md +4 -4
- package/docs/changelog-recent.md +3 -3
- package/docs/configuration.md +1 -1
- package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
- package/docs/design/adaptive-coordinator-context.md +1 -1
- package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
- package/docs/design/adaptive-coordinator-routing-review.md +1 -1
- package/docs/design/adaptive-coordinator-routing.md +34 -34
- package/docs/design/agent-cascade-protocol.md +2 -2
- package/docs/design/console-daemon-separation-discovery.md +323 -0
- package/docs/design/context-assembly-design-candidates.md +1 -1
- package/docs/design/context-assembly-implementation-plan.md +1 -1
- package/docs/design/context-assembly-layer.md +2 -2
- package/docs/design/context-assembly-review-findings.md +1 -1
- package/docs/design/coordinator-access-audit.md +293 -0
- package/docs/design/coordinator-architecture-audit.md +62 -0
- package/docs/design/coordinator-error-handling-audit.md +240 -0
- package/docs/design/coordinator-testability-audit.md +426 -0
- package/docs/design/daemon-architecture-discovery.md +1 -1
- package/docs/design/daemon-console-separation-discovery.md +242 -0
- package/docs/design/daemon-memory-audit.md +203 -0
- package/docs/design/design-candidates-console-daemon-separation.md +256 -0
- package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
- package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
- package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
- package/docs/design/discovery-loop-fix-candidates.md +161 -0
- package/docs/design/discovery-loop-fix-design-review.md +106 -0
- package/docs/design/discovery-loop-fix-validation.md +258 -0
- package/docs/design/discovery-loop-investigation-A.md +188 -0
- package/docs/design/discovery-loop-investigation-B.md +287 -0
- package/docs/design/exploration-workflow-candidates.md +205 -0
- package/docs/design/exploration-workflow-design-review.md +166 -0
- package/docs/design/exploration-workflow-discovery.md +443 -0
- package/docs/design/ide-context-files-candidates.md +231 -0
- package/docs/design/ide-context-files-design-review.md +85 -0
- package/docs/design/ide-context-files.md +615 -0
- package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
- package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
- package/docs/design/in-process-http-audit.md +190 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
- package/docs/design/loadSessionNotes-candidates.md +108 -0
- package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
- package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
- package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
- package/docs/design/probe-session-design-candidates.md +261 -0
- package/docs/design/probe-session-phase0.md +490 -0
- package/docs/design/routines-guide.md +7 -7
- package/docs/design/session-metrics-attribution-candidates.md +250 -0
- package/docs/design/session-metrics-attribution-design-review.md +115 -0
- package/docs/design/session-metrics-attribution-discovery.md +319 -0
- package/docs/design/session-metrics-candidates.md +227 -0
- package/docs/design/session-metrics-design-review.md +104 -0
- package/docs/design/session-metrics-discovery.md +454 -0
- package/docs/design/spawn-session-debug.md +202 -0
- package/docs/design/trigger-validator-candidates.md +214 -0
- package/docs/design/trigger-validator-review.md +109 -0
- package/docs/design/trigger-validator-shaping-phase0.md +239 -0
- package/docs/design/trigger-validator.md +454 -0
- package/docs/design/v2-core-design-locks.md +2 -2
- package/docs/design/workflow-extension-points.md +15 -15
- package/docs/design/workflow-id-validation-at-startup.md +1 -1
- package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
- package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
- package/docs/design/worktrain-task-queue-candidates.md +5 -5
- package/docs/design/worktrain-task-queue.md +4 -4
- package/docs/discovery/coordinator-script-design.md +1 -1
- package/docs/discovery/coordinator-ux-discovery.md +3 -3
- package/docs/discovery/simulation-report.md +1 -1
- package/docs/discovery/workflow-modernization-discovery.md +326 -0
- package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
- package/docs/discovery/worktrain-status-briefing.md +1 -1
- package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
- package/docs/docker.md +1 -1
- package/docs/ideas/backlog.md +227 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
- package/docs/integrations/claude-code.md +5 -5
- package/docs/integrations/firebender.md +1 -1
- package/docs/plans/agentic-orchestration-roadmap.md +2 -2
- package/docs/plans/mr-review-workflow-redesign.md +9 -9
- package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
- package/docs/plans/ui-ux-workflow-discovery.md +2 -2
- package/docs/plans/workflow-categories-candidates.md +8 -8
- package/docs/plans/workflow-categories-discovery.md +4 -4
- package/docs/plans/workflow-modernization-design.md +430 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
- package/docs/plans/workflow-staleness-detection-review.md +4 -4
- package/docs/plans/workflow-staleness-detection.md +9 -9
- package/docs/plans/workrail-platform-vision.md +3 -3
- package/docs/reference/agent-context-cleaner-snippet.md +1 -1
- package/docs/reference/agent-context-guidance.md +4 -4
- package/docs/reference/context-optimization.md +2 -2
- package/docs/roadmap/now-next-later.md +2 -2
- package/docs/roadmap/open-work-inventory.md +16 -16
- package/docs/workflows.md +31 -31
- package/package.json +1 -1
- package/spec/workflow-tags.json +47 -47
- package/workflows/adaptive-ticket-creation.json +16 -16
- package/workflows/architecture-scalability-audit.json +22 -22
- package/workflows/bug-investigation.agentic.v2.json +3 -3
- package/workflows/classify-task-workflow.json +1 -1
- package/workflows/coding-task-workflow-agentic.json +6 -6
- package/workflows/cross-platform-code-conversion.v2.json +8 -8
- package/workflows/document-creation-workflow.json +8 -8
- package/workflows/documentation-update-workflow.json +8 -8
- package/workflows/intelligent-test-case-generation.json +2 -2
- package/workflows/learner-centered-course-workflow.json +2 -2
- package/workflows/mr-review-workflow.agentic.v2.json +4 -4
- package/workflows/personal-learning-materials-creation-branched.json +8 -8
- package/workflows/presentation-creation.json +5 -5
- package/workflows/production-readiness-audit.json +1 -1
- package/workflows/relocation-workflow-us.json +31 -31
- package/workflows/routines/context-gathering.json +1 -1
- package/workflows/routines/design-review.json +1 -1
- package/workflows/routines/execution-simulation.json +1 -1
- package/workflows/routines/feature-implementation.json +3 -3
- package/workflows/routines/final-verification.json +1 -1
- package/workflows/routines/hypothesis-challenge.json +1 -1
- package/workflows/routines/ideation.json +1 -1
- package/workflows/routines/parallel-work-partitioning.json +3 -3
- package/workflows/routines/philosophy-alignment.json +2 -2
- package/workflows/routines/plan-analysis.json +1 -1
- package/workflows/routines/plan-generation.json +1 -1
- package/workflows/routines/tension-driven-design.json +6 -6
- package/workflows/scoped-documentation-workflow.json +26 -26
- package/workflows/ui-ux-design-workflow.json +14 -14
- package/workflows/workflow-diagnose-environment.json +1 -1
- package/workflows/workflow-for-workflows.json +1 -1
@@ -0,0 +1,319 @@
# Session Metrics Attribution -- Hybrid Architecture Discovery

**Date:** 2026-04-21
**Status:** Discovery complete -- recommendation ready

---

## About This Document

Human-readable artifact for sharing findings and recommendations. NOT execution truth -- workflow session notes and context variables are the durable record. Read this for understanding, not for resuming the workflow.

---

## Context / Ask

**Stated goal (original):** Design the right hybrid architecture for attributing git metrics (LOC, files changed, commits) to a specific WorkRail session, accounting for concurrent sessions on the same branch.

**Goal classification:** `solution_statement` -- prescribes a specific hybrid architecture with confidence levels and agent SHA reporting rather than describing the outcome.

**Reframed problem:** Given that git attribution via branch or time-window is unreliable when sessions overlap on the same branch, what is the minimal durable record a WorkRail session needs to carry so consumers can compute accurate git metrics even under adversarial conditions?

**Prior discoveries:**
- Discovery 1: rejected agent self-reporting of LOC/files (unreliable: agents would be grading their own homework)
- Discovery 2: found `buildSuccessOutcome()` + `extraEventsToAppend` as the hook point for a `run_completed` event; `observation_recorded` keys are locked; no event timestamps today

---

## Path Recommendation

**Selected path:** `design_first`

**Rationale:** The goal is already a well-formed solution statement, and the codebase landscape was thoroughly covered in Discoveries 1 and 2. The dominant risk is NOT "we don't know the landscape" -- it's "we're designing the wrong concept or solving a problem that's rarer than we think." `design_first` is correct here because:
- The specific concurrent same-branch attribution problem has not been validated as common
- The proposed solution (agent SHA self-reporting) has the same reliability failure mode as what Discovery 1 rejected
- The accumulation problem for commit SHAs across steps is architecturally non-trivial and hasn't been resolved

---

## Constraints / Anti-goals

**Core constraints:**
- `observation_recorded` has a CLOSED discriminated union key set -- extending it requires schema + handler updates
- `mergeContext` uses SHALLOW merge semantics (locked at §18.2) -- arrays/objects are REPLACED, not merged
- `context_set` projection uses LATEST-WINS semantics per run -- a new `context_set` with `metrics_commit_shas: ["def"]` will silently replace an earlier `["abc"]`
- No wall-clock timestamps on events today -- `durationSeconds` is blocked on this
- Token usage is NOT accessible at the MCP layer
- The `buildSuccessOutcome()` function has access to `lockedIndex: SessionIndex`, which is the current session's index only -- no cross-session queries at that point

**Anti-goals:**
- Do not design for the general case (all possible attribution problems) -- focus on the specific concurrent same-branch case
- Do not make the high-confidence path depend on agent compliance if there's an engine-side alternative
- Do not add a new event kind just to hold data that fits in an existing mechanism with minor conventions

---

## Landscape Findings (Code Investigation)

### Where `git_head_sha` is captured at session start

`src/mcp/handlers/v2-execution/start.ts` calls `resolveWorkspaceAnchors()` before `buildInitialEvents()`. The result is passed to `buildInitialEvents()` which emits one `observation_recorded` event per observation at the end of the initial event batch. The `git_head_sha` observation uses `type: 'git_sha1'` and regex-validates the 40-char hex format.

**Pattern for `run_completed`:** Mirror exactly -- capture end HEAD SHA via `resolveWorkspaceAnchors()` at the completion advance, emit an `observation_recorded` event (re-using the existing mechanism) rather than a new event kind.

**CRITICAL FINDING:** `observation_recorded` events are session-scoped (no scope field -- unique among all event kinds). The dedupeKey pattern is `observation_recorded:{sessionId}:{key}`. This means a second `git_head_sha` observation would collide with the first (same dedupeKey). To capture end HEAD SHA, either:
(a) Use a different key (e.g., `git_head_sha_end`) -- requires schema extension
(b) Emit the new `run_completed` event kind with the end SHA embedded in its data
(c) Change the dedupeKey to include the runId so start and end can coexist

Option (b) is cleanest: a `run_completed` event with `endGitSha` in its data avoids the dedupeKey collision and clearly expresses the semantic (session finished at this SHA).
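
The collision described above can be sketched as follows. This is a minimal illustration, assuming the append path skips any event whose dedupeKey is already present in the log; that first-write-wins behavior, the `ObservationEventSketch` shape, and the `appendDeduped` helper are all hypothetical and not taken from the WorkRail source. Only the key pattern itself comes from the finding.

```typescript
// Hypothetical, simplified event shape for illustration only.
interface ObservationEventSketch {
  readonly kind: "observation_recorded";
  readonly sessionId: string;
  readonly key: string;
  readonly value: string;
}

// Pattern quoted in the finding: observation_recorded:{sessionId}:{key}
function observationDedupeKey(e: ObservationEventSketch): string {
  return `${e.kind}:${e.sessionId}:${e.key}`;
}

// Assumed behavior: an event whose dedupeKey is already in the log is skipped.
function appendDeduped(
  log: readonly ObservationEventSketch[],
  next: ObservationEventSketch,
): readonly ObservationEventSketch[] {
  const nextKey = observationDedupeKey(next);
  const alreadySeen = log.some((e) => observationDedupeKey(e) === nextKey);
  return alreadySeen ? log : [...log, next];
}

const startSha = "0123456789abcdef0123456789abcdef01234567";
const endSha = "89abcdef0123456789abcdef0123456789abcdef";

const atStart = appendDeduped([], {
  kind: "observation_recorded",
  sessionId: "sess_1",
  key: "git_head_sha",
  value: startSha,
});

// Same session, same key: the end-of-run observation maps to the same dedupeKey.
const atEnd = appendDeduped(atStart, {
  kind: "observation_recorded",
  sessionId: "sess_1",
  key: "git_head_sha",
  value: endSha,
});
```

Under that assumption `atEnd` still holds only the start SHA, which is why the end SHA needs either a different key or a different event kind.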

### What the `SessionIndex` provides at completion time

`buildSessionIndex()` in `src/v2/durable-core/session-index.ts` builds a single-session index from the current session's sorted event log. It provides:
- `runStartedByRunId`: workflow identity per run
- `runContextByRunId`: latest context per run
- `sortedEvents`: the full sorted event log

It does NOT provide cross-session data. At the point where `buildSuccessOutcome()` runs, only the current session's truth is loaded.

**Implication for concurrent session detection:** Detecting other active sessions on the same branch at completion time would require calling `LocalSessionSummaryProviderV2.loadHealthySummaries()` which is:
1. An async, I/O-heavy operation (scans up to 200 sessions from disk)
2. Not available as a port in `AdvanceCorePorts` today
3. Sequential by design (to avoid hammering the file system)

This operation CANNOT be done synchronously at completion time without architectural changes to make it injectable. It would add significant latency to every final advance.

### The `context_set` accumulation problem -- concrete analysis

`mergeContext` contract (locked at §18.2): arrays are REPLACED, not merged.

```
step 5: context_set { metrics_commit_shas: ["abc123"] }
step 9: context_set { metrics_commit_shas: ["def456"] }
```

Result: `metrics_commit_shas` = `["def456"]` -- "abc123" is permanently lost.
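
The replacement can be reproduced with a small sketch, assuming `mergeContext` behaves like a top-level object spread (its real implementation is not shown in this document). `shallowMerge` and `withCommitSha` are hypothetical names introduced only for illustration.

```typescript
type RunContextSketch = Readonly<Record<string, unknown>>;

// Assumed stand-in for mergeContext: top-level keys in the delta replace the
// same keys in the previous context; arrays and objects are never merged.
function shallowMerge(previous: RunContextSketch, delta: RunContextSketch): RunContextSketch {
  return { ...previous, ...delta };
}

// Hypothetical agent-side helper for the full-list convention: build a delta
// that carries every SHA reported so far plus the new one.
function withCommitSha(context: RunContextSketch, sha: string): RunContextSketch {
  const current = context["metrics_commit_shas"];
  const existing: readonly string[] = Array.isArray(current) ? (current as string[]) : [];
  return { metrics_commit_shas: [...existing, sha] };
}

const afterStep5 = shallowMerge({}, { metrics_commit_shas: ["abc123"] });

// Per-step reporting: the step 9 delta replaces the step 5 array.
const lossy = shallowMerge(afterStep5, { metrics_commit_shas: ["def456"] });

// Full-list reporting: the step 9 delta already contains "abc123".
const accumulated = shallowMerge(afterStep5, withCommitSha(afterStep5, "def456"));
```

Here `lossy` ends with only `"def456"`, while `accumulated` keeps both SHAs because the delta itself carried the full list.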

Fix options:
(a) Agent always sends full accumulated list: `["abc123", "def456"]` -- relies on agent memory, fragile
(b) New append-style event type: architecturally sound but requires new event kind, schema change, new projection
(c) Agent accumulates in a single context key by always including previous SHAs -- workable if the step prompt explicitly instructs this pattern

Option (c) is workable but brittle. Option (b) is architecturally correct but has ceremony cost.

### What `run_completed` can and cannot contain (given current infrastructure)

**Can contain (authoritative, engine-computed):**
- `endGitSha` -- available at advance time via `resolveWorkspaceAnchors()`
- `runId` -- already in scope
- `captureConfidence` -- partially (see below)

**Cannot contain (blocked on infrastructure):**
- `durationSeconds` -- no event timestamps today; the session start time is not recorded anywhere in the event log
- Concurrent session detection -- cross-session queries not available at completion time

**Confidence level determination at completion time:**
The only confidence tests available at completion time (without new infrastructure) are:
- Did the agent report `metrics_commit_shas`? -- check `lockedIndex.runContextByRunId`
- Is there a `git_head_sha` recorded? -- check `sortedEvents` for `observation_recorded` with key `git_head_sha`

Testing for concurrent sessions (to set `low` confidence) requires the session summary provider, which is not injectable at the advance handler level today.

---

## Design Candidates

### Candidate A: Minimal `run_completed` event (Phase 1, shippable now)

Emit a `run_completed` event in `buildSuccessOutcome()` when `newEngineState.kind === 'complete'`.

**Event schema:**
```typescript
{
  kind: 'run_completed',
  scope: { runId: string },
  data: {
    endGitSha: string | null,           // engine-authoritative (from resolveWorkspaceAnchors)
    startGitSha: string | null,         // extracted from observation_recorded events in lockedIndex
    agentCommitShas: readonly string[], // copied from context metrics_commit_shas if present (may be empty)
    captureConfidence: 'high' | 'medium' | 'none',
    // 'high'   = agentCommitShas non-empty
    // 'medium' = start+end SHA available (branch diff usable, concurrent risk unknown)
    // 'none'   = no git context
  }
}
```

**Confidence logic (no cross-session query):**
```
if agentCommitShas.length > 0 → 'high'
else if startGitSha && endGitSha → 'medium'
else → 'none'
```

Note: 'low' (concurrent same-branch sessions detected) is NOT computable at advance time without new infrastructure. Drop it from Phase 1.
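
The three-branch rule above could be written as a pure function along these lines. This is a sketch only: the field names follow the Candidate A schema, while `ConfidenceInputs` and `deriveCaptureConfidence` are hypothetical names.

```typescript
type CaptureConfidence = "high" | "medium" | "none";

interface ConfidenceInputs {
  readonly startGitSha: string | null;
  readonly endGitSha: string | null;
  readonly agentCommitShas: readonly string[];
}

// Mirrors the Phase 1 rule; no cross-session lookup is involved, so there is
// no 'low' tier here.
function deriveCaptureConfidence(inputs: ConfidenceInputs): CaptureConfidence {
  if (inputs.agentCommitShas.length > 0) return "high";
  if (inputs.startGitSha && inputs.endGitSha) return "medium";
  return "none";
}

const sha = "0123456789abcdef0123456789abcdef01234567";

const withAgentShas = deriveCaptureConfidence({
  startGitSha: sha,
  endGitSha: sha,
  agentCommitShas: [sha],
});
const anchorsOnly = deriveCaptureConfidence({
  startGitSha: sha,
  endGitSha: sha,
  agentCommitShas: [],
});
const noGitContext = deriveCaptureConfidence({
  startGitSha: null,
  endGitSha: null,
  agentCommitShas: [],
});
```

Keeping the rule free of I/O means it can run inside the completion path with only the data already loaded for the current session.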

**Accumulation problem:** Agent must send full accumulated list in each `context_set`. Step prompt must explicitly say "include all commit SHAs you have made so far" not "include the commit SHAs from this step."

**Schema change cost:**
- New event kind `run_completed` in `DomainEventV1Schema` discriminated union
- New Zod schema for data
- Update all exhaustive handlers (grep for exhaustive switches on DomainEventV1)
- New projection `projectRunCompletedV2` to surface this event to consumers

**What ships:** A `run_completed` event in the session log when a workflow completes. Consumers can compute LOC from `git diff --stat startGitSha..endGitSha`. When agent reported SHAs, use those for precise per-session attribution. When not, use the full diff with a 'medium' confidence label.
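
On the consumer side, the diff between the two anchors can be totalled with a small parser. The sketch below swaps in `git diff --numstat` for the `--stat` form named above, because `--numstat` emits tab-separated `<added>\t<deleted>\t<path>` lines (with `-` for both counts on binary files) that are easier to parse. `revisionRange`, `sumNumstat`, and `DiffTotals` are hypothetical names, and the function takes the command output as text rather than invoking git.

```typescript
interface DiffTotals {
  readonly filesChanged: number;
  readonly linesAdded: number;
  readonly linesDeleted: number;
}

// Revision range in the same start..end form as the command quoted above.
function revisionRange(startGitSha: string, endGitSha: string): string {
  return `${startGitSha}..${endGitSha}`;
}

// Sums `git diff --numstat` output. Binary files report "-" for both counts
// and are counted as changed files only.
function sumNumstat(numstatOutput: string): DiffTotals {
  let filesChanged = 0;
  let linesAdded = 0;
  let linesDeleted = 0;
  for (const line of numstatOutput.split("\n")) {
    if (line.trim() === "") continue;
    const [added, deleted] = line.split("\t");
    if (added === undefined || deleted === undefined) continue; // malformed line
    filesChanged += 1;
    if (added !== "-") linesAdded += Number(added);
    if (deleted !== "-") linesDeleted += Number(deleted);
  }
  return { filesChanged, linesAdded, linesDeleted };
}

const sampleRange = revisionRange("abc123", "def456");
const sampleTotals = sumNumstat("12\t3\tsrc/a.ts\n-\t-\tassets/logo.png\n0\t7\tdocs/b.md\n");
```

For a 'medium' confidence record these totals cover everything committed between the two anchors, including commits from any other session on the same branch.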

**What does NOT ship:** Duration, concurrent session detection, 'low' confidence tier.

---

### Candidate B: `run_completed` deferred -- use context_set convention only (skip Phase 2 entirely)

The Phase 1 design (Discoveries 1+2) already recommended `metrics_commit_shas` as a flat context key. The question "when does the accumulation problem matter?" deserves scrutiny:

Concurrent same-branch sessions are ONLY a problem when:
1. Two sessions are running on the SAME branch (not just concurrently in time)
2. Both sessions are making commits

In a healthy WorkRail usage pattern (worktrees, feature branches per session), concurrent sessions use different branches. The same-branch concurrency problem is a degraded operating mode, not the common case.

**Recommendation:** If same-branch concurrency is rare (< 5% of sessions based on actual usage data), a `run_completed` event adds schema complexity, maintenance cost, and exhaustive-switch update burden for marginal attribution improvement. The right move is:
1. Require actual usage data before building Phase 2 infrastructure
2. Phase 1 (`metrics_commit_shas` flat context key) already provides the high-confidence path when agents comply
3. Consumers can use start/end SHAs from `observation_recorded` events for the 'medium' confidence case -- start SHA already exists; end SHA can be added via the existing `observation_recorded` mechanism under a new key (e.g., `git_head_sha_end`), which yields a distinct dedupeKey but requires extending the closed key set

**What ships:** Nothing new. Phase 1 convention is sufficient.

**Risk:** Attribution accuracy suffers for teams using same-branch workflows. But until we have data showing this is a real usage pattern, we don't know the actual impact.

---

### Candidate C: Append-style event for commit SHA accumulation (Phase 2, full solution)

Introduce a new event kind `commit_sha_appended` with `scope: { runId }` and `data: { sha: string, ref: string | null }`.

**Semantics:** Each time the agent makes a commit and reports it, they call `continue_workflow` with an artifact or emit a `commit_sha_appended` event (via a new MCP tool or a convention on continue_workflow). The projection folds all `commit_sha_appended` events into an ordered list.
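
A projection of that kind might look like the fold below. This is a sketch under assumptions: the event shape follows the `scope`/`data` fields given above, while `LogEventSketch`, `OtherEventSketch`, `projectCommitShas`, and the choice to drop repeated SHAs are hypothetical and not specified by the design.

```typescript
interface CommitShaAppendedSketch {
  readonly kind: "commit_sha_appended";
  readonly scope: { readonly runId: string };
  readonly data: { readonly sha: string; readonly ref: string | null };
}

// Placeholder for every other event kind in the log.
interface OtherEventSketch {
  readonly kind: "other";
}

type LogEventSketch = CommitShaAppendedSketch | OtherEventSketch;

// Folds the log, in order, into the SHAs reported for one run.
// Skipping repeated SHAs is an assumption made for this sketch.
function projectCommitShas(log: readonly LogEventSketch[], runId: string): readonly string[] {
  const shas: string[] = [];
  for (const e of log) {
    if (e.kind !== "commit_sha_appended") continue;
    if (e.scope.runId !== runId) continue;
    if (shas.indexOf(e.data.sha) === -1) shas.push(e.data.sha);
  }
  return shas;
}

const sampleLog: readonly LogEventSketch[] = [
  { kind: "commit_sha_appended", scope: { runId: "run_1" }, data: { sha: "abc123", ref: null } },
  { kind: "other" },
  { kind: "commit_sha_appended", scope: { runId: "run_2" }, data: { sha: "999999", ref: null } },
  { kind: "commit_sha_appended", scope: { runId: "run_1" }, data: { sha: "def456", ref: "refs/heads/feature" } },
];

const run1Shas = projectCommitShas(sampleLog, "run_1");
```

Because each event carries a single SHA, no report can overwrite an earlier one; the ordered list is rebuilt from the log on every read.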

**Advantage:** Solves the accumulation problem cleanly without relying on agent memory or the full-list convention.

**Cost:** New event kind, new schema, new MCP tool or convention, new projection. High ceremony.

**Verdict:** This is architecturally correct but over-engineered for the current problem. Worth pursuing if `commit_sha_appended` semantics turn out to be useful for more than just attribution (e.g., audit trail, PR linking). Defer until there's a concrete second use case.

---

## Recommended Resolution

**Phase 1 (ship now):** Candidate A -- minimal `run_completed` event. Rationale:
- Provides engine-authoritative `endGitSha` which is not available via any existing mechanism (the dedupeKey collision blocks using `observation_recorded` for end SHA)
- The `captureConfidence` field gives consumers a clear signal about attribution reliability
- `agentCommitShas` copied from context at completion time creates a reliable snapshot even if context is overwritten later
- Cost is bounded: one new event kind, one new schema, exhaustive switch updates (bounded set)

**Drop from Phase 1:**
- Concurrent session detection -- needs injectable `SessionSummaryProvider` in `AdvanceCorePorts`, adds latency, unvalidated use case
- `durationSeconds` -- blocked on event timestamps
- 'low' confidence tier -- requires cross-session query infrastructure

**Phase 2 (only if data supports it):**
- Inject `SessionSummaryProvider` into the advance handler ports to enable concurrent session detection
- Emit 'low' confidence when concurrent same-branch sessions are detected
- Requires measuring actual same-branch concurrency rate first

**Accumulation fix (required alongside Candidate A):**
The step prompt convention in `docs/authoring-v2.md` MUST say: "When reporting commit SHAs, always include the full list of all commits made during this session, not just commits from the current step." This is the only safe fix given the shallow-merge contract.

---

## Open Questions

1. **What is the actual same-branch concurrency rate?** The decision to build cross-session detection infrastructure depends on this. 539 time-overlapping pairs does not equal 539 branch-overlapping pairs. This data is needed before committing to Phase 2.

2. **Who are the consumers of `run_completed`?** The design assumes "the console dashboard" and "external analytics." Are there internal consumers (e.g., the daemon trigger system) that would benefit from a `run_completed` event? If so, this strengthens the case for Candidate A.

3. **Does the `AdvanceCorePorts` interface need to be extended for workspace anchor resolution at completion time?** No -- confirmed via code investigation. The `workspacePath` is available on `V2ContinueWorkflowInput` (the input to `handleAdvanceIntent`). Pre-resolve `endGitSha` from `input.workspacePath` before calling `executeAdvanceCore`, then pass `endGitSha: string | null` as a parameter through the call chain to `buildSuccessOutcome`. No port refactor needed.
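
For question 1, the distinction between time-overlapping and branch-overlapping pairs can be made concrete with a counting sketch. It assumes each session can be given a start time, an end time, and a branch from some source outside the event log (the log itself carries no timestamps today); `SessionWindowSketch`, `OverlapCounts`, and `countOverlaps` are hypothetical.

```typescript
interface SessionWindowSketch {
  readonly sessionId: string;
  readonly gitBranch: string | null;
  readonly startedAtMs: number;
  readonly endedAtMs: number;
}

interface OverlapCounts {
  readonly timeOverlappingPairs: number;
  readonly sameBranchOverlappingPairs: number;
}

// Counts unordered session pairs whose time windows intersect, and the subset
// of those pairs that also share a known branch.
function countOverlaps(sessions: readonly SessionWindowSketch[]): OverlapCounts {
  let timeOverlappingPairs = 0;
  let sameBranchOverlappingPairs = 0;
  sessions.forEach((a, i) => {
    sessions.slice(i + 1).forEach((b) => {
      const overlapsInTime = a.startedAtMs < b.endedAtMs && b.startedAtMs < a.endedAtMs;
      if (!overlapsInTime) return;
      timeOverlappingPairs += 1;
      if (a.gitBranch !== null && a.gitBranch === b.gitBranch) {
        sameBranchOverlappingPairs += 1;
      }
    });
  });
  return { timeOverlappingPairs, sameBranchOverlappingPairs };
}

const sampleCounts = countOverlaps([
  { sessionId: "s1", gitBranch: "main", startedAtMs: 0, endedAtMs: 10 },
  { sessionId: "s2", gitBranch: "feature/x", startedAtMs: 5, endedAtMs: 15 },
  { sessionId: "s3", gitBranch: "main", startedAtMs: 8, endedAtMs: 20 },
]);
```

In this sample all three pairs overlap in time but only one pair shares a branch, which is the gap the question asks to measure before Phase 2.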

---

## Decision Log

### Winner: Candidate A (run_completed with agentCommitShas snapshot) -- with gitBranch addition

**Why it wins:**
- Provides engine-authoritative `endGitSha` and `startGitSha` in a single self-contained event
- `agentCommitShas` snapshot from context is durable (survives future context overwrites, unlike leaving SHAs only in context_set)
- Follows `assessment_recorded` precedent (snapshots context data into event log at completion time)
- Fully reversible: if agent compliance is poor, consumers can ignore `agentCommitShas` and fall back to 'medium' -- no schema change needed
- Incremental cost over the endGitSha-only variant is trivial (one field + superRefine + sha1 validation)

**Why the runner-up (endGitSha-only) lost:**
- Would make 'high' confidence tier permanently unachievable, defeating the stated goal (concurrent same-branch disambiguation)
- Cost difference vs. winner is negligible

**Why "defer entirely" lost:**
- No mechanism for capturing `endGitSha` without a new event kind (dedupeKey collision blocks reusing `observation_recorded`)
- Without `endGitSha`, retrospective attribution queries break (current HEAD != session end SHA for old sessions)

### Final schema (confirmed after adversarial challenge and review)

```typescript
{
  kind: 'run_completed',
  scope: { runId: string },
  data: {
    startGitSha: string | null,         // from observation_recorded events (git_head_sha key)
    endGitSha: string | null,           // resolved at completion via input.workspacePath
    gitBranch: string | null,           // from observation_recorded events (git_branch key)
    agentCommitShas: readonly string[], // sha1 validated, from context.metrics_commit_shas
    captureConfidence: 'high' | 'medium' | 'none',
    // 'high'   = agentCommitShas non-empty (requires superRefine cross-field invariant)
    // 'medium' = startGitSha and endGitSha available
    // 'none'   = no git context
  }
}
// Zod superRefine: captureConfidence === 'high' requires agentCommitShas.length > 0
// Zod: agentCommitShas entries validated as /^[0-9a-f]{40}$/
```
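
The two checks named in the trailing comments can be sketched without Zod. The block below is a plain-TypeScript stand-in, not the Zod schema itself: `RunCompletedDataSketch` and `checkRunCompletedData` are hypothetical names, and only the two stated invariants are checked.

```typescript
type CaptureConfidenceTier = "high" | "medium" | "none";

interface RunCompletedDataSketch {
  readonly startGitSha: string | null;
  readonly endGitSha: string | null;
  readonly gitBranch: string | null;
  readonly agentCommitShas: readonly string[];
  readonly captureConfidence: CaptureConfidenceTier;
}

// Same pattern as quoted in the schema comments: 40 lowercase hex characters.
const SHA1_HEX = /^[0-9a-f]{40}$/;

// Returns a list of violations; an empty list means both checks pass.
function checkRunCompletedData(data: RunCompletedDataSketch): readonly string[] {
  const issues: string[] = [];
  data.agentCommitShas.forEach((sha, index) => {
    if (!SHA1_HEX.test(sha)) {
      issues.push(`agentCommitShas[${index}] is not a 40-char lowercase hex SHA-1`);
    }
  });
  // Cross-field invariant: 'high' is only valid when the agent reported SHAs.
  if (data.captureConfidence === "high" && data.agentCommitShas.length === 0) {
    issues.push("captureConfidence 'high' requires a non-empty agentCommitShas");
  }
  return issues;
}

const validSha = "0123456789abcdef0123456789abcdef01234567";

const okIssues = checkRunCompletedData({
  startGitSha: validSha,
  endGitSha: validSha,
  gitBranch: "main",
  agentCommitShas: [validSha],
  captureConfidence: "high",
});

const badIssues = checkRunCompletedData({
  startGitSha: validSha,
  endGitSha: validSha,
  gitBranch: "main",
  agentCommitShas: [],
  captureConfidence: "high",
});
```

In a Zod schema the first check would sit on the array element and the second in a `superRefine` on the object, as the comments indicate.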

### Implementation pattern (threading fix)

In `src/mcp/handlers/v2-execution/continue-advance.ts`, before calling `executeAdvanceCore`:
```typescript
// Eager resolution: whether this advance completes the run is only decided
// inside buildSuccessOutcome, so resolve the end SHA here on every advance.
const endGitSha: string | null = input.workspacePath
  ? await resolveEndGitSha(ctx.v2, input.workspacePath)
  : null;
```

Pass `endGitSha` through `executeAdvanceCore` args to `buildSuccessOutcome`. Guard emission with `newEngineState.kind === 'complete'`.
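
The guard at the emission site can be sketched as a small pure function. This is a hypothetical slice, not the real `buildSuccessOutcome`: `EngineStateSketch` (including the `'running'` kind), `RunCompletedSketch`, and `runCompletedEvents` are names invented for illustration, and the event carries only `endGitSha` to keep the sketch short.

```typescript
type EngineStateSketch = { readonly kind: "running" } | { readonly kind: "complete" };

interface RunCompletedSketch {
  readonly kind: "run_completed";
  readonly scope: { readonly runId: string };
  readonly data: { readonly endGitSha: string | null };
}

// Returns the extra events to append: one run_completed event when the new
// engine state is 'complete', otherwise none.
function runCompletedEvents(
  newEngineState: EngineStateSketch,
  runId: string,
  endGitSha: string | null,
): readonly RunCompletedSketch[] {
  if (newEngineState.kind !== "complete") return [];
  return [{ kind: "run_completed", scope: { runId }, data: { endGitSha } }];
}

const midRun = runCompletedEvents({ kind: "running" }, "run_1", "abc123");
const finalAdvance = runCompletedEvents({ kind: "complete" }, "run_1", "abc123");
```

With this shape the pre-resolved SHA is simply discarded on every advance that does not complete the run.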

---

## Final Summary

**Confidence band:** HIGH

**Residual risks:**
1. Agent compliance with full-accumulated-list convention is the riskiest assumption -- mitigation is authoring docs + step prompt, not architecture
2. No specific consumer of `run_completed` identified yet -- validate before Phase 2 investment
3. Same-branch concurrency rate unknown -- Phase 2 (concurrent detection) is data-gated on this

**Phase 1 (ship without gates):**
- `run_completed` event kind with final schema above
- `projectSessionGitAttributionV2` pure projection
- `ConsoleSessionSummary` extension (optional `gitAttribution` field, null-safe)
- `docs/authoring-v2.md` update: full-accumulated-list step prompt template (REQUIRED -- feature is dead without this)
- Consumer cross-check convention documented

**Phase 2 (data-gated):**
- Inject `SessionSummaryProvider` into `AdvanceCorePorts` for concurrent session detection
- Add 'low' confidence tier when concurrent same-branch sessions detected
- Gate on: same-branch concurrency rate > 5% OR measured production misattribution incidents

**Phase 3 (if compliance measurement shows problem):**
- `commit_sha_appended` append-style event type for clean SHA accumulation
- Gate on: full-list compliance rate < 50% for multi-commit sessions

**What cannot ship regardless of phase:** `durationSeconds` -- blocked on event envelope timestamps, which require a separate infrastructure change.

@@ -0,0 +1,227 @@
# Session Metrics -- Design Candidates

*Raw investigative material for main agent synthesis. Not a final decision.*

---

## Problem Understanding

**Tensions (real tradeoffs):**

1. **Type safety vs. ergonomics.** Typed artifact contracts enforce schema at runtime (valuable for machine-queryable data) but require `outputContract` declarations on specific workflow steps. `context_set` is untyped but universally writeable at any step with no workflow changes.

2. **Session-level vs. step-level.** Metrics (PR numbers, session outcome, files changed) are session-level outcomes. Artifacts are node-scoped by design. `context_set` is run-scoped (correct match). This is a real architectural mismatch for Candidate B.

3. **Agent self-reporting reliability vs. external observation.** Git stats (LOC, files changed) computed by the agent are advisory only. Capturing the HEAD SHA at start and end and running `git diff --stat` post-hoc gives authoritative numbers -- but requires new lifecycle infrastructure.

4. **Convention vs. enforcement.** A convention requires agents to follow it voluntarily. Without a schema or validation signal, agents will be inconsistent. Enforcement (artifact contract validation, auto-injected step prompts) adds ceremony but ensures consistency.

**What makes this hard:**

- `mergeContext` uses SHALLOW merge semantics -- nested object values are REPLACED, not merged. Using `context.metrics = {...}` as the carrier is a footgun: agents who set partial updates at different steps will silently lose earlier keys. Must use FLAT top-level keys instead.
- The `observation_recorded` event has a CLOSED discriminated union key set. Adding new keys is not a data change -- it requires updating the TypeScript schema union, all exhaustive handlers, and documentation.
- No event timestamps in the event envelope -- session duration is fundamentally uncomputable from the event log today, regardless of what metrics mechanism is chosen.
- The `context_set` projection (`projectRunContextV2` and `SessionIndex.runContextByRunId`) uses LATEST-WINS semantics per run. Each agent delta completely replaces the previous context snapshot. This is fine for flat top-level keys (each key is independently addressable) but breaks nested metrics accumulation.
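
The shallow-merge footgun above can be illustrated with a minimal sketch. `shallowMerge` is a hypothetical stand-in for `mergeContext` (the real implementation is not shown here); it assumes only that top-level keys in a delta replace the same keys in the previous context.

```typescript
// Hypothetical stand-in for mergeContext: top-level keys from the delta
// replace keys in the previous context; nested objects are NOT merged.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };
type Context = { [key: string]: JsonValue };

function shallowMerge(prev: Context, delta: Context): Context {
  return { ...prev, ...delta };
}

// Nested carrier: the second update replaces the whole `metrics` object,
// so the earlier `outcome` key is silently lost.
const nested = shallowMerge(
  { metrics: { outcome: "success" } },
  { metrics: { pr_numbers: [42] } },
);

// Flat keys: each key is independently addressable and survives.
const flat = shallowMerge(
  { metrics_outcome: "success" },
  { metrics_pr_numbers: [42] },
);

console.log(JSON.stringify(nested)); // {"metrics":{"pr_numbers":[42]}}
console.log(JSON.stringify(flat)); // {"metrics_outcome":"success","metrics_pr_numbers":[42]}
```

Under this assumption, the nested form only works if agents always resend the complete object, which is why the flat `metrics_*` keys are preferred.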
|
|
25
|
+
|
|
26
|
+
**Likely seam:** The projection layer (`src/v2/projections/`) and the console session summary DTO (`src/v2/usecases/console-types.ts`). The capture boundary can stay unchanged; the gap is extraction and display.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Philosophy Constraints
|
|
31
|
+
|
|
32
|
+
**Principles that matter most here:**
|
|
33
|
+
|
|
34
|
+
- **Prefer explicit domain types over primitives** -- `context_set` carries `JsonValue`, which is as primitive as it gets. A typed projection output (`SessionMetricsV2` interface) can add type discipline at the extraction boundary without changing the storage format.
|
|
35
|
+
- **Validate at boundaries, trust inside** -- the `projectSessionMetricsV2` projection should validate and coerce context keys into typed output at the projection boundary.
|
|
36
|
+
- **Exhaustiveness everywhere** -- any new event kind or `observation_recorded` key extension requires updating the discriminated union. This is a real cost, not just bureaucracy.
|
|
37
|
+
- **YAGNI with discipline** -- token cost metrics are not achievable today; do not design for them. Event timestamps are infrastructure worth doing but orthogonal to this proposal.
|
|
38
|
+
- **Architectural fixes over patches** -- auto-injected final step is a patch. The right architectural answer is to define a queryable projection over existing data.
|
|
39
|
+
|
|
40
|
+
**Philosophy conflicts:**
|
|
41
|
+
|
|
42
|
+
- Using `context_set` as a metrics carrier conflicts with "prefer explicit domain types." The conflict is manageable if the projection output is typed.
|
|
43
|
+
- The `wr.contracts.metrics` approach honors "prefer explicit domain types" but conflicts with YAGNI (high ceremony for advisory data) and the session-vs-node scope mismatch makes it architecturally wrong.
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Impact Surface
|
|
48
|
+
|
|
49
|
+
**Files that must stay consistent if we change the projection layer:**
|
|
50
|
+
|
|
51
|
+
- `src/v2/projections/session-metrics.ts` (new) -- must consume `SortedEventLog` like other projections
|
|
52
|
+
- `src/v2/usecases/console-service.ts` -- calls `projectSessionMetricsV2`, populates `ConsoleSessionSummary`
|
|
53
|
+
- `src/v2/usecases/console-types.ts` -- extends `ConsoleSessionSummary` with metrics field
|
|
54
|
+
- `console/src/api/types.ts` -- mirrors `ConsoleSessionSummary`; must be kept in sync
|
|
55
|
+
- `docs/authoring-v2.md` -- convention must be documented for workflow authors
|
|
56
|
+
- `src/v2/infra/local/session-summary-provider/index.ts` -- may need to call the new projection
|
|
57
|
+
|
|
58
|
+
**Contracts that must remain consistent:**
|
|
59
|
+
|
|
60
|
+
- `ConsoleSessionSummary` is a published DTO consumed by the console frontend; any new field must be backward-compatible (optional or null-safe).
|
|
61
|
+
- `SortedEventLog` is the canonical input type for projections; new projection must accept it.
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## Candidates
|
|
66
|
+
|
|
67
|
+
### Candidate A: `context_set` convention with flat metrics keys (simplest sufficient)
|
|
68
|
+
|
|
69
|
+
**Summary:** Define a documented `metrics_*` key convention at the top level of `context_set` events. Implement `projectSessionMetricsV2` to extract these keys. Add `metrics: SessionMetricsV2 | null` to `ConsoleSessionSummary`.
|
|
70
|
+
|
|
71
|
+
**Tension resolution:**
|
|
72
|
+
- Resolves: minimal author burden, backward-compatible, queryable with new projection, no schema change
|
|
73
|
+
- Accepts: weak type safety (JsonValue storage), unreliable agent self-reporting (no enforcement), agents must self-compute git stats
|
|
74
|
+
|
|
75
|
+
**Boundary solved at:** Projection layer -- the existing `context_set` mechanism is unchanged; only the extraction and display layers are new.
|
|
76
|
+
|
|
77
|
+
**Why this boundary is best-fit:** The `isAutonomous` and `parentSessionId` precedents show that `context_set` is already the approved mechanism for session-level advisory metadata. Adding a `metrics_*` convention follows the exact same pattern. This is "validate at boundaries" -- the projection validates/coerces the advisory data into a typed interface.
|
|
78
|
+
|
|
79
|
+
**Specific convention:**
|
|
80
|
+
- `metrics_outcome: 'success' | 'partial' | 'abandoned' | 'error'`
|
|
81
|
+
- `metrics_pr_numbers: number[]`
|
|
82
|
+
- `metrics_files_changed: number` (advisory)
|
|
83
|
+
- `metrics_lines_added: number` (advisory)
|
|
84
|
+
- `metrics_lines_removed: number` (advisory)
|
|
85
|
+
- `metrics_git_head_end: string` (advisory HEAD SHA at end; superseded by Phase 2)
|
|
86
|
+
|
|
87
|
+
**Failure mode:** Agents use inconsistent key names or forget to report. Console shows null metrics for most sessions. Enforcement can be added incrementally (prompted via workflow step guidance).
|
|
88
|
+
|
|
89
|
+
**Repo-pattern relationship:** Follows `projectRunContextV2` and `ConsoleSessionSummary.isAutonomous` exactly.
|
|
90
|
+
|
|
91
|
+
**Gains:** Zero schema change, works for any workflow today, reversible (just remove the projection and console field).
|
|
92
|
+
|
|
93
|
+
**Losses:** Weak type safety, no enforcement, agent-computed git stats are advisory.
|
|
94
|
+
|
|
95
|
+
**Scope:** Best-fit for Phase 1.
|
|
96
|
+
|
|
97
|
+
**Philosophy fit:** Honors YAGNI. Conflicts with "prefer explicit domain types" (JsonValue storage, mitigated by typed projection output).
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
### Candidate B: `wr.contracts.metrics` artifact contract (typed, validated)
|
|
102
|
+
|
|
103
|
+
**Summary:** Define a new Zod-validated artifact contract `wr.contracts.metrics`, declared on a workflow's final/summary step `outputContract`. Machine-verifiable at `continue_workflow` boundary.
|
|
104
|
+
|
|
105
|
+
**Tension resolution:**
|
|
106
|
+
- Resolves: type safety, validation at boundary, machine-queryable via existing artifact projection
|
|
107
|
+
- Accepts: author burden (outputContract required on final step), node-scoped (architectural mismatch for session-level metrics), not universal for existing workflows
|
|
108
|
+
|
|
109
|
+
**Boundary solved at:** The artifact contract system (`src/v2/durable-core/schemas/artifacts/`).
|
|
110
|
+
|
|
111
|
+
**Why this boundary is WRONG for this problem:** Artifacts are node-scoped (`node_output_appended` with `scope: {runId, nodeId}`). Session-level metrics don't belong to any single node. The closest existing analogs (`wr.contracts.review_verdict`, `wr.contracts.assessment`) are also node-scoped but are semantically correct there (they assess what a node did). Metrics are session-level: they describe the session outcome, not what a node did.
|
|
112
|
+
|
|
113
|
+
**Failure mode:** Most workflows don't have a dedicated final/summary step. All existing workflows would need to be updated to add a final step with `outputContract: wr.contracts.metrics`. This is a high-friction migration.
|
|
114
|
+
|
|
115
|
+
**Repo-pattern relationship:** Adapts the existing artifact contract pattern (correct for node-level structured outputs).
|
|
116
|
+
|
|
117
|
+
**Gains:** Type safety, enforcement at boundary, reuses existing validation infrastructure.
|
|
118
|
+
|
|
119
|
+
**Losses:** High author burden, wrong granularity (node vs. session), not backward-compatible.
|
|
120
|
+
|
|
121
|
+
**Scope:** Too broad for Phase 1. Potentially valid as a Phase 2 opt-in (not mandatory) for workflows that want formal metrics reporting.
|
|
122
|
+
|
|
123
|
+
**Philosophy fit:** Honors "prefer explicit domain types", "validate at boundaries." Conflicts with YAGNI and "minimal author burden."
|
|
124
|
+
|
|
125
|
+
---
|
|
126
|
+
|
|
127
|
+
### Candidate C: End-of-session HEAD SHA observation + post-hoc diff (reframe)
|
|
128
|
+
|
|
129
|
+
**Summary:** Emit an additional `observation_recorded` event with key `git_head_sha_end` when the session reaches `complete` state. Add a console endpoint `GET /api/v2/sessions/:id/diff-summary` that runs `git diff --stat <start_sha>..<end_sha>` for authoritative LOC and files-changed.
|
|
130
|
+
|
|
131
|
+
**Tension resolution:**
|
|
132
|
+
- Resolves: agent reliability (git stats from git, not agent), no false precision, zero author burden for git metrics
|
|
133
|
+
- Accepts: schema change (closed enum extension), requires git access from WorkRail at query time, complex lifecycle hook at session completion
|
|
134
|
+
|
|
135
|
+
**Boundary solved at:** Two boundaries -- (1) session lifecycle (end-of-session observation emission), (2) console query layer (new endpoint).
|
|
136
|
+
|
|
137
|
+
**Why this boundary is right for Phase 2 but wrong for Phase 1:** Authoritative git stats are more valuable than advisory agent-reported stats. But extending the `observation_recorded` closed enum requires a schema union update, and WorkRail currently has no git access at the MCP handler level. This is a significant but well-bounded engineering investment, appropriate as a follow-up after Phase 1 validates demand.
|
|
138
|
+
|
|
139
|
+
**Specific shape:**
|
|
140
|
+
- New `observation_recorded` key: `git_head_sha_end` (requires extending `DomainEventV1Schema`)
|
|
141
|
+
- Emission hook: when `advanceAndRecord` produces `outcome.kind = 'advanced'` and the next snapshot state is `complete`, emit the end observation
|
|
142
|
+
- Console endpoint: reads start SHA (first `observation_recorded.git_head_sha`) + end SHA (last `observation_recorded.git_head_sha_end`), runs `git diff --stat start..end`, returns JSON
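
As a rough sketch of the "returns JSON" step, the endpoint could parse the one-line summary that git prints as the last line of `git diff --stat` (and as the only line of `git diff --shortstat`). The `DiffSummary` type and `parseDiffSummaryLine` function are hypothetical names, not part of WorkRail, and the parser assumes git's English-locale summary format.

```typescript
// Hypothetical DTO for the diff-summary endpoint response.
interface DiffSummary {
  filesChanged: number;
  linesAdded: number;
  linesRemoved: number;
}

// Parses a summary line such as
//   " 3 files changed, 10 insertions(+), 2 deletions(-)"
// Git omits the insertions or deletions part when its count is zero,
// and prints nothing at all for an empty diff (handled here as null).
function parseDiffSummaryLine(line: string): DiffSummary | null {
  const files = /(\d+) files? changed/.exec(line);
  if (files === null) return null;
  const added = /(\d+) insertions?\(\+\)/.exec(line);
  const removed = /(\d+) deletions?\(-\)/.exec(line);
  return {
    filesChanged: parseInt(files[1], 10),
    linesAdded: added ? parseInt(added[1], 10) : 0,
    linesRemoved: removed ? parseInt(removed[1], 10) : 0,
  };
}

console.log(parseDiffSummaryLine(" 3 files changed, 10 insertions(+), 2 deletions(-)"));
```

Keeping the parser a pure function leaves the git invocation itself (and the question of where git access lives) outside the sketch.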
|
|
143
|
+
|
|
144
|
+
**Failure mode:** Extending the closed `observation_recorded` enum requires updating all exhaustive handlers. WorkRail has no git access today -- the console server would need to run the diff (it has access to the repo path via the `repo_root` observation). If the agent rebased between start and end, the diff between the two SHAs may not reflect only the session's own changes.
|
|
145
|
+
|
|
146
|
+
**Repo-pattern relationship:** Extends the existing `observation_recorded` start-of-session pattern.
|
|
147
|
+
|
|
148
|
+
**Gains:** Authoritative git metrics, zero agent participation.
|
|
149
|
+
|
|
150
|
+
**Losses:** Schema blast radius (closed enum extension), new git access boundary, complex lifecycle hook.
|
|
151
|
+
|
|
152
|
+
**Scope:** Best-fit for Phase 2 (separate GitHub issue).
|
|
153
|
+
|
|
154
|
+
**Philosophy fit:** Honors "architectural fixes over patches", "determinism." Conflicts with YAGNI (Phase 1 perspective).
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
### Candidate D: Add `timestampMs` to event envelope (foundational duration infrastructure)
|
|
159
|
+
|
|
160
|
+
**Summary:** Add optional `timestampMs: number` to `DomainEventEnvelopeV1Schema`. Session duration = `last_event.timestampMs - first_event.timestampMs`.
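
A minimal sketch of the duration computation, assuming the proposed optional field. `EventEnvelope` here models only `timestampMs` and is a hypothetical simplification of `DomainEventEnvelopeV1Schema`; `sessionDurationMs` is an illustrative name.

```typescript
// Hypothetical minimal envelope: only the proposed optional field is modeled.
interface EventEnvelope {
  timestampMs?: number;
}

// Returns last.timestampMs - first.timestampMs, or null when the log is
// empty or either boundary event predates the timestamp field.
function sessionDurationMs(events: readonly EventEnvelope[]): number | null {
  if (events.length === 0) return null;
  const first = events[0].timestampMs;
  const last = events[events.length - 1].timestampMs;
  if (first === undefined || last === undefined) return null;
  return last - first;
}

console.log(sessionDurationMs([{ timestampMs: 1000 }, {}, { timestampMs: 4500 }])); // 3500
console.log(sessionDurationMs([{}, { timestampMs: 5 }])); // null
```

Returning `null` rather than throwing matches the expectation that sessions recorded before the field existed still parse but show no duration.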
|
|
161
|
+
|
|
162
|
+
**Tension resolution:**
|
|
163
|
+
- Resolves: session duration computable from event log without agent participation
|
|
164
|
+
- Accepts: significant schema change, no duration for existing sessions (they carry no timestamps), adds size to every event
|
|
165
|
+
|
|
166
|
+
**Boundary solved at:** `DomainEventEnvelopeV1Schema` -- the lowest-level shared schema in the event log.
|
|
167
|
+
|
|
168
|
+
**Why this boundary is too broad:** This is foundational infrastructure that happens to enable one metric (duration). The blast radius affects every event builder, the session index, all parsers, and the design locks doc. This is a separate engineering investment that should be proposed and decided independently.
|
|
169
|
+
|
|
170
|
+
**Failure mode:** Existing sessions parse fine (optional field) but show null duration. Requires clock injection into all event builders. The design locks doc (`docs/design/v2-core-design-locks.md`) would need a new lock entry.
|
|
171
|
+
|
|
172
|
+
**Repo-pattern relationship:** Departs from existing pattern (events have no timestamps today).
|
|
173
|
+
|
|
174
|
+
**Gains:** Duration metrics without agent participation; future event-timing analytics.
|
|
175
|
+
|
|
176
|
+
**Losses:** Significant blast radius, orthogonal scope from "metrics capture."
|
|
177
|
+
|
|
178
|
+
**Scope:** Too broad for this proposal. Separate proposal: `feat(engine): add event envelope timestamps`.
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## Comparison and Recommendation
|
|
183
|
+
|
|
184
|
+
**Primary recommendation: Candidate A (flat keys) for Phase 1. Candidate C for Phase 2.**
|
|
185
|
+
|
|
186
|
+
| Criterion | A (context_set) | B (artifact) | C (end SHA) | D (timestamps) |
|
|
187
|
+
|-----------|----------------|--------------|-------------|----------------|
|
|
188
|
+
| Backward compatible | ✅ | ❌ | ⚠️ | ⚠️ |
|
|
189
|
+
| Author burden | minimal | high | zero | zero |
|
|
190
|
+
| Type safety | projection-level | strong | git-authoritative | N/A |
|
|
191
|
+
| Scope | best-fit (P1) | too broad | best-fit (P2) | separate |
|
|
192
|
+
| Schema change | none | new files only | enum extension | all events |
|
|
193
|
+
|
|
194
|
+
**Candidate B loses:** Session-vs-node scope mismatch is architecturally wrong. High ceremony for advisory data.
|
|
195
|
+
|
|
196
|
+
**Candidate C is Phase 2:** Right solution for authoritative git stats, wrong phase (too complex for immediate need).
|
|
197
|
+
|
|
198
|
+
**Candidate D is a separate proposal:** Foundational infrastructure, orthogonal to metrics capture.
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
|
|
202
|
+
## Self-Critique
|
|
203
|
+
|
|
204
|
+
**Strongest counter-argument against Candidate A:**
|
|
205
|
+
|
|
206
|
+
Flat `metrics_*` keys pollute the global context namespace. A workflow author using `context.metrics_outcome` for their own purpose would silently conflict with the convention. The counter: the `metrics_` prefix is a documented namespace convention; conflicts are the author's error, not a framework deficiency.
|
|
207
|
+
|
|
208
|
+
**What would tip toward Candidate B:**
|
|
209
|
+
|
|
210
|
+
If the use case requires formal audit-trail compliance (e.g., billing based on LOC metrics), advisory `context_set` data is insufficient. Typed artifact contracts with validation would be necessary. No such requirement exists currently.
|
|
211
|
+
|
|
212
|
+
**What would tip toward doing nothing:**
|
|
213
|
+
|
|
214
|
+
If inspection of real session event logs reveals that agents ALREADY consistently self-report outcomes in step notes (plain markdown), the right solution is better console extraction of markdown content, not new capture infrastructure. This should be validated before Phase 1 implementation.
|
|
215
|
+
|
|
216
|
+
**Pivoting away from flat keys:**
|
|
217
|
+
|
|
218
|
+
If the context budget (`MAX_CONTEXT_BYTES = 256KB`) becomes a constraint (unlikely), or if namespace pollution proves problematic, a Phase 1.5 could introduce a lightweight `context.metrics` object with explicit full-replacement semantics (agents always send the complete object).
|
|
219
|
+
|
|
220
|
+
---
|
|
221
|
+
|
|
222
|
+
## Open Questions for Main Agent
|
|
223
|
+
|
|
224
|
+
1. Are agents already self-reporting outcomes in step notes? (Validate primary framing risk before implementing.)
|
|
225
|
+
2. Should Phase 1 include any enforcement signal (e.g., a final step prompt that asks for `metrics_*` keys), or is the convention sufficient initially?
|
|
226
|
+
3. Should `metrics_git_head_end` (advisory) be included in Phase 1, or deferred to Phase 2?
|
|
227
|
+
4. Is the context namespace pollution risk (flat `metrics_*` keys) acceptable, or should the convention use a namespacing strategy?
|
|
@@ -0,0 +1,104 @@
|
|
|
1
|
+
# Session Metrics -- Design Review Findings
|
|
2
|
+
|
|
3
|
+
*Review findings for the selected direction: Candidate A (flat context_set keys + projectSessionMetricsV2 projection)*
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Tradeoff Review
|
|
8
|
+
|
|
9
|
+
| Tradeoff | Assessment | Condition for Reversal |
|
|
10
|
+
|----------|-----------|----------------------|
|
|
11
|
+
| JsonValue storage, typed projection output | Acceptable. Consistent with isAutonomous/parentSessionId precedent. | If metrics_outcome drives workflow branching → migrate to typed artifact contract |
|
|
12
|
+
| No enforcement, relies on step prompting | Acceptable IF prompting guidance is included. Without it, feature is functionally dead. | N/A -- enforcement is a Phase 2 enhancement |
|
|
13
|
+
| Advisory git stats (LOC, files) | Acceptable with clear "agent-reported" UI labeling. | Phase 2 (Candidate C) removes this tradeoff |
|
|
14
|
+
| Flat key namespace (metrics_ prefix) | Acceptable. mergeContext semantics are design-locked at §18.2. | If 20+ metrics keys make context debugging painful → introduce sub-object with explicit full-replacement semantics |
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Failure Mode Review
|
|
19
|
+
|
|
20
|
+
| Failure Mode | Covered | Missing Mitigation | Risk |
|
|
21
|
+
|-------------|---------|-------------------|------|
|
|
22
|
+
| Convention adoption near-zero without step prompting | Partially -- docs required, but template not specified | Concrete reusable step prompt template in docs/authoring-v2.md | HIGH |
|
|
23
|
+
| Inconsistent key names (metrics_pr_number vs metrics_pr_numbers) | Silently returns null (acceptable) | None required for Phase 1 | LOW |
|
|
24
|
+
| Nested metrics object shallow-merge footgun | NOT covered by design | Explicit WARNING in authoring docs against using nested context.metrics = {...} | MEDIUM |
|
|
25
|
+
| Malformed projection values | Requires defensive coercion implementation | Projection must coerce/null, not throw, on malformed types | MEDIUM |
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Runner-Up / Simpler Alternative Review
|
|
30
|
+
|
|
31
|
+
**From runner-up (Candidate C):** `metrics_git_head_end` should be included in Phase 1 convention as an advisory key with explicit Phase 2 upgrade path (same key, automatic source in Phase 2). No design change needed -- this is additive.
|
|
32
|
+
|
|
33
|
+
**Simpler variant:** `metrics_outcome` only (single key, closed enum). This is a valid scope reduction if Phase 1 must ship quickly, but the full spec is already appropriately scoped.
|
|
34
|
+
|
|
35
|
+
**Hybrid:** `metrics_outcome` as the primary/required recommendation; all others optional. Already how the design is structured.
|
|
36
|
+
|
|
37
|
+
**Conclusion:** No material design changes from comparison. Design is well-scoped.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Philosophy Alignment
|
|
42
|
+
|
|
43
|
+
| Principle | Status |
|
|
44
|
+
|-----------|--------|
|
|
45
|
+
| YAGNI | ✅ Satisfied |
|
|
46
|
+
| Immutability | ✅ Satisfied |
|
|
47
|
+
| Validate at boundaries | ✅ Satisfied (projection coercion) |
|
|
48
|
+
| Functional/declarative | ✅ Satisfied |
|
|
49
|
+
| Determinism | ✅ Satisfied |
|
|
50
|
+
| Prefer explicit domain types | ⚠️ Tension -- JsonValue storage (acceptable, consistent with existing precedent) |
|
|
51
|
+
| Type safety as first line of defense | ⚠️ Tension -- no compile-time guarantee on agent output (acceptable for advisory data) |
|
|
52
|
+
| Make illegal states unrepresentable | ⚠️ Tension -- malformed values coerced to null (acceptable for advisory data) |
|
|
53
|
+
| Architectural fixes over patches | ✅ Satisfied -- projection layer, not a special case |
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Findings
|
|
58
|
+
|
|
59
|
+
### RED (Blocking or Critical)
|
|
60
|
+
|
|
61
|
+
None identified. Design does not violate any hard invariants.
|
|
62
|
+
|
|
63
|
+
### ORANGE (Important, Must Address Before Ship)
|
|
64
|
+
|
|
65
|
+
**O1: Convention adoption requires concrete step prompt template**
|
|
66
|
+
The design as specified requires a step prompt template in `docs/authoring-v2.md`. Without a copy-paste ready prompt that workflow authors can add to their final step, adoption will be near-zero and the feature will appear broken. This is a REQUIRED deliverable alongside the code changes.
|
|
67
|
+
|
|
68
|
+
**O2: Explicit warning against nested metrics objects**
|
|
69
|
+
The shallow merge footgun (`context.metrics = {...}` gets fully replaced on partial updates) must be explicitly warned against in the authoring docs. Without this, workflow authors who follow intuition will silently lose metric data.
|
|
70
|
+
|
|
71
|
+
### YELLOW (Advisories, Can Be Addressed in Follow-Up)
|
|
72
|
+
|
|
73
|
+
**Y1: Defensive projection coercion must be implemented**
|
|
74
|
+
The `projectSessionMetricsV2` projection must coerce malformed values to null rather than propagating or throwing. E.g., if `metrics_pr_numbers` is the string `"123"` instead of `[123]`, return `null` for that field. Implement this defensively at the projection boundary.
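
The coercion rule can be sketched as follows. This is not the real `projectSessionMetricsV2` (which would consume a `SortedEventLog`); `coerceSessionMetrics` is a hypothetical helper operating on an already-extracted run context, and the `SessionMetricsV2` field names are assumed from the `metrics_*` convention.

```typescript
// Hypothetical shape of the typed projection output.
interface SessionMetricsV2 {
  outcome: "success" | "partial" | "abandoned" | "error" | null;
  prNumbers: number[] | null;
  filesChanged: number | null;
  linesAdded: number | null;
  linesRemoved: number | null;
  gitHeadEnd: string | null;
}

const OUTCOMES = ["success", "partial", "abandoned", "error"] as const;
type Outcome = (typeof OUTCOMES)[number];

function isInt(v: unknown): v is number {
  return typeof v === "number" && isFinite(v) && Math.floor(v) === v;
}

function asOutcome(v: unknown): Outcome | null {
  return typeof v === "string" && OUTCOMES.indexOf(v as Outcome) !== -1
    ? (v as Outcome)
    : null;
}

function asCount(v: unknown): number | null {
  return isInt(v) && v >= 0 ? v : null;
}

function asIntArray(v: unknown): number[] | null {
  return Array.isArray(v) && v.every(isInt) ? (v as number[]) : null;
}

function asNonEmptyString(v: unknown): string | null {
  return typeof v === "string" && v.length > 0 ? v : null;
}

// Absent or malformed values become null; this function never throws.
function coerceSessionMetrics(context: Record<string, unknown>): SessionMetricsV2 {
  return {
    outcome: asOutcome(context["metrics_outcome"]),
    prNumbers: asIntArray(context["metrics_pr_numbers"]),
    filesChanged: asCount(context["metrics_files_changed"]),
    linesAdded: asCount(context["metrics_lines_added"]),
    linesRemoved: asCount(context["metrics_lines_removed"]),
    gitHeadEnd: asNonEmptyString(context["metrics_git_head_end"]),
  };
}

const metrics = coerceSessionMetrics({
  metrics_outcome: "success",
  metrics_pr_numbers: "123", // malformed: string instead of number[]
  metrics_files_changed: 3,
});
console.log(metrics.prNumbers); // null
```

Each field is coerced independently, so one malformed key does not discard the valid ones reported alongside it.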
|
|
75
|
+
|
|
76
|
+
**Y2: Console UI must label advisory metrics**
|
|
77
|
+
The console display must show "agent-reported" or equivalent label for all Phase 1 metrics to avoid false precision. Do not present advisory LOC counts as authoritative. This is a UX requirement, not a code requirement.
|
|
78
|
+
|
|
79
|
+
**Y3: Document escalation trigger for type safety**
|
|
80
|
+
If `metrics_outcome` is ever used to drive automated workflow behavior (conditional branching, auto-dispatch), the type-safety tension becomes critical. Document this escalation condition in the design notes so future feature work knows when to migrate to a typed artifact contract.
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## Recommended Revisions
|
|
85
|
+
|
|
86
|
+
1. **Add to Phase 1 scope:** A reusable final-step prompt template in `docs/authoring-v2.md` -- exact wording for a "report outcomes" step that instructs the agent to set `metrics_outcome`, `metrics_pr_numbers`, and other relevant keys. (O1)
|
|
87
|
+
|
|
88
|
+
2. **Add to authoring docs:** A warning section "Do not use context.metrics = {...}" explaining the shallow merge footgun and why flat `metrics_*` keys are required. (O2)
|
|
89
|
+
|
|
90
|
+
3. **Add to projection implementation spec:** Defensive coercion: all metrics fields return `null` if absent or malformed; projection never throws on bad metric values. (Y1)
|
|
91
|
+
|
|
92
|
+
4. **Add to console UI spec:** Advisory labeling for all Phase 1 metrics values. "Agent-reported" tag or equivalent. (Y2)
|
|
93
|
+
|
|
94
|
+
5. **Include `metrics_git_head_end` in Phase 1 convention** with documentation that this is advisory and will be superseded by automatic capture in Phase 2. (from runner-up borrowing)
|
|
95
|
+
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
## Residual Concerns
|
|
99
|
+
|
|
100
|
+
1. **Primary framing risk unverified:** We have not verified whether agents are already self-reporting outcomes informally in step notes. If they are, Phase 1 may duplicate effort or conflict with existing informal conventions. **Recommendation:** Before shipping, sample 10-20 recent session event logs and check whether step notes or `context_set` events already contain outcome-like data.
|
|
101
|
+
|
|
102
|
+
2. **`metrics_git_head_end` reliability:** Agents may forget to run `git rev-parse HEAD` or may report the wrong commit (e.g., the HEAD before their final commit). Phase 1 should document this as advisory with low confidence, and Phase 2 should be prioritized to make it authoritative.
|
|
103
|
+
|
|
104
|
+
3. **Console projection callsite:** `console-service.ts` calls `projectRunContextV2` today; adding a call to `projectSessionMetricsV2` is additive but adds latency to session summary loading. If metrics projection is expensive (it shouldn't be for a simple key-read), it should be lazy-loaded or cached alongside the existing summary cache.
|