@exaudeus/workrail 3.28.0 → 3.30.0
- package/dist/console/assets/{index-C146q2kN.js → index-Bl5-Ghuu.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +4 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +3 -3
- package/workflows/workflow-for-workflows.v2.json +3 -3
|
@@ -0,0 +1,2746 @@
|
|
|
1
|
+
# WorkRail v2: Core Design Locks (Consolidated)
|
|
2
|
+
**Status:** Draft (intended to be locked)
|
|
3
|
+
**Date:** 2025-12-19
|
|
4
|
+
|
|
5
|
+
This document consolidates the v2 “design lock” decisions that can easily drift during implementation.
|
|
6
|
+
|
|
7
|
+
It is intentionally not a full spec. For normative protocol and platform constraints, use:
|
|
8
|
+
- `docs/reference/workflow-execution-contract.md`
|
|
9
|
+
- `docs/reference/mcp-platform-constraints.md`
|
|
10
|
+
- ADRs 005–007
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## 1) Append-only truth substrate: event log + node snapshots
|
|
15
|
+
|
|
16
|
+
### Two append-only stores
|
|
17
|
+
WorkRail v2 persists durable truth in two append-only stores:
|
|
18
|
+
- **Event log (per session)**: strictly ordered, typed events.
|
|
19
|
+
- **Node snapshot store**: immutable, typed, versioned snapshots referenced by events.
|
|
20
|
+
|
|
21
|
+
The event log is truth for lineage and facts; node snapshots exist only to rehydrate execution deterministically and re-mint runtime tokens.
|
|
22
|
+
|
|
23
|
+
### Storage invariants (must hold)
|
|
24
|
+
- **Authoritative ordering**: a monotonic per-session `EventIndex` is the ordering source for projections and policies. Timestamps (if any) are informational only.
|
|
25
|
+
- **EventIndex origin (locked)**: `EventIndex` is **0-based**. The first domain event in a session has `eventIndex=0`.
|
|
26
|
+
- **Crash-safe append**: append-only, atomic writes; JSONL **segment files** (not one monolithic file).
|
|
27
|
+
- write temp → `fsync(file)` → `rename` → `fsync(dir)`
|
|
28
|
+
- **Single writer per session**: enforce a cross-process lock; if the lock is busy, fail fast with a structured, retryable error.
|
|
29
|
+
- **Segmentation is `EventIndex`-driven**: segment naming/bounds are keyed to `EventIndex` (never timestamps).
|
|
30
|
+
- **Segment rotation (hybrid)**: rotate segments on the first threshold hit (max events **or** max bytes).
|
|
31
|
+
- **Two-stream model (locked)**:
|
|
32
|
+
- **Domain truth** lives in `events/*.jsonl` (typed domain events, ordered by `EventIndex`).
|
|
33
|
+
- A separate append-only **control stream** lives in `manifest.jsonl` (segment attestation + snapshot pins).
|
|
34
|
+
- **Segment manifest (append-only, authoritative)**:
|
|
35
|
+
- Segments are committed by appending a `segment_closed` record to `manifest.jsonl`.
|
|
36
|
+
- **Orphan segment rule**: any segment file without a corresponding `segment_closed` record is ignored (no salvage scanning).
|
|
37
|
+
- **Integrity + recovery**:
|
|
38
|
+
- On load, validate using `manifest.jsonl`; stop at the last valid manifest entry and fail explicitly (no guessing).
|
|
39
|
+
- Segment digests must match; otherwise treat the segment (and, when loading contiguously, every segment after it) as invalid.
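
A minimal TypeScript sketch of the hybrid rotation rule and `EventIndex`-keyed segment naming described above. `RotationLimits`, `shouldRotate`, and `segmentRelPath` are hypothetical names, the threshold values are left to the caller, and the 8-digit zero padding is inferred from the `segment_closed` example path in this document rather than stated as a lock.

```typescript
// Hypothetical limits; the document locks the hybrid rule, not specific numbers.
interface RotationLimits {
  maxEvents: number;
  maxBytes: number;
}

// Rotate on the first threshold hit: max events OR max bytes.
function shouldRotate(eventCount: number, byteCount: number, limits: RotationLimits): boolean {
  return eventCount >= limits.maxEvents || byteCount >= limits.maxBytes;
}

// Assumption: 8-digit zero padding, as in "events/00000000-00004999.jsonl".
function pad8(n: number): string {
  let s = String(n);
  while (s.length < 8) s = "0" + s;
  return s;
}

// Segment names are keyed to EventIndex bounds, never to timestamps.
function segmentRelPath(firstEventIndex: number, lastEventIndex: number): string {
  if (!Number.isInteger(firstEventIndex) || !Number.isInteger(lastEventIndex)) {
    throw new Error("EventIndex bounds must be integers");
  }
  if (firstEventIndex < 0 || lastEventIndex < firstEventIndex) {
    throw new Error("invalid EventIndex bounds");
  }
  return `events/${pad8(firstEventIndex)}-${pad8(lastEventIndex)}.jsonl`;
}
```

Keeping both functions pure makes the rotation decision and the naming scheme easy to test without touching the filesystem.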
|
|
40
|
+
|
|
41
|
+
### Append transaction protocol (AppendPlan → segment → manifest) (locked)
|
|
42
|
+
To prevent drift and “partial truth” states, all durable mutation must occur through a single append transaction protocol.
|
|
43
|
+
|
|
44
|
+
Locks:
|
|
45
|
+
- The storage subsystem exposes a single durable mutation operation: `append(sessionId, plan: AppendPlan)`.
|
|
46
|
+
- `AppendPlan` is the **atomic unit** of durable truth for the domain stream:
|
|
47
|
+
- either the plan is fully committed (and becomes part of truth),
|
|
48
|
+
- or it has no effect on truth (orphan segment rule applies).
|
|
49
|
+
- Commit is deterministic and uses the two-stream model:
|
|
50
|
+
1) **Write domain events segment**:
|
|
51
|
+
- write the plan’s domain events to a new temp segment file under `events/` (JSONL, ordered by `EventIndex`)
|
|
52
|
+
- `fsync(file)` → `rename` to final `events/<first>-<last>.jsonl` → `fsync(dir)`
|
|
53
|
+
2) **Attest the segment in the control stream**:
|
|
54
|
+
- append `manifest.segment_closed` referencing the final segment rel path + sha256 digest + bounds
|
|
55
|
+
- `fsync(manifest)`
|
|
56
|
+
3) **Pin new snapshot refs (pin-after-close, locked)**:
|
|
57
|
+
- append `manifest.snapshot_pinned` records for any `snapshotRef` introduced by the plan’s committed domain events segment
|
|
58
|
+
- `fsync(manifest)`
|
|
59
|
+
- All of the above occurs while holding the session lock (`sessions/<sessionId>/.lock`).
|
|
60
|
+
- Orphan segment rule remains authoritative: any segment without a corresponding `segment_closed` is ignored and MUST NOT be scanned for salvage.
|
|
61
|
+
|
|
62
|
+
Crash-state intent (locked):
|
|
63
|
+
- Crash before step (2) → orphan segment ignored (no truth change).
|
|
64
|
+
- Crash after step (2) but before step (3) → committed segment exists, but pins may be missing. This is treated as corruption of the append transaction and MUST fail fast on load (no “pin-on-load” repair).
|
|
65
|
+
- Crash after step (3) → committed segment + pins exist (normal).
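
The three crash states can be illustrated with a small in-memory loader. This is a sketch under stated assumptions, not the storage implementation: `SegmentFile`, `LoadResult`, and `classifyOnLoad` are hypothetical, and the snapshot refs introduced by each segment are passed in directly instead of being read from the segment's events.

```typescript
// Simplified manifest records; only the fields this sketch needs.
type ManifestRecord =
  | { kind: "segment_closed"; segmentRelPath: string }
  | { kind: "snapshot_pinned"; snapshotRef: string };

// Hypothetical view of a segment file on disk.
interface SegmentFile {
  relPath: string;
  introducedSnapshotRefs: string[];
}

interface LoadResult {
  committed: string[]; // segments that are part of truth
  ignoredOrphans: string[]; // files with no segment_closed record
}

function classifyOnLoad(files: SegmentFile[], manifest: ManifestRecord[]): LoadResult {
  const closed: { [path: string]: boolean } = {};
  const pinned: { [ref: string]: boolean } = {};
  for (const rec of manifest) {
    if (rec.kind === "segment_closed") closed[rec.segmentRelPath] = true;
    else pinned[rec.snapshotRef] = true;
  }
  const result: LoadResult = { committed: [], ignoredOrphans: [] };
  for (const file of files) {
    if (closed[file.relPath] !== true) {
      // Crash before step (2): no attestation, so the file is ignored, never salvaged.
      result.ignoredOrphans.push(file.relPath);
      continue;
    }
    for (const ref of file.introducedSnapshotRefs) {
      if (pinned[ref] !== true) {
        // Crash between steps (2) and (3): fail fast, no pin-on-load repair.
        throw new Error(`missing snapshot pin for ${ref} in ${file.relPath}`);
      }
    }
    // Crash after step (3), or no crash at all: the normal case.
    result.committed.push(file.relPath);
  }
  return result;
}
```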
|
|
66
|
+
|
|
67
|
+
#### `manifest.jsonl` record ordering (locked)
|
|
68
|
+
- The manifest has its own monotonic per-session **`ManifestIndex`** (authoritative ordering for manifest records).
|
|
69
|
+
- Manifest records may reference the domain stream via `eventIndex` (e.g., `snapshot_pinned.eventIndex` refers to the associated `EventIndex`), but do not consume domain `EventIndex`.
|
|
70
|
+
|
|
71
|
+
#### `manifest.jsonl` record kinds (schemaVersion 1, locked)
|
|
72
|
+
`manifest.jsonl` is a closed-set discriminated union by `kind` (schemaVersion 1):
|
|
73
|
+
- `segment_closed`
|
|
74
|
+
- `snapshot_pinned`
|
|
75
|
+
|
|
76
|
+
##### `segment_closed` (locked)
|
|
77
|
+
Purpose: attest that an `events/*.jsonl` segment is durably committed and integrity-checked.
|
|
78
|
+
|
|
79
|
+
Required fields:
|
|
80
|
+
- `v` (schema version)
|
|
81
|
+
- `manifestIndex`
|
|
82
|
+
- `sessionId`
|
|
83
|
+
- `kind: "segment_closed"`
|
|
84
|
+
- `firstEventIndex`, `lastEventIndex`
|
|
85
|
+
- `segmentRelPath` (relative path; no absolute paths)
|
|
86
|
+
- `sha256` (digest of the segment file bytes)
|
|
87
|
+
- `bytes` (non-negative int)
|
|
88
|
+
|
|
89
|
+
Invariants:
|
|
90
|
+
- **Strict contiguity**: `firstEventIndex` must equal the previous `segment_closed` record's `lastEventIndex + 1` (no gaps, no overlaps).
|
|
91
|
+
- Segment contents must cover exactly [`firstEventIndex`, `lastEventIndex`] in increasing order.
|
|
92
|
+
- Segment is ignored unless its digest matches `sha256`.
|
|
93
|
+
|
|
94
|
+
Example:
|
|
95
|
+
|
|
96
|
+
```json
|
|
97
|
+
{"v":1,"manifestIndex":12,"sessionId":"sess_01JH...","kind":"segment_closed","firstEventIndex":0,"lastEventIndex":4999,"segmentRelPath":"events/00000000-00004999.jsonl","sha256":"sha256:seg_7fd2...","bytes":1837421}
|
|
98
|
+
```
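
The strict-contiguity invariant could be checked with a sketch like the following. `SegmentBounds` and `firstContiguityViolation` are hypothetical names; the check assumes the first segment starts at `EventIndex` 0, which follows from the 0-based origin lock but is not spelled out for segments.

```typescript
interface SegmentBounds {
  firstEventIndex: number;
  lastEventIndex: number;
}

// Returns the position of the first record that breaks contiguity,
// or -1 when every record follows directly from its predecessor.
function firstContiguityViolation(records: SegmentBounds[]): number {
  let expectedFirst = 0; // assumption: EventIndex is 0-based, so segment 0 starts at 0
  for (let i = 0; i < records.length; i++) {
    const r = records[i];
    if (r.firstEventIndex !== expectedFirst || r.lastEventIndex < r.firstEventIndex) {
      return i; // gap, overlap, or inverted bounds
    }
    expectedFirst = r.lastEventIndex + 1;
  }
  return -1;
}
```

A loader following the integrity rules above would stop at the returned position and fail explicitly rather than guess.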
|
|
99
|
+
|
|
100
|
+
##### `snapshot_pinned` (locked)
|
|
101
|
+
Purpose: make snapshot reachability explicit for export/import and CAS GC (without scanning the event log).
|
|
102
|
+
|
|
103
|
+
Locks:
|
|
104
|
+
- **Pin-on-create**: record `snapshot_pinned` immediately when a new `snapshotRef` is introduced (as part of the append transaction; see pin-after-close ordering).
|
|
105
|
+
- Pins are append-only; duplicates are allowed; projections dedupe by `snapshotRef`.
|
|
106
|
+
|
|
107
|
+
Required fields:
|
|
108
|
+
- `v` (schema version)
|
|
109
|
+
- `manifestIndex`
|
|
110
|
+
- `sessionId`
|
|
111
|
+
- `kind: "snapshot_pinned"`
|
|
112
|
+
- `eventIndex` (associated domain `EventIndex`, typically the `node_created` that introduced the snapshot)
|
|
113
|
+
- `snapshotRef`
|
|
114
|
+
- `createdByEventId` (provenance)
|
|
115
|
+
|
|
116
|
+
Example:
|
|
117
|
+
|
|
118
|
+
```json
|
|
119
|
+
{"v":1,"manifestIndex":13,"sessionId":"sess_01JH...","kind":"snapshot_pinned","eventIndex":42,"snapshotRef":"sha256:snap_f2c1...","createdByEventId":"evt_01JH..."}
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
### Type-safety baseline
|
|
123
|
+
Avoid base primitives for identifiers where possible; use distinct branded/opaque types:
|
|
124
|
+
`SessionId`, `RunId`, `NodeId`, `EventId`, `WorkflowId`, `WorkflowHash`, `SnapshotRef`, `EventIndex`.
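
One common way to get distinct identifier types in TypeScript is a compile-time brand. The sketch below is illustrative only: the `Brand` helper, the `as…` constructors, and their validation rules are assumptions, not the project's actual definitions.

```typescript
// A brand is a compile-time-only tag; at runtime the values are plain primitives.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type SessionId = Brand<string, "SessionId">;
type EventIndex = Brand<number, "EventIndex">;

// Hypothetical smart constructors; the validation rules are assumptions.
function asSessionId(raw: string): SessionId {
  if (raw.length === 0) throw new Error("SessionId must be non-empty");
  return raw as SessionId;
}

function asEventIndex(raw: number): EventIndex {
  if (!Number.isInteger(raw) || raw < 0) {
    throw new Error("EventIndex must be a non-negative integer");
  }
  return raw as EventIndex;
}

// Arithmetic yields a plain number, so the result is re-validated and re-branded.
function nextEventIndex(current: EventIndex): EventIndex {
  return asEventIndex(current + 1);
}
```

With this pattern, passing a raw `string` (or a differently branded ID) where a `SessionId` is expected is a compile-time error.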
|
|
125
|
+
|
|
126
|
+
### Minimal internal event union (closed set)
|
|
127
|
+
All events share an envelope: `eventId`, `eventIndex`, `sessionId`, `kind`, plus optional scope refs (`runId?`, `nodeId?`).
|
|
128
|
+
|
|
129
|
+
Closed event kinds:
|
|
130
|
+
- `session_created`
|
|
131
|
+
- `observation_recorded` (session-first, node-scoped allowed but rare/high-signal)
|
|
132
|
+
- `run_started` (pins `workflowId` + `workflowHash`)
|
|
133
|
+
- `node_created` (`nodeKind`: `step|checkpoint|blocked_attempt`, references typed snapshot)
|
|
134
|
+
- `edge_created` (`edgeKind`: `acked_step|checkpoint`)
|
|
135
|
+
- `advance_recorded` (**durable result of an attempted advance (ack attempt)**, see below)
|
|
136
|
+
- `node_output_appended` (append-only durable write path; optional `supersedesOutputId?` for corrections without mutation)
|
|
137
|
+
- `preferences_changed` (node-scoped; stores delta + effective snapshot)
|
|
138
|
+
- `capability_observed` (node-scoped; includes closed-set provenance)
|
|
139
|
+
- `gap_recorded` (node-scoped; append-only “resolution” via linkage)
|
|
140
|
+
- `divergence_recorded` (node-scoped)
|
|
141
|
+
- `decision_trace_appended` (node-scoped, strictly bounded; never required for correctness)
|
|
142
|
+
|
|
143
|
+
### Event payload contracts (initial v2 schema, locked)
|
|
144
|
+
This section locks the *shape* of the highest-leverage event payloads so storage/projections don’t drift during implementation.
|
|
145
|
+
|
|
146
|
+
General rules:
|
|
147
|
+
- All identifiers use distinct branded types (`SessionId`, `RunId`, `NodeId`, `EventId`, `OutputId`, `GapId`, etc.).
|
|
148
|
+
- Prefer closed sets (discriminated unions/enums) over booleans and free-form strings.
|
|
149
|
+
- Node/run references live in the event envelope `scope` (avoid duplicating IDs inside event-specific payloads).
|
|
150
|
+
|
|
151
|
+
#### Idempotency via `dedupeKey` (locked)
|
|
152
|
+
WorkRail operates in a lossy/replay-prone environment. Every session event MUST carry a `dedupeKey` that enables safe retries and idempotent replays.
|
|
153
|
+
|
|
154
|
+
Behavior (locked):
|
|
155
|
+
- If an append encounters an existing event in the same session with the same `dedupeKey`, treat it as an **idempotent no-op**:
|
|
156
|
+
- do not append a new event
|
|
157
|
+
- return/ack success deterministically based on the existing event
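
The no-op behavior can be sketched with an in-memory stand-in for one session's log. `SessionLog`, `StoredEvent`, and `AppendResult` are hypothetical; the real storage layer goes through the append transaction protocol described earlier.

```typescript
interface StoredEvent {
  eventIndex: number;
  kind: string;
  dedupeKey: string;
}

interface AppendResult {
  event: StoredEvent;
  deduped: boolean;
}

// Hypothetical in-memory model of a single session's event log.
class SessionLog {
  private events: StoredEvent[] = [];
  private byDedupeKey: { [key: string]: StoredEvent } = Object.create(null);

  append(kind: string, dedupeKey: string): AppendResult {
    const existing = this.byDedupeKey[dedupeKey];
    if (existing !== undefined) {
      // Idempotent no-op: nothing is appended; the ack is based on the existing event.
      return { event: existing, deduped: true };
    }
    const event: StoredEvent = { eventIndex: this.events.length, kind, dedupeKey };
    this.events.push(event);
    this.byDedupeKey[dedupeKey] = event;
    return { event, deduped: false };
  }

  size(): number {
    return this.events.length;
  }
}
```

A retried append returns the same `eventIndex` as the first attempt, so callers get a deterministic answer regardless of how many times the request is replayed.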
|
|
158
|
+
|
|
159
|
+
Rules (locked intent):
|
|
160
|
+
- `dedupeKey` must be derived only from stable identifiers (no timestamps).
|
|
161
|
+
- `dedupeKey` is length-bounded and ASCII-safe.
|
|
162
|
+
- When incorporating a value-like field (e.g., an observation value), use a digest rather than embedding raw free-form text.
|
|
163
|
+
|
|
164
|
+
DedupeKey pattern (locked):
|
|
165
|
+
- Allowed characters: `[a-z0-9_:>-]+` (lowercase letters, digits, underscore, colon, greater-than, hyphen)
|
|
166
|
+
- Max length: 256 characters
|
|
167
|
+
- Recipe format: `<kind>:<parts joined by ":">`
|
|
168
|
+
- Arrow notation (`->`) allowed for edge relationships (e.g., `nodeA->nodeB`)
|
|
169
|
+
- MUST NOT contain uppercase letters, spaces, or other characters
|
|
170
|
+
|
|
171
|
+
**Hard rule (clarification, locked):** `dedupeKey` MUST NOT be derived from `eventId`. `eventId` is server-minted per append and is not available/stable across retries. If you need an idempotency handle, use a dedicated, typed identifier in the event payload (e.g., `outputId`, `changeId`, `observationId`, `gapId`, `attemptId`, `divergenceId`, `traceId`).
|
|
172
|
+
|
|
173
|
+
Initial v2 recipes (illustrative, locked intent):
|
|
174
|
+
- `run_started`: `run_started:<sessionId>:<runId>`
|
|
175
|
+
- `node_created`: `node_created:<sessionId>:<runId>:<nodeId>`
|
|
176
|
+
- `edge_created`: `edge_created:<sessionId>:<runId>:<fromNodeId>-><toNodeId>:<edgeKind>`
|
|
177
|
+
- `node_output_appended`: `node_output_appended:<sessionId>:<outputId>`
|
|
178
|
+
- `gap_recorded`: `gap_recorded:<sessionId>:<gapId>`
|
|
179
|
+
- `preferences_changed`: `preferences_changed:<sessionId>:<changeId>`
|
|
180
|
+
- `capability_observed`: `capability_observed:<sessionId>:<capObsId>`
|
|
181
|
+
- `advance_recorded`: `advance_recorded:<sessionId>:<nodeId>:<attemptId>`
|
|
182
|
+
- `divergence_recorded`: `divergence_recorded:<sessionId>:<divergenceId>`
|
|
183
|
+
- `decision_trace_appended`: `decision_trace_appended:<sessionId>:<traceId>`
|
|
184
|
+
- `observation_recorded`: `observation_recorded:<sessionId>:<key>:<valueDigest>`
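
The pattern and recipe format above could be enforced with a helper along these lines. `buildDedupeKey`, `isValidDedupeKey`, and `edgeCreatedKey` are hypothetical names, and the IDs in the usage are lowercase placeholders chosen to satisfy the locked character set.

```typescript
// Locked pattern: lowercase letters, digits, underscore, colon, greater-than, hyphen.
const DEDUPE_KEY_PATTERN = /^[a-z0-9_:>-]+$/;
const DEDUPE_KEY_MAX_LENGTH = 256;

function isValidDedupeKey(key: string): boolean {
  // The pattern is ASCII-only, so string length equals character count here.
  return key.length <= DEDUPE_KEY_MAX_LENGTH && DEDUPE_KEY_PATTERN.test(key);
}

// Recipe format: <kind>:<parts joined by ":">
function buildDedupeKey(kind: string, parts: string[]): string {
  const key = [kind].concat(parts).join(":");
  if (!isValidDedupeKey(key)) {
    throw new Error(`invalid dedupeKey: ${key}`);
  }
  return key;
}

// The edge recipe uses arrow notation for the from/to relationship.
function edgeCreatedKey(
  sessionId: string,
  runId: string,
  fromNodeId: string,
  toNodeId: string,
  edgeKind: string
): string {
  return buildDedupeKey("edge_created", [sessionId, runId, `${fromNodeId}->${toNodeId}`, edgeKind]);
}
```

For example, `buildDedupeKey("run_started", ["sess_01", "run_01"])` yields `run_started:sess_01:run_01`, while a key containing uppercase letters or spaces is rejected before it can reach storage.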
|
|
185
|
+
|
|
186
|
+
#### `session_created` (locked)
|
|
187
|
+
Purpose: mark the existence of a session as durable truth without reintroducing mutable session documents.
|
|
188
|
+
|
|
189
|
+
Lock:
|
|
190
|
+
- `session_created` is a **marker event only** in the initial v2 schema (no additional metadata payload).
|
|
191
|
+
- Session “aboutness” for resumption/search is derived from durable observations, outputs, and run history (projections), not from session-level fields.
|
|
192
|
+
|
|
193
|
+
Envelope requirements:
|
|
194
|
+
- `scope` must be absent (`runId`/`nodeId` not allowed).
|
|
195
|
+
|
|
196
|
+
Example:
|
|
197
|
+
|
|
198
|
+
```json
|
|
199
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":0,"sessionId":"sess_01JH...","kind":"session_created","dedupeKey":"session_created:sess_01JH...","data":{}}
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
#### `run_started` (locked)
|
|
203
|
+
Purpose: introduce a run and pin it to a compiled workflow snapshot for deterministic execution.
|
|
204
|
+
|
|
205
|
+
Envelope requirements:
|
|
206
|
+
- `scope.runId` must be present.
|
|
207
|
+
- `scope.nodeId` must be absent.
|
|
208
|
+
|
|
209
|
+
Payload fields:
|
|
210
|
+
- `workflowId`
|
|
211
|
+
- `workflowHash`
|
|
212
|
+
- `workflowSourceKind`: `bundled | user | project | remote | plugin`
|
|
213
|
+
- `workflowSourceRef` (opaque; meaning is scoped by `workflowSourceKind`)
|
|
214
|
+
|
|
215
|
+
Invariants:
|
|
216
|
+
- `workflowHash` is the execution authority for the run (not `workflowSourceRef`).
|
|
217
|
+
- `workflowHash` must be resolvable to a persisted pinned compiled workflow snapshot (export/import requirement).
|
|
218
|
+
|
|
219
|
+
Example:
|
|
220
|
+
|
|
221
|
+
```json
|
|
222
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":1,"sessionId":"sess_01JH...","kind":"run_started","scope":{"runId":"run_01JH..."},"dedupeKey":"run_started:sess_01JH:run_01JH","data":{"workflowId":"project.bug_investigation_v2","workflowHash":"sha256:wf_9a3b...","workflowSourceKind":"project","workflowSourceRef":"workflows/bug_investigation_v2.json"}}
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
#### `node_created` (locked)
|
|
226
|
+
Purpose: create a durable node in the run DAG and link it to the immutable rehydration snapshot (`snapshotRef`).
|
|
227
|
+
|
|
228
|
+
Envelope requirements:
|
|
229
|
+
- `scope.runId` must be present.
|
|
230
|
+
- `scope.nodeId` must be present.
|
|
231
|
+
|
|
232
|
+
Payload fields:
|
|
233
|
+
- `nodeKind`: `step | checkpoint | blocked_attempt`
|
|
234
|
+
- `parentNodeId` (nullable only for the run root node; otherwise required)
|
|
235
|
+
- `workflowHash` (must match the run’s pinned `workflowHash`)
|
|
236
|
+
- `snapshotRef`
|
|
237
|
+
|
|
238
|
+
Invariants:
|
|
239
|
+
- `parentNodeId` (when present) must refer to a node in the same run.
|
|
240
|
+
- `snapshotRef` must be pinned in `manifest.jsonl` via `snapshot_pinned` (pin-on-create, pin-after-close ordering).
|
|
241
|
+
|
|
242
|
+
Example:
|
|
243
|
+
|
|
244
|
+
```json
|
|
245
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":2,"sessionId":"sess_01JH...","kind":"node_created","scope":{"runId":"run_01JH...","nodeId":"node_01JH_root"},"dedupeKey":"node_created:sess_01JH:run_01JH:node_01JH_root","data":{"nodeKind":"step","parentNodeId":null,"workflowHash":"sha256:wf_9a3b...","snapshotRef":"sha256:snap_f2c1..." }}
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
#### `preferences_changed` (locked)
|
|
249
|
+
Purpose: record a node-attached preference change in an append-only way that is rewind-safe and export/import safe.
|
|
250
|
+
|
|
251
|
+
Payload fields:
|
|
252
|
+
- `changeId` (stable identifier)
|
|
253
|
+
- `source`: `user | workflow_recommendation | system`
|
|
254
|
+
- `delta`: non-empty list of changes:
|
|
255
|
+
- each item is `{ key, value }`
|
|
256
|
+
- `key` is a closed set (`autonomy | riskPolicy`)
|
|
257
|
+
- no duplicate keys within a single delta
|
|
258
|
+
- `effective`: full effective preference snapshot after applying `delta` (node-attached truth)
|
|
259
|
+
|
|
260
|
+
Example:
|
|
261
|
+
|
|
262
|
+
```json
|
|
263
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":120,"sessionId":"sess_01JH...","kind":"preferences_changed","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"preferences_changed:sess_01JH:prefchg_01JH...","data":{"changeId":"prefchg_01JH...","source":"user","delta":[{"key":"autonomy","value":"full_auto_never_stop"}],"effective":{"autonomy":"full_auto_never_stop","riskPolicy":"conservative"}}}
|
|
264
|
+
```
|
|
265
|
+
|
|
266
|
+
#### `capability_observed` (locked)
|
|
267
|
+
Purpose: record observed capability status with provenance so “agent said so” is never enforcement-grade truth by default.
|
|
268
|
+
|
|
269
|
+
Payload fields:
|
|
270
|
+
- `capObsId` (stable identifier; primary idempotency key)
|
|
271
|
+
- `capability`: `delegation | web_browsing`
|
|
272
|
+
- `status`: `unknown | available | unavailable`
|
|
273
|
+
- `provenance` (closed set):
|
|
274
|
+
- `kind`: `probe_step | attempted_use | manual_claim`
|
|
275
|
+
- `enforcementGrade`: `strong | weak` (derived deterministically from `kind`, but stored for projection/UI clarity)
|
|
276
|
+
- `detail` fields (minimal, explainability-first):
|
|
277
|
+
- `probe_step` (strong): `{ probeTemplateId, probeStepId, result: success|failure }`
|
|
278
|
+
- `attempted_use` (strong): `{ attemptContext: workflow_step|system_probe, result: success|failure, failureCode? }`
|
|
279
|
+
- `failureCode` is required iff `result=failure` and is a closed set: `tool_missing | tool_error | policy_blocked | unknown`
|
|
280
|
+
- `manual_claim` (weak): `{ claimedBy: agent|user, claim: available|unavailable }`
|
|
281
|
+
|
|
282
|
+
Example:
|
|
283
|
+
|
|
284
|
+
```json
|
|
285
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":121,"sessionId":"sess_01JH...","kind":"capability_observed","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"capability_observed:sess_01JH:capobs_01JH...","data":{"capObsId":"capobs_01JH...","capability":"web_browsing","status":"available","provenance":{"kind":"probe_step","enforcementGrade":"strong","detail":{"probeTemplateId":"wr.templates.capability_probe","probeStepId":"wr_probe_web_browsing","result":"success"}}}}
|
|
286
|
+
```
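
Two of the provenance rules in the payload fields above are mechanical enough to sketch: `enforcementGrade` is derived from `kind`, and `failureCode` is required exactly when `result` is `failure`. The function names here are hypothetical.

```typescript
type ProvenanceKind = "probe_step" | "attempted_use" | "manual_claim";
type EnforcementGrade = "strong" | "weak";
type FailureCode = "tool_missing" | "tool_error" | "policy_blocked" | "unknown";

// Derived deterministically from kind; stored on the event only for projection/UI clarity.
function enforcementGradeFor(kind: ProvenanceKind): EnforcementGrade {
  switch (kind) {
    case "probe_step":
    case "attempted_use":
      return "strong";
    case "manual_claim":
      return "weak";
  }
}

// failureCode must be present if and only if the result is "failure".
function isValidAttemptedUse(result: "success" | "failure", failureCode?: FailureCode): boolean {
  return (result === "failure") === (failureCode !== undefined);
}
```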
|
|
287
|
+
|
|
288
|
+
**Projection rule (locked intent):** capability status is **derived**.
|
|
289
|
+
- History is append-only: multiple `capability_observed` events may exist for the same `(nodeId, capability)`.
|
|
290
|
+
- The “current” status for a node is the latest event by `EventIndex` for that `(nodeId, capability)`. Ties are impossible because `EventIndex` is strictly ordered within a session.
|
|
291
|
+
- `dedupeKey` prevents only true retries (same `capObsId`), not legitimate status evolution.
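
A minimal sketch of the projection rule: the latest `capability_observed` event by `EventIndex` wins for a given `(nodeId, capability)`. The flattened `CapabilityObserved` shape and the `currentCapabilityStatus` name are hypothetical, and returning `unknown` when nothing has been observed is an assumption of this sketch.

```typescript
type Capability = "delegation" | "web_browsing";
type CapabilityStatus = "unknown" | "available" | "unavailable";

// Flattened view of a capability_observed event (envelope scope + payload).
interface CapabilityObserved {
  eventIndex: number;
  nodeId: string;
  capability: Capability;
  status: CapabilityStatus;
}

function currentCapabilityStatus(
  events: CapabilityObserved[],
  nodeId: string,
  capability: Capability
): CapabilityStatus {
  let latest: CapabilityObserved | undefined;
  for (const e of events) {
    if (e.nodeId !== nodeId || e.capability !== capability) continue;
    // Order by EventIndex, not by array position or timestamp.
    if (latest === undefined || e.eventIndex > latest.eventIndex) latest = e;
  }
  // Assumption: no observation at all projects to "unknown".
  return latest === undefined ? "unknown" : latest.status;
}
```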
|
|
292
|
+
|
|
293
|
+
#### `advance_recorded` (initial v2 schema, locked)
|
|
294
|
+
Purpose: record the durable outcome of an attempted `continue_workflow` operation so Studio and exports never infer “what happened” from transient tool responses.
|
|
295
|
+
|
|
296
|
+
Locks:
|
|
297
|
+
- `advance_recorded` is the canonical durable record for **ack attempts** (attempted advancement), including **blocked** and **advanced** outcomes.
|
|
298
|
+
- Rehydrate-only is side-effect-free (see contract): `continue_workflow` without `ackToken` MUST NOT create durable events and therefore MUST NOT create `advance_recorded`.
|
|
299
|
+
- Idempotency is keyed by `attemptId` (not by tokens, timestamps, or `eventId`).
|
|
300
|
+
- `advance_recorded` is **node-scoped** (the node the agent attempted to operate on).
|
|
301
|
+
- `advance_recorded.dedupeKey` MUST be scoped by node: `advance_recorded:<sessionId>:<nodeId>:<attemptId>`. This prevents catastrophic false-dedupe if an `attemptId` is accidentally reused on a different node.
|
|
302
|
+
|
|
303
|
+
Payload fields:
|
|
304
|
+
- `attemptId` (stable identifier; primary idempotency key; matches `ackToken` payload field name)
|
|
305
|
+
- `intent` (closed set):
|
|
306
|
+
- `ack_pending` (attempt to advance using `ackToken`)
|
|
307
|
+
- `outcome` (closed-set discriminated union):
|
|
308
|
+
- `{ kind: "blocked", blockers: BlockerReport }`
|
|
309
|
+
- `{ kind: "advanced", toNodeId }`
|
|
310
|
+
|
|
311
|
+
#### `BlockerReport` (initial v2 schema, locked)
|
|
312
|
+
Purpose: represent “blocked” reasons as typed, deterministic errors-as-data so Studio/exports never infer from chat history.
|
|
313
|
+
|
|
314
|
+
Locks:
|
|
315
|
+
- `BlockerReport` is a closed-set structure (no free-form codes).
|
|
316
|
+
- Ordering is deterministic.
|
|
317
|
+
- Payloads are bounded by byte budgets.
|
|
318
|
+
|
|
319
|
+
Text budgets are UTF-8 bytes (locked):
|
|
320
|
+
- All text budget limits (e.g., `message`, `suggestedFix`, `summary`, `notesMarkdown`) are measured in **UTF-8 bytes**, not code units or characters.
|
|
321
|
+
- Validation MUST use UTF-8 byte length measurement (e.g., `new TextEncoder().encode(s).length`), not string `.length`.
|
|
322
|
+
- This prevents multi-byte character edge cases and ensures consistent enforcement across runtimes.
|
|
323
|
+
|
|
324
|
+
Shape (conceptual):
|
|
325
|
+
- `blockers: [Blocker, ...]` (non-empty)
|
|
326
|
+
- `Blocker` fields (required unless noted):
|
|
327
|
+
- `code` (closed set; see below)
|
|
328
|
+
- `pointer` (typed pointer; see below)
|
|
329
|
+
- `message` (bounded text)
|
|
330
|
+
- `suggestedFix` (bounded text; optional but strongly recommended)
|
|
331
|
+
|
|
332
|
+
`Blocker.code` (closed set, initial; derived from `ReasonCode`):
|
|
333
|
+
- `USER_ONLY_DEPENDENCY`
|
|
334
|
+
- `MISSING_REQUIRED_OUTPUT`
|
|
335
|
+
- `INVALID_REQUIRED_OUTPUT`
|
|
336
|
+
- `REQUIRED_CAPABILITY_UNKNOWN`
|
|
337
|
+
- `REQUIRED_CAPABILITY_UNAVAILABLE`
|
|
338
|
+
- `INVARIANT_VIOLATION`
|
|
339
|
+
- `STORAGE_CORRUPTION_DETECTED`
|
|
340
|
+
|
|
341
|
+
`Blocker.pointer` (closed set, initial):
|
|
342
|
+
- `{ kind: "context_key", key: string }` (use only for declared external inputs; key must be delimiter-safe)
|
|
343
|
+
- `{ kind: "context_budget" }` (request context exceeded byte budget or was non-serializable; see Context budget lock)
|
|
344
|
+
- `{ kind: "output_contract", contractRef: string }`
|
|
345
|
+
- `{ kind: "capability", capability: "delegation" | "web_browsing" }`
|
|
346
|
+
- `{ kind: "workflow_step", stepId: string }` (stepId must be delimiter-safe)
|
|
347
|
+
|
|
348
|
+
Blocker pointer identifiers (locked):
|
|
349
|
+
- `context_key.key` MUST be delimiter-safe: `[a-z0-9_-]+`
|
|
350
|
+
- `workflow_step.stepId` MUST be delimiter-safe: `[a-z0-9_-]+`
|
|
351
|
+
- This ensures consistency with StepInstanceKey encoding and prevents serialization edge cases.
|
|
352
|
+
|
|
353
|
+
Budgets (locked):
|
|
354
|
+
- max blockers per report: 10
|
|
355
|
+
- max bytes per `message`: 512
|
|
356
|
+
- max bytes per `suggestedFix`: 1024
|
|
357
|
+
- if a budget would be exceeded: fail fast during validation (do not truncate silently)
|
|
358
|
+
|
|
359
|
+
Deterministic ordering (locked):
|
|
360
|
+
- sort blockers by `(code, pointer.kind, pointer.* stable fields)` in ascending lexical order before returning/storing.
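
The budget and ordering locks above could be enforced together before a report is stored. The following is a sketch under stated assumptions: the type names and the `normalizeBlockers` function are hypothetical, and comparing one "stable field" per pointer kind is one reading of `pointer.* stable fields`.

```typescript
// Hypothetical types mirroring the closed sets listed above.
type BlockerCode =
  | "USER_ONLY_DEPENDENCY" | "MISSING_REQUIRED_OUTPUT" | "INVALID_REQUIRED_OUTPUT"
  | "REQUIRED_CAPABILITY_UNKNOWN" | "REQUIRED_CAPABILITY_UNAVAILABLE"
  | "INVARIANT_VIOLATION" | "STORAGE_CORRUPTION_DETECTED";

type BlockerPointer =
  | { kind: "context_key"; key: string }
  | { kind: "context_budget" }
  | { kind: "output_contract"; contractRef: string }
  | { kind: "capability"; capability: "delegation" | "web_browsing" }
  | { kind: "workflow_step"; stepId: string };

interface Blocker {
  code: BlockerCode;
  pointer: BlockerPointer;
  message: string;
  suggestedFix?: string;
}

const MAX_BLOCKERS = 10;
const MAX_MESSAGE_BYTES = 512;
const MAX_SUGGESTED_FIX_BYTES = 1024;

const byteLength = (s: string): number => new TextEncoder().encode(s).length;

// Plain code-unit comparison; avoids locale-dependent collation.
const compareLexical = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Assumption: each pointer kind contributes a single stable field to the sort key.
function pointerStableField(p: BlockerPointer): string {
  switch (p.kind) {
    case "context_key": return p.key;
    case "context_budget": return "";
    case "output_contract": return p.contractRef;
    case "capability": return p.capability;
    case "workflow_step": return p.stepId;
  }
}

/** Validates budgets (fail fast, no truncation) and returns a sorted copy. */
function normalizeBlockers(blockers: readonly Blocker[]): Blocker[] {
  if (blockers.length === 0 || blockers.length > MAX_BLOCKERS) {
    throw new Error(`expected 1..${MAX_BLOCKERS} blockers, got ${blockers.length}`);
  }
  for (const b of blockers) {
    if (byteLength(b.message) > MAX_MESSAGE_BYTES) {
      throw new Error("message exceeds 512 UTF-8 bytes");
    }
    if (b.suggestedFix !== undefined && byteLength(b.suggestedFix) > MAX_SUGGESTED_FIX_BYTES) {
      throw new Error("suggestedFix exceeds 1024 UTF-8 bytes");
    }
  }
  return blockers.slice().sort((a, b) =>
    compareLexical(a.code, b.code) ||
    compareLexical(a.pointer.kind, b.pointer.kind) ||
    compareLexical(pointerStableField(a.pointer), pointerStableField(b.pointer)));
}
```

Sorting a copy keeps the caller's array untouched, and throwing on an exceeded budget matches the "do not truncate silently" rule.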
|
|
361
|
+
|
|
362
|
+
Mapping lock (ReasonCode → Blocker.code) (locked):
|
|
363
|
+
- `ReasonCode.user_only_dependency:*` → `USER_ONLY_DEPENDENCY`
|
|
364
|
+
- `ReasonCode.contract_violation:missing_required_output` → `MISSING_REQUIRED_OUTPUT`
|
|
365
|
+
- `ReasonCode.contract_violation:invalid_required_output` → `INVALID_REQUIRED_OUTPUT`
|
|
366
|
+
- `ReasonCode.capability_missing:required_capability_unknown` → `REQUIRED_CAPABILITY_UNKNOWN`
|
|
367
|
+
- `ReasonCode.capability_missing:required_capability_unavailable` → `REQUIRED_CAPABILITY_UNAVAILABLE`
|
|
368
|
+
- `ReasonCode.unexpected:invariant_violation` → `INVARIANT_VIOLATION`
|
|
369
|
+
- `ReasonCode.unexpected:storage_corruption_detected` → `STORAGE_CORRUPTION_DETECTED`
|
|
370
|
+
|
|
371
|
+
Notes:
|
|
372
|
+
- When `outcome.kind == "advanced"`, an `edge_created` + `node_created` MUST also exist for the same logical operation (either in the same append plan or as the idempotent replay result).
|
|
373
|
+
- When `outcome.kind == "blocked"`, the run status projection can rely on `gap_recorded` (for never-stop) and/or blockers here (for UX). Blockers are errors-as-data and must be bounded.
|
|
374
|
+
|
|
375
|
+
#### `divergence_recorded` (initial v2 schema, locked)
|
|
376
|
+
Purpose: record intentional off-script behavior as a durable, node-attached explainability signal (Studio badges; not required for correctness).
|
|
377
|
+
|
|
378
|
+
Envelope requirements:
|
|
379
|
+
- `scope.runId` must be present.
|
|
380
|
+
- `scope.nodeId` must be present.
|
|
381
|
+
|
|
382
|
+
Payload fields:
|
|
383
|
+
- `divergenceId` (stable identifier; primary idempotency key)
|
|
384
|
+
- `reason` (closed set, initial):
|
|
385
|
+
- `missing_user_context`
|
|
386
|
+
- `capability_unavailable`
|
|
387
|
+
- `efficiency_skip`
|
|
388
|
+
- `safety_stop`
|
|
389
|
+
- `policy_constraint`
|
|
390
|
+
- `summary` (bounded text; non-empty)
|
|
391
|
+
- `relatedStepId?` (optional step id string)
|
|
392
|
+
|
|
393
|
+
Example:
|
|
394
|
+
|
|
395
|
+
```json
|
|
396
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":126,"sessionId":"sess_01JH...","kind":"divergence_recorded","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"divergence_recorded:sess_01JH:div_01JH...","data":{"divergenceId":"div_01JH...","reason":"capability_unavailable","summary":"Delegation was unavailable; executed sequentially and recorded results.","relatedStepId":"investigate"}}
|
|
397
|
+
```
|
|
398
|
+
|
|
399
|
+
#### `gap_recorded` (locked)
|
|
400
|
+
Purpose: durable disclosure primitive for never-stop behavior and explainability in blocking modes. Gaps are immutable; “resolution” is linkage.
|
|
401
|
+
|
|
402
|
+
Payload fields:
|
|
403
|
+
- `gapId` (stable identifier)
|
|
404
|
+
- `severity`: `info | warning | critical`
|
|
405
|
+
- `reason` (category + category-specific closed detail):
|
|
406
|
+
- category: `user_only_dependency | contract_violation | capability_missing | unexpected`
|
|
407
|
+
- detail enums (initial):
|
|
408
|
+
- `user_only_dependency`: see `UserOnlyDependencyReason` below
|
|
409
|
+
- `contract_violation`: `missing_required_output | invalid_required_output`
|
|
410
|
+
- `capability_missing`: `required_capability_unavailable | required_capability_unknown`
|
|
411
|
+
- `unexpected`: `invariant_violation | storage_corruption_detected`
|
|
412
|
+
- `summary` (bounded text)
|
|
413
|
+
- `resolution`: `{ kind: "unresolved" } | { kind: "resolves", resolvesGapId }`
|
|
414
|
+
- `evidenceRefs` (optional, closed set):
|
|
415
|
+
- `{ kind: "event", eventId }`
|
|
416
|
+
- `{ kind: "output", outputId }`
|
|
417
|
+
|
|
418
|
+
Example:
|
|
419
|
+
|
|
420
|
+
```json
|
|
421
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":122,"sessionId":"sess_01JH...","kind":"gap_recorded","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"gap_recorded:sess_01JH:gap_01JH...","data":{"gapId":"gap_01JH...","severity":"critical","reason":{"category":"contract_violation","detail":"missing_required_output"},"summary":"Required capability observation output was missing; continuing in never-stop mode.","resolution":{"kind":"unresolved"},"evidenceRefs":[{"kind":"event","eventId":"evt_01JH..."}]}}
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
#### `decision_trace_appended` (initial v2 schema, locked)
|
|
425
|
+
Purpose: bounded “why” trace for debugging/audit without relying on the chat transcript. Collapsed by default in Studio; never required for correctness.
|
|
426
|
+
|
|
427
|
+
Envelope requirements:
|
|
428
|
+
- `scope.runId` must be present.
|
|
429
|
+
- `scope.nodeId` must be present.
|
|
430
|
+
|
|
431
|
+
Payload fields:
|
|
432
|
+
- `traceId` (stable identifier; primary idempotency key)
|
|
433
|
+
- `entries` (non-empty list, bounded)
|
|
434
|
+
- each entry has:
|
|
435
|
+
- `kind` (closed set, initial):
|
|
436
|
+
- `selected_next_step`
|
|
437
|
+
- `evaluated_condition`
|
|
438
|
+
- `entered_loop`
|
|
439
|
+
- `exited_loop`
|
|
440
|
+
- `detected_non_tip_advance`
|
|
441
|
+
- `summary` (bounded text, UTF-8 bytes)
|
|
442
|
+
- `refs?` (optional; closed union, not an open bag)
|
|
443
|
+
|
|
444
|
+
Decision trace refs (locked):
|
|
445
|
+
- `refs` is a **closed-set discriminated union** by `kind`, not `record<unknown>`.
|
|
446
|
+
- Allowed ref kinds (initial):
|
|
447
|
+
- `{ kind: "step_id", stepId: string }` — references a workflow step
|
|
448
|
+
- `{ kind: "loop_id", loopId: string }` — references a loop
|
|
449
|
+
- `{ kind: "condition_id", conditionId: string }` — references a condition
|
|
450
|
+
- `{ kind: "iteration", value: number }` — references an iteration (0-based)
|
|
451
|
+
- All `stepId`, `loopId`, `conditionId` must be **delimiter-safe**: `[a-z0-9_-]+`
|
|
452
|
+
- Max refs per entry: 10
|
|
453
|
+
- New ref kinds require an explicit schema version bump or union extension.
|
|
454
|
+
|
|
455
|
+
Budgets (locked):
|
|
456
|
+
- max entries: 25
|
|
457
|
+
- max summary bytes per entry: 512
|
|
458
|
+
- max total bytes per event: 8192
|
|
459
|
+
- if budgets are exceeded, deterministically truncate by bytes and append the canonical truncation marker to the affected summary (never drop entries out of order).
|
|
460
|
+
|
|
461
|
+
Loop trace completeness (locked intent):
|
|
462
|
+
- For any `type:"loop"` execution:
|
|
463
|
+
- the engine MUST record `entered_loop` at loop entry,
|
|
464
|
+
- MUST record `evaluated_condition` for each loop condition evaluation that influences control flow,
|
|
465
|
+
- and MUST record `exited_loop` when the loop terminates.
|
|
466
|
+
- These trace entries SHOULD include a `{ kind: "loop_id", loopId }` ref (and a `{ kind: "iteration", value }` ref when applicable) so “loop did not run” and “why did it exit” are diagnosable without inference.
|
|
467
|
+
- If a loop terminates without running its body (0 iterations), `evaluated_condition` + `exited_loop` MUST still be recorded (no silent short-circuit).
|
|
468
|
+
|
|
469
|
+
Canonical truncation marker (locked):
|
|
470
|
+
- append exactly: `\n\n[TRUNCATED]`
|
|
471
|
+
- truncation is byte-based (UTF-8). To guarantee the marker fits, reserve marker bytes and truncate the original text prefix to the remaining budget.
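
One way to implement the reserve-then-truncate rule above, as a sketch. The function name `truncateUtf8` is hypothetical, and backing off to a code-point boundary (so the prefix never ends in a partial multi-byte sequence) is an assumption that the lock does not spell out.

```typescript
const TRUNCATION_MARKER = "\n\n[TRUNCATED]";
const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

/** Returns `text` unchanged if it fits; otherwise a byte-bounded prefix plus the marker. */
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = textEncoder.encode(text);
  if (bytes.length <= maxBytes) return text;

  // Reserve room for the marker so the result never exceeds the budget.
  const markerBytes = textEncoder.encode(TRUNCATION_MARKER).length;
  let cut = maxBytes - markerBytes;
  if (cut < 0) throw new Error("budget is smaller than the truncation marker");

  // Assumption: never split a code point. UTF-8 continuation bytes look like
  // 10xxxxxx, so step back until the first excluded byte starts a new character.
  while (cut > 0 && ((bytes[cut] ?? 0) & 0xc0) === 0x80) cut--;

  return textDecoder.decode(bytes.subarray(0, cut)) + TRUNCATION_MARKER;
}
```

Because the cut only ever moves backwards, the output is deterministic for a given input and budget and stays at or below `maxBytes`.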
|
|
472
|
+
|
|
473
|
+
Example:
|
|
474
|
+
|
|
475
|
+
```json
|
|
476
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":127,"sessionId":"sess_01JH...","kind":"decision_trace_appended","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"decision_trace_appended:sess_01JH:trace_01JH...","data":{"traceId":"trace_01JH...","entries":[{"kind":"selected_next_step","summary":"Chose step 'investigate' because prior evidence reduced uncertainty most.","refs":[{"kind":"step_id","stepId":"investigate"}]},{"kind":"detected_non_tip_advance","summary":"Provided state token was non-tip; recording fork marker and continuing on a new branch."}]}}
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
#### `edge_created` (locked)
|
|
480
|
+
Purpose: record authoritative relationships between nodes in a run DAG (advancement and explicit fork-from-non-tip markers).
|
|
481
|
+
|
|
482
|
+
Payload fields:
|
|
483
|
+
- `edgeKind`: `acked_step | checkpoint`
|
|
484
|
+
- `fromNodeId`, `toNodeId`
|
|
485
|
+
- `cause`:
|
|
486
|
+
- `kind`: `idempotent_replay | intentional_fork | non_tip_advance | checkpoint_created`
|
|
487
|
+
- `eventId` (required for explainability; references an event in this session)
|
|
488
|
+
|
|
489
|
+
Invariants:
|
|
490
|
+
- `fromNodeId` and `toNodeId` must refer to nodes in the same run.
|
|
491
|
+
- For `edgeKind=acked_step`:
|
|
492
|
+
- `toNodeId` must have `parentNodeId == fromNodeId`.
|
|
493
|
+
- `cause.kind` must be one of `idempotent_replay`, `intentional_fork`, or `non_tip_advance`.
|
|
494
|
+
- For `edgeKind=checkpoint`:
|
|
495
|
+
- `toNodeId` must have `parentNodeId == fromNodeId`.
|
|
496
|
+
- `toNodeId` must refer to a node with `nodeKind == checkpoint`.
|
|
497
|
+
- `cause.kind` must be `checkpoint_created`.
|
|
498
|
+
|
|
499
|
+
Lock (simplification): do not model fork-from-non-tip as a separate edge kind. Fork-ness is represented via `cause.kind=non_tip_advance` on the normal `acked_step` edge and derived via projections/Studio badges.
|
|
500
|
+
|
|
501
|
+
Example:
|
|
502
|
+
|
|
503
|
+
```json
|
|
504
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":123,"sessionId":"sess_01JH...","kind":"edge_created","scope":{"runId":"run_01JH..."},"dedupeKey":"edge_created:sess_01JH:run_01JH:node_A->node_B:acked_step","data":{"edgeKind":"acked_step","fromNodeId":"node_A","toNodeId":"node_B","cause":{"kind":"intentional_fork","eventId":"evt_01JH..."}}}
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
### Durable outputs: append + supersede linkage (locked)
|
|
508
|
+
Durable outputs exist to preserve high-signal progress outside the chat transcript without introducing mutable “documents” that drift under rewinds.
|
|
509
|
+
|
|
510
|
+
Locks:
|
|
511
|
+
- Outputs are recorded only via **append-only** `node_output_appended` events (no in-place edits).
|
|
512
|
+
- Each output append assigns a stable **`outputId`** (server-owned identifier for idempotency and explainability).
|
|
513
|
+
- Corrections use **linkage**, not mutation:
|
|
514
|
+
- `supersedesOutputId` is optional and indicates “this output corrects/replaces an earlier output”.
|
|
515
|
+
- `supersedesOutputId` is **node-scoped**: it may only reference outputs from the same `nodeId`.
|
|
516
|
+
- Output typing is a **closed set**:
|
|
517
|
+
- minimal: `notesMarkdown` (text-first)
|
|
518
|
+
- structured: `outputKind` (closed set) and/or `contractRef` (WorkRail-owned contract pack reference)
|
|
519
|
+
- “Current view” is a projection:
|
|
520
|
+
- an output is considered “superseded” if any later `node_output_appended` on the same node references it via `supersedesOutputId`.
|
|
521
|
+
- history remains visible; the “current” set is derived.
|
|
522
|
+
|
|
523
|
+
#### `node_output_appended` payload (initial v2 schema, locked)
|
|
524
|
+
Purpose: the single durable write path for high-signal progress and optional structured artifacts, attached to a node.
|
|
525
|
+
|
|
526
|
+
Locks:
|
|
527
|
+
- Outputs are append-only facts. Corrections use `supersedesOutputId` linkage (no mutation).
|
|
528
|
+
- `supersedesOutputId` is node-scoped: it may only reference outputs from the same `nodeId`.
|
|
529
|
+
- Corrections are channel-scoped: `supersedesOutputId` may only reference an output with the same `outputChannel`.
|
|
530
|
+
- Output payload is a closed set (initial v2 schema).
|
|
531
|
+
- **Deterministic expansion + ordering (locked):** if a single logical operation produces multiple outputs for a node (e.g., multiple `artifact_ref` entries), WorkRail MUST append outputs in a deterministic order:
|
|
532
|
+
- at most one `outputChannel=recap` output first (if produced),
|
|
533
|
+
- then `outputChannel=artifact` outputs ordered by `(sha256, contentType)` ascending (lexical).
|
|
534
|
+
- **Deterministic outputId derivation (locked intent):** `outputId` must be stable under retries. When an output is produced as part of an ack attempt, the `outputId` MUST be deterministically derived from the attempt identity and the payload discriminator (do not mint random IDs). The specific string encoding is intentionally opaque and versioned; only the derivation inputs are locked.
|
|
535
|
+
|
|
536
|
+
Payload fields:
|
|
537
|
+
- `outputId` (stable identifier; primary idempotency key)
|
|
538
|
+
- `supersedesOutputId?`
|
|
539
|
+
- `outputChannel` (closed set):
|
|
540
|
+
- `recap` (default “what happened / what’s next”)
|
|
541
|
+
- `artifact` (structured results referenced by digest)
|
|
542
|
+
- `payload` (closed-set discriminated union by `payloadKind`):
|
|
543
|
+
- `notes`:
|
|
544
|
+
- `{ payloadKind: "notes", notesMarkdown }`
|
|
545
|
+
- `artifact_ref`:
|
|
546
|
+
- `{ payloadKind: "artifact_ref", sha256, contentType, byteLength }`
|
|
547
|
+
- `sha256` is a digest of the artifact bytes; artifacts live in the durable artifact store and are referenced, not duplicated into events.
|
|
548
|
+
|
|
549
|
+
Budgets (locked):
|
|
550
|
+
- `notesMarkdown` max bytes: 4096
|
|
551
|
+
- if exceeded, deterministically truncate by bytes (preserving the beginning of the text) and append the canonical truncation marker: `\n\n[TRUNCATED]`.
|
|
552
|
+
|
|
553
|
+
Example:
|
|
554
|
+
|
|
555
|
+
```json
|
|
556
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":124,"sessionId":"sess_01JH...","kind":"node_output_appended","scope":{"runId":"run_01JH...","nodeId":"node_01JH..."},"dedupeKey":"node_output_appended:sess_01JH:out_01JH...","data":{"outputId":"out_01JH...","outputChannel":"recap","payload":{"payloadKind":"notes","notesMarkdown":"Completed Phase 1. Next: probe web browsing capability; record observations."}}}
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
#### `observation_recorded` (initial v2 schema, locked)
|
|
560
|
+
Purpose: record high-signal workspace identity anchors for deterministic resume/search and explainability (not telemetry).
|
|
561
|
+
|
|
562
|
+
Locks:
|
|
563
|
+
- Session-scoped by default (no `scope.runId` / `scope.nodeId`).
|
|
564
|
+
- Closed-set keys + tagged scalar values; “latest” is a projection by max `eventIndex` per key.
|
|
565
|
+
|
|
566
|
+
Payload fields:
|
|
567
|
+
- `key` (closed set, initial):
|
|
568
|
+
- `git_branch`
|
|
569
|
+
- `git_head_sha`
|
|
570
|
+
- `repo_root_hash`
|
|
571
|
+
- `repo_root` — human-readable absolute repo path; stored alongside `repo_root_hash` for console grouping without reverse-hash lookup
|
|
572
|
+
- `value` (tagged scalar closed set, initial):
|
|
573
|
+
- `{ type: "short_string", value }` (bounded)
|
|
574
|
+
- `{ type: "git_sha1", value }`
|
|
575
|
+
- `{ type: "sha256", value }`
|
|
576
|
+
- `{ type: "path", value }` — filesystem path; higher length budget than `short_string`
|
|
577
|
+
- `confidence`: `low | med | high`
|
|
578
|
+
|
|
579
|
+
Budgets (locked):
|
|
580
|
+
- for `value.type="short_string"`, max length: 80 UTF-8 bytes
|
|
581
|
+
- for `value.type="path"`, max length: 512 UTF-8 bytes
|
|
582
|
+
|
|
583
|
+
Example:
|
|
584
|
+
|
|
585
|
+
```json
|
|
586
|
+
{"v":1,"eventId":"evt_01JH...","eventIndex":125,"sessionId":"sess_01JH...","kind":"observation_recorded","dedupeKey":"observation_recorded:sess_01JH:git_head_sha:4f3c...","data":{"key":"git_head_sha","value":{"type":"git_sha1","value":"4f3c2a1b0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a"},"confidence":"high"}}
|
|
587
|
+
```
|
|
588
|
+
|
|
589
|
+
---
|
|
590
|
+
|
|
591
|
+
## 2.1) Projection contracts (initial v2, locked)
|
|
592
|
+
Projections drive Studio/Console and exports. They MUST be deterministic and derived from durable truth (event log + snapshots + node-attached preferences).
|
|
593
|
+
|
|
594
|
+
### Current outputs projection (locked)
|
|
595
|
+
For a given `nodeId`, the “current” output for a channel is the latest output in that channel that is not superseded by a later output on the same node.
|
|
596
|
+
|
|
597
|
+
Rules:
|
|
598
|
+
- `supersedesOutputId` is node-scoped and channel-scoped.
|
|
599
|
+
- A channel’s history remains visible; “current” is derived.
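
The projection above can be sketched as a pure function over a node's output events. `OutputEvent` and `currentOutputs` are hypothetical names; the input is assumed to be the already-parsed `node_output_appended` events of one session.

```typescript
type OutputChannel = "recap" | "artifact";

// Hypothetical flattened view of a `node_output_appended` event.
interface OutputEvent {
  eventIndex: number;
  nodeId: string;
  outputId: string;
  outputChannel: OutputChannel;
  supersedesOutputId?: string;
}

/**
 * Non-superseded outputs for one node and channel, oldest first.
 * The last element is the "latest current" output for that channel.
 */
function currentOutputs(
  events: readonly OutputEvent[],
  nodeId: string,
  channel: OutputChannel,
): OutputEvent[] {
  // Supersession is node-scoped and channel-scoped, so filter before linking.
  const inScope = events
    .filter((e) => e.nodeId === nodeId && e.outputChannel === channel)
    .sort((a, b) => a.eventIndex - b.eventIndex);

  const superseded = new Set<string>();
  for (const e of inScope) {
    if (e.supersedesOutputId !== undefined) superseded.add(e.supersedesOutputId);
  }
  return inScope.filter((e) => !superseded.has(e.outputId));
}
```

Nothing is deleted or rewritten: the full history stays in `events`, and the "current" set is recomputed on demand.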
|
|
600
|
+
|
|
601
|
+
### Run status projection (locked)
|
|
602
|
+
Default run status is computed from the **preferred tip** node (per the preferred tip policy).
|
|
603
|
+
|
|
604
|
+
Let:
|
|
605
|
+
- `autonomy` be the effective preference snapshot at the preferred tip node
|
|
606
|
+
- `isComplete` be derived from the preferred tip snapshot execution state (`complete` vs `init|running`)
|
|
607
|
+
- `hasUnresolvedCriticalGaps` be true iff the run has any unresolved `gap_recorded` with `severity=critical`
|
|
608
|
+
|
|
609
|
+
Status:
|
|
610
|
+
- If `isComplete`:
|
|
611
|
+
- If `hasUnresolvedCriticalGaps`: `complete_with_gaps`
|
|
612
|
+
- Else: `complete`
|
|
613
|
+
- Else (not complete):
|
|
614
|
+
- If `autonomy != full_auto_never_stop` and the preferred tip node has any unresolved critical gap in categories `user_only_dependency | contract_violation | capability_missing`: `blocked`
|
|
615
|
+
- Else: `in_progress`
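
The status rules above reduce to a small pure function. This sketch assumes the inputs have already been derived from durable truth; the `GapView` shape, the `resolved` flag, and the function name are hypothetical. `autonomy` is typed as `string` because only `full_auto_never_stop` is named in this section.

```typescript
type GapCategory =
  | "user_only_dependency" | "contract_violation" | "capability_missing" | "unexpected";

// Hypothetical projected view of a `gap_recorded` event.
interface GapView {
  nodeId: string;
  severity: "info" | "warning" | "critical";
  category: GapCategory;
  resolved: boolean; // derived from resolution linkage, not stored on the gap
}

type RunStatus = "complete" | "complete_with_gaps" | "blocked" | "in_progress";

function runStatus(input: {
  autonomy: string;
  isComplete: boolean;
  preferredTipNodeId: string;
  gaps: readonly GapView[];
}): RunStatus {
  const unresolvedCritical = input.gaps.filter(
    (g) => g.severity === "critical" && !g.resolved,
  );

  if (input.isComplete) {
    return unresolvedCritical.length > 0 ? "complete_with_gaps" : "complete";
  }

  // Only gaps on the preferred tip, in the three blocking categories, can block.
  const tipIsBlocked = unresolvedCritical.some(
    (g) => g.nodeId === input.preferredTipNodeId && g.category !== "unexpected",
  );
  return input.autonomy !== "full_auto_never_stop" && tipIsBlocked
    ? "blocked"
    : "in_progress";
}
```

Note the asymmetry the rules encode: completion looks at unresolved critical gaps anywhere in the run, while blocking looks only at the preferred tip node.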
|
|
616
|
+
|
|
617
|
+
---
|
|
618
|
+
|
|
619
|
+
## 2.2) Retention + CAS GC (initial v2, locked)
|
|
620
|
+
WorkRail is local-only by default. Retention and deletion must be safe and deterministic: never delete reachable durable truth.
|
|
621
|
+
|
|
622
|
+
### Session retention (locked intent)
|
|
623
|
+
- Sessions are eligible for deletion after a configurable TTL (recommended default: 30–90 days).
|
|
624
|
+
- Some sessions may be explicitly kept (exempt from TTL) via a WorkRail-owned config/control mechanism (closed set).
|
|
625
|
+
|
|
626
|
+
### CAS snapshot GC (locked)
|
|
627
|
+
Use mark-and-sweep GC rooted in per-session pin lists:
|
|
628
|
+
- GC roots: the union of `snapshotRef`s recorded in each retained session’s `manifest.jsonl` (`snapshot_pinned`).
|
|
629
|
+
- Reachability is derived; no inference or scanning of event segments is required for GC correctness.
|
|
630
|
+
- A CAS snapshot may be deleted only if it is not reachable from any retained session’s roots.
|
|
631
|
+
|
|
632
|
+
### Safety invariants (locked)
|
|
633
|
+
- GC runs only after manifests for all retained sessions are loaded and validated.
|
|
634
|
+
- On any manifest corruption or unknown schema version, GC enters **safe mode**: no deletes; emit structured warning.
|
|
635
|
+
- Deletion order: delete session storage first (event segments/manifest), then run CAS GC. Never the reverse.
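
A sketch of the mark-and-sweep plan with the safe-mode invariant, assuming manifests have already been loaded and validated by the caller. `ManifestLoad`, `GcPlan`, and `planCasGc` are hypothetical names; the function only computes what may be deleted and performs no I/O.

```typescript
// Hypothetical result of loading one retained session's manifest.
type ManifestLoad =
  | { status: "valid"; sessionId: string; pinnedSnapshotRefs: string[] }
  | { status: "invalid"; sessionId: string; reason: string };

interface GcPlan {
  mode: "sweep" | "safe_mode";
  deletable: string[];
  warnings: string[];
}

function planCasGc(
  allSnapshotRefs: readonly string[],
  retainedManifests: readonly ManifestLoad[],
): GcPlan {
  const roots = new Set<string>();
  const warnings: string[] = [];

  // Mark: the roots are the union of pinned refs across retained sessions.
  for (const m of retainedManifests) {
    if (m.status === "invalid") {
      warnings.push(`manifest for ${m.sessionId} unusable: ${m.reason}`);
    } else {
      for (const ref of m.pinnedSnapshotRefs) roots.add(ref);
    }
  }

  // Safe mode: any corrupt or unknown-version manifest means no deletes at all.
  if (warnings.length > 0) {
    return { mode: "safe_mode", deletable: [], warnings };
  }

  // Sweep: only unreachable snapshots are deletable; sorted for determinism.
  const deletable = allSnapshotRefs.filter((ref) => !roots.has(ref)).sort();
  return { mode: "sweep", deletable, warnings };
}
```

Expired sessions are expected to be removed before this runs, so their manifests are simply absent from `retainedManifests` and their snapshots become unreachable.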
|
|
636
|
+
|
|
637
|
+
---
|
|
638
|
+
|
|
639
|
+
## 2.3) `resume_session` deterministic ranking (initial v2, locked)
|
|
640
|
+
`resume_session` exists to reduce friction in brand-new chat scenarios where the user does not have a token. Because WorkRail cannot read chat history, ranking MUST be derived from durable truth and be deterministic.
|
|
641
|
+
|
|
642
|
+
Locks:
|
|
643
|
+
- Results are **tip-only** (preferred tip node per run).
|
|
644
|
+
- Ranking is deterministic for a given durable store state.
|
|
645
|
+
- Responses are bounded (no unbounded history dumps).
|
|
646
|
+
|
|
647
|
+
### Ranking algorithm (locked)
|
|
648
|
+
Use strict tiered matching (layered search as ordering), not probabilistic scoring.
|
|
649
|
+
|
|
650
|
+
Tier order (highest to lowest):
|
|
651
|
+
1) exact match on `git_head_sha` observation
|
|
652
|
+
2) exact or prefix match on `git_branch` observation
|
|
653
|
+
3) token match on latest preferred-tip `node_output_appended` recap notes (`outputChannel=recap`, `payloadKind=notes`) using the locked normalization rules below
|
|
654
|
+
4) token match on workflow id/name (source/compiled metadata) using the locked normalization rules below
|
|
655
|
+
5) fallback to recency only
|
|
656
|
+
|
|
657
|
+
Within a tier, order by:
|
|
658
|
+
1) run preferred-tip `lastActivityEventIndex` desc
|
|
659
|
+
2) `sessionId` lex (deterministic tie-breaker)
|
|
660
|
+
|
|
661
|
+
### Response budget (locked)
|
|
662
|
+
- max candidates: 5
|
|
663
|
+
- max snippet bytes per candidate: 2048 (UTF-8, with canonical truncation marker)
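
The tier order, in-tier ordering, and candidate budget above amount to a single comparator plus a cut-off. In this sketch `RankedCandidate` and `rankCandidates` are hypothetical names, the tier is assumed to be computed elsewhere (1 is the `git_head_sha` match, 5 is the recency fallback), and ascending order for the `sessionId` tie-breaker is an assumption.

```typescript
// Tier numbers follow the list above: 1 = head SHA match ... 5 = recency fallback.
type Tier = 1 | 2 | 3 | 4 | 5;

interface RankedCandidate {
  sessionId: string;
  tier: Tier;
  lastActivityEventIndex: number;
}

const MAX_CANDIDATES = 5;

function rankCandidates(candidates: readonly RankedCandidate[]): RankedCandidate[] {
  return candidates
    .slice()
    .sort((a, b) =>
      // Strict tiers first: a better tier always wins, regardless of recency.
      a.tier - b.tier ||
      // Within a tier, most recent activity first.
      b.lastActivityEventIndex - a.lastActivityEventIndex ||
      // Deterministic tie-breaker (assumed ascending, code-unit order).
      (a.sessionId < b.sessionId ? -1 : a.sessionId > b.sessionId ? 1 : 0))
    .slice(0, MAX_CANDIDATES);
}
```

Because every comparison key comes from durable state and the last key is unique per session, two runs over the same store always produce the same list.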
|
|
664
|
+
|
|
665
|
+
### Match explanations (locked intent)
|
|
666
|
+
Each candidate includes a closed-set `whyMatched[]`, e.g.:
|
|
667
|
+
- `matched_head_sha`
|
|
668
|
+
- `matched_branch`
|
|
669
|
+
- `matched_notes`
|
|
670
|
+
- `matched_workflow_id`
|
|
671
|
+
- `recency_fallback`
|
|
672
|
+
|
|
673
|
+
### Text matching semantics (locked)
|
|
674
|
+
To prevent cross-implementation drift, any “match on text” in resume ranking uses a single deterministic normalization and token matching policy.
|
|
675
|
+
|
|
676
|
+
Locks:
|
|
677
|
+
- Normalize both query and candidate text by:
|
|
678
|
+
1) Unicode normalization: **NFKC**
|
|
679
|
+
2) Lowercase (locale-independent)
|
|
680
|
+
3) Extract tokens matching regex: `[a-z0-9_-]+`
|
|
681
|
+
- A candidate “matches notes” iff **all query tokens** appear in the candidate token set (set membership; not raw substring).
|
|
682
|
+
- The searchable text corpus excludes:
|
|
683
|
+
- the canonical truncation marker `\n\n[TRUNCATED]`
|
|
684
|
+
- superseded outputs (only “current recap” output for the preferred tip is considered)
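
The normalization and matching locks above can be sketched in a few lines. `tokenize` and `matchesAllTokens` are hypothetical names, and treating an empty query as "no match" is an assumption (read literally, "all query tokens appear" is vacuously true for an empty query).

```typescript
const CANONICAL_TRUNCATION_MARKER = "\n\n[TRUNCATED]";

/** NFKC-normalize, lowercase, and extract the token set. */
function tokenize(text: string): Set<string> {
  // The truncation marker is not part of the searchable corpus.
  const withoutMarker = text.split(CANONICAL_TRUNCATION_MARKER).join("");
  const normalized = withoutMarker.normalize("NFKC").toLowerCase();
  return new Set(normalized.match(/[a-z0-9_-]+/g) ?? []);
}

/** True iff every query token is a member of the candidate token set. */
function matchesAllTokens(query: string, candidateText: string): boolean {
  const queryTokens = Array.from(tokenize(query));
  const candidateTokens = tokenize(candidateText);
  // Assumption: an empty query matches nothing rather than everything.
  return queryTokens.length > 0 && queryTokens.every((t) => candidateTokens.has(t));
}
```

Set membership means `auth` does not match a candidate containing only `auth-bug`, which is the behaviour the "not raw substring" rule asks for.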
|
|
685
|
+
|
|
686
|
+
|
|
687
|
+
### Node snapshots: typed, versioned, minimal
|
|
688
|
+
Snapshots must be typed+versioned (not opaque blobs) and must not become a second engine:
|
|
689
|
+
- include only minimal rehydration payload + `workflowHash` linkage
|
|
690
|
+
- verifiable against the pinned compiled workflow snapshot
|
|
691
|
+
- portable for export/import; tokens are re-minted from snapshots
|
|
692
|
+
|
|
693
|
+
#### Snapshot execution payload boundary (initial v2 schema, locked)
|
|
694
|
+
Snapshots store the minimum typed interpreter state required to rehydrate execution deterministically and re-mint runtime tokens, without replaying the event log and without caching projections.
|
|
695
|
+
|
|
696
|
+
Locks:
|
|
697
|
+
- Snapshot payload uses a discriminated union (no booleans-as-state).
|
|
698
|
+
- Pending-step presence is explicit (no nullable state):
|
|
699
|
+
- `pending: { kind: "none" } | { kind: "some"; step: PendingStep }`
|
|
700
|
+
- Impossible state is rejected:
|
|
701
|
+
- when `pending.kind == "some"`, the pending step instance key MUST NOT be present in `completed`.
|
|
702
|
+
- Loop IDs are unique in the compiled workflow:
|
|
703
|
+
- the runtime loop stack must not contain the same `loopId` twice.
|
|
704
|
+
- Pending loop path must exactly match the loop stack (loopId+iteration):
|
|
705
|
+
- when `pending.kind == "some"`, `pending.step.loopPath` MUST equal `loopStack` mapped to its `(loopId, iteration)` pairs, in the same order.
|
|
706
|
+
- Completed step instances are represented as an explicit set wrapper (not a raw array) and must be sorted lexicographically by key.
|
|
707
|
+
|
|
708
|
+
##### `StepInstanceKey` canonical format (locked)
|
|
709
|
+
To avoid escaping footguns, the canonical key format assumes `stepId` and `loopId` are constrained to a delimiter-safe charset.
|
|
710
|
+
|
|
711
|
+
Constraints:
|
|
712
|
+
- `stepId` and `loopId` use only: `[a-z0-9_-]+` (lowercase letters, digits, underscore, hyphen)
|
|
713
|
+
- explicitly disallowed: `@`, `/`, `:`
|
|
714
|
+
|
|
715
|
+
Format:
|
|
716
|
+
- If `loopPath` is empty: `StepInstanceKey = stepId`
|
|
717
|
+
- Else: `StepInstanceKey = (loopId@iteration joined by "/") + "::" + stepId`
|
|
718
|
+
|
|
719
|
+
Example:
|
|
720
|
+
- `outer@0/inner@2::triage`
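
A sketch of the key format above. `LoopFrame` and `stepInstanceKey` are hypothetical names; rejecting negative or non-integer iterations is an assumption based on iterations being 0-based counters.

```typescript
interface LoopFrame {
  loopId: string;
  iteration: number;
}

const DELIMITER_SAFE = /^[a-z0-9_-]+$/;

function assertDelimiterSafe(id: string, label: string): void {
  if (!DELIMITER_SAFE.test(id)) {
    throw new Error(`${label} "${id}" is not delimiter-safe ([a-z0-9_-]+)`);
  }
}

/** Builds the canonical key: loop frames joined by "/", then "::", then the step id. */
function stepInstanceKey(stepId: string, loopPath: readonly LoopFrame[]): string {
  assertDelimiterSafe(stepId, "stepId");
  for (const frame of loopPath) {
    assertDelimiterSafe(frame.loopId, "loopId");
    // Assumption: iterations are non-negative integers.
    if (!Number.isInteger(frame.iteration) || frame.iteration < 0) {
      throw new Error(`iteration for loop "${frame.loopId}" must be a non-negative integer`);
    }
  }
  if (loopPath.length === 0) return stepId;
  const prefix = loopPath.map((f) => `${f.loopId}@${f.iteration}`).join("/");
  return `${prefix}::${stepId}`;
}

console.log(stepInstanceKey("triage", [
  { loopId: "outer", iteration: 0 },
  { loopId: "inner", iteration: 2 },
])); // outer@0/inner@2::triage
```

Since `@`, `/`, and `:` can never appear inside an id, the key can be split back into its parts without any escaping.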
|
|
721
|
+
|
|
722
|
+
---
|
|
723
|
+
|
|
724
|
+
## 1.2) Token boundary locks (opaque, signed refs) (locked)
|
|
725
|
+
WorkRail v2 uses tokens as opaque handles at the MCP boundary. Tokens are **not** durable truth; they are tamper-evident references into the append-only store.
|
|
726
|
+
|
|
727
|
+
### Token architecture (locked)
|
|
728
|
+
- Tokens are **signed refs** to durable truth (no server-side token table).
|
|
729
|
+
- Durable truth remains the event log + snapshots; tokens are re-minted on import.
|
|
730
|
+
|
|
731
|
+
### Token string encoding (locked intent)
|
|
732
|
+
- `stateToken` format: `st.v1.<payload>.<sig>`
|
|
733
|
+
- `ackToken` format: `ack.v1.<payload>.<sig>`
|
|
734
|
+
- `checkpointToken` format: `chk.v1.<payload>.<sig>`
|
|
735
|
+
- `<payload>` is base64url of **RFC 8785 (JCS)** canonical JSON containing only the locked fields.
|
|
736
|
+
|
|
737
|
+
### Token signing + keyring (locked)
|
|
738
|
+
To prevent cross-implementation drift and keep validation deterministic, v2 locks the signing algorithm and keyring semantics.
|
|
739
|
+
|
|
740
|
+
Locks:
|
|
741
|
+
- **Signing algorithm**: `HMAC-SHA256`
|
|
742
|
+
- **Signing key material**:
|
|
743
|
+
- 32-byte random key
|
|
744
|
+
- stored in a local WorkRail-owned keyring file under the data directory (`keys/keyring.json`)
|
|
745
|
+
- **Signature input bytes (locked)**:
|
|
746
|
+
- `<sig>` is computed as `HMAC_SHA256(key, payloadBytes)`, where `payloadBytes` are the bytes obtained by base64url-decoding `<payload>` (that is, the UTF-8 encoding of the RFC 8785 JCS canonical JSON).
|
|
747
|
+
- No additional separators, prefixes, or surrounding token strings are included in the HMAC input.
|
|
748
|
+
- **Keyring active set (locked)**:
|
|
749
|
+
- exactly two keys are permitted: `current` and optional `previous`
|
|
750
|
+
- verification order is deterministic: try `current`, then `previous`
|
|
751
|
+
- **Rotation (locked)**:
|
|
752
|
+
- rotation is explicit (not time-based)
|
|
753
|
+
- on rotation: `current → previous`, generate a fresh `current`
|
|
754
|
+
- tokens signed by `previous` remain valid until the next rotation
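
The signing, verification order, and rotation locks above can be illustrated with Node's built-in `crypto` module. This is a sketch under several assumptions: all function names are hypothetical, `<sig>` is assumed to be base64url-encoded (the encoding of the signature is not stated here), and `canonicalJson` is a simplified stand-in that is only equivalent to RFC 8785 for flat objects with ASCII keys and string or integer values.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

interface Keyring {
  current: Buffer;   // 32-byte random key
  previous?: Buffer; // at most one older key
}

type FlatPayload = Record<string, string | number>;

// Simplified canonicalization (NOT full RFC 8785): sorted keys, no whitespace.
function canonicalJson(payload: FlatPayload): string {
  const members = Object.keys(payload)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${JSON.stringify(payload[k])}`);
  return `{${members.join(",")}}`;
}

function hmac(key: Buffer, payloadBytes: Buffer): Buffer {
  // HMAC input is the canonical JSON bytes only: no prefix, no separators.
  return createHmac("sha256", key).update(payloadBytes).digest();
}

function mintToken(prefix: "st" | "ack" | "chk", payload: FlatPayload, keyring: Keyring): string {
  const payloadBytes = Buffer.from(canonicalJson(payload), "utf8");
  const sig = hmac(keyring.current, payloadBytes).toString("base64url");
  return `${prefix}.v1.${payloadBytes.toString("base64url")}.${sig}`;
}

function verifyToken(token: string, keyring: Keyring): boolean {
  const parts = token.split(".");
  if (parts.length !== 4 || parts[1] !== "v1") return false;
  const payloadBytes = Buffer.from(parts[2] ?? "", "base64url");
  const sig = Buffer.from(parts[3] ?? "", "base64url");
  // Deterministic order: `current` first, then `previous` if present.
  const keys = keyring.previous ? [keyring.current, keyring.previous] : [keyring.current];
  return keys.some((key) => {
    const expected = hmac(key, payloadBytes);
    return expected.length === sig.length && timingSafeEqual(expected, sig);
  });
}

/** Explicit rotation: `current` becomes `previous`; the old `previous` is dropped. */
function rotate(keyring: Keyring): Keyring {
  return { current: randomBytes(32), previous: keyring.current };
}
```

A token minted before one rotation still verifies through `previous`; after a second rotation its key is gone and verification fails, which is the "valid until the next rotation" behaviour.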
|
|
755
|
+
|
|
756
|
+
### Token payload fields (locked)
|
|
757
|
+
`stateToken` payload (all required):
|
|
758
|
+
- `tokenVersion: 1`
|
|
759
|
+
- `tokenKind: "state"`
|
|
760
|
+
- `sessionId`
|
|
761
|
+
- `runId`
|
|
762
|
+
- `nodeId`
|
|
763
|
+
- `workflowHash`
|
|
764
|
+
|
|
765
|
+
`ackToken` payload (all required):
|
|
766
|
+
- `tokenVersion: 1`
|
|
767
|
+
- `tokenKind: "ack"`
|
|
768
|
+
- `sessionId`
|
|
769
|
+
- `runId`
|
|
770
|
+
- `nodeId`
|
|
771
|
+
- `attemptId`
|
|
772
|
+
|
|
773
|
+
`checkpointToken` payload (all required):
|
|
774
|
+
- `tokenVersion: 1`
|
|
775
|
+
- `tokenKind: "checkpoint"`
|
|
776
|
+
- `sessionId`
|
|
777
|
+
- `runId`
|
|
778
|
+
- `nodeId`
|
|
779
|
+
- `attemptId`
|
|
780
|
+
|
|
781
|
+
### Ack idempotency + branching (locked)
|
|
782
|
+
- Idempotency key: `(sessionId, runId, nodeId, attemptId)`.
|
|
783
|
+
- Replaying the same `ackToken` is an idempotent no-op: return the same response; do not double-advance.
|
|
784
|
+
- WorkRail may mint multiple `ackToken`s for the same `(runId, nodeId)` with different `attemptId` values to support intentional forks and safe replay handling.
|
|
785
|
+
|
|
786
|
+
### Checkpoint idempotency (locked)
|
|
787
|
+
- Idempotency key: `(sessionId, runId, nodeId, attemptId)`.
|
|
788
|
+
- Replaying the same `checkpointToken` is an idempotent no-op: do not create duplicate checkpoint nodes/edges/outputs; return the same response deterministically.
|
|
789
|
+
|
|
790
|
+
### Rehydrate/advance/replay separation (locked)
|
|
791
|
+
These are **semantics locks** (not just “implementation suggestions”) because v2 correctness depends on them under rewinds/retries.
|
|
792
|
+
|
|
793
|
+
Locks:
|
|
794
|
+
- **Rehydrate is pure**: `continue_workflow` without `ackToken` MUST NOT produce durable writes (no `append`, no outputs, no observations, no gaps, no nodes/edges).
|
|
795
|
+
- **Advance is append-capable**: `continue_workflow` with `ackToken` is the only correctness path that can append durable truth for the targeted node.
|
|
796
|
+
- **Replay is fact-returning**: replaying the same idempotency key `(sessionId, runId, nodeId, attemptId)` MUST return from durable recorded facts (e.g., `advance_recorded` + referenced nodes/edges/outputs) and MUST NOT “re-run” step selection, validation, or rendering logic.
|
|
797
|
+
- **Fail-closed**: if an idempotency key is presented that should have a recorded outcome but none exists, treat it as `ReasonCode.unexpected:invariant_violation` (never silently fall back to recompute).
|
|
798
|
+
|
|
799
|
+
Implementation lock (TypeScript / structural typing):
|
|
800
|
+
- Append-capable APIs MUST require a non-forgeable **capability witness** (e.g., `WithHealthySessionLock` / `CanAppend`) minted only by the session health + lock gate. This prevents accidental writes by “structural” interface matching and makes illegal calls unrepresentable without deliberate construction.
|
|
801
|
+
- Witness misuse-after-release MUST fail-fast: if a witness is used outside the lexical lifetime of the gate callback that minted it, append-capable APIs MUST reject the call before any durable I/O. (This prevents “stash-and-reuse” of an old witness, including after a subsequent re-lock of the same session.)
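
One possible shape for such a witness in TypeScript, as a sketch only. The class reuses the `CanAppend` and `withHealthySessionLock` names from the text, but the structure, the `appendEvent` function, and the in-memory `log` are assumptions; the real health check and lock acquisition are elided.

```typescript
class CanAppend {
  // A private member makes the class nominal: a look-alike object literal
  // is not assignable to `CanAppend`, so the witness cannot be forged structurally.
  private live = true;

  // Private constructor: the gate below is the only mint site.
  private constructor(readonly sessionId: string) {}

  /** Mints a witness that is valid only while `fn` is running. */
  static async withHealthySessionLock<T>(
    sessionId: string,
    fn: (witness: CanAppend) => Promise<T>,
  ): Promise<T> {
    // Real code would verify session health and take the session lock here.
    const witness = new CanAppend(sessionId);
    try {
      return await fn(witness);
    } finally {
      witness.live = false; // revoke on release, even if `fn` throws
    }
  }

  assertLive(): void {
    if (!this.live) throw new Error("witness used after its gate released");
  }
}

/** Append-capable API: demands a witness and checks it before any write. */
function appendEvent(witness: CanAppend, log: string[], event: string): void {
  witness.assertLive(); // fail fast, before durable I/O
  log.push(`${witness.sessionId}:${event}`);
}
```

A stashed witness fails `assertLive()` once its callback has returned, including after a later re-lock of the same session, because each gate call mints a fresh instance.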
|
|
802
|
+
|
|
803
|
+
Optimistic replay without lock (locked clarification):
|
|
804
|
+
- Because the event log is append-only and `dedupeKey` values are immutable once committed, a handler MAY perform an optimistic pre-lock read to check for an existing `dedupeKey`.
|
|
805
|
+
- If the `dedupeKey` is found in committed truth, the handler MUST return from recorded facts without acquiring the session lock (fact-returning replay is a pure read).
|
|
806
|
+
- If the `dedupeKey` is NOT found, the handler MUST acquire the lock and re-check under the lock before writing (double-checked locking pattern).
|
|
807
|
+
- Safety invariant: append-only events + `dedupeKey` immutability → pre-lock dedup hits are always correct; misses are caught by the lock re-check.
|
|
808
|
+
- This applies to: `advance_recorded` replay, `checkpoint` replay, and any future idempotent handler replay.
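
The double-checked pattern above, sketched against an in-memory stand-in for the session log. `SessionLog`, its promise-chain lock, and `advanceOnce` are hypothetical; a real store would use the durable event log and the session lock.

```typescript
interface RecordedFact {
  dedupeKey: string;
  outcome: string;
}

class SessionLog {
  private readonly events: RecordedFact[] = [];
  private tail: Promise<void> = Promise.resolve();

  /** Pure read of committed facts; safe without the lock because events are immutable. */
  find(dedupeKey: string): RecordedFact | undefined {
    return this.events.find((e) => e.dedupeKey === dedupeKey);
  }

  count(): number {
    return this.events.length;
  }

  /** Serializes critical sections; `append` is only reachable inside the lock. */
  withLock<T>(fn: (append: (fact: RecordedFact) => void) => Promise<T>): Promise<T> {
    const run = this.tail.then(() => fn((fact) => { this.events.push(fact); }));
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

async function advanceOnce(
  log: SessionLog,
  dedupeKey: string,
  compute: () => Promise<string>,
): Promise<string> {
  // 1) Optimistic pre-lock read: a hit is always correct, so replay from facts.
  const hit = log.find(dedupeKey);
  if (hit) return hit.outcome;

  // 2) Miss: take the lock and re-check before writing.
  return log.withLock(async (append) => {
    const recheck = log.find(dedupeKey);
    if (recheck) return recheck.outcome;
    const outcome = await compute();
    append({ dedupeKey, outcome });
    return outcome;
  });
}
```

Two concurrent calls with the same key both miss the pre-lock read, but only the first computes and appends; the second finds the committed fact on its re-check under the lock.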
|
|
809
|
+
|
|
810
|
+
### Token validation errors (errors as data, initial closed set)
|
|
811
|
+
- `TOKEN_INVALID_FORMAT`
|
|
812
|
+
- `TOKEN_UNSUPPORTED_VERSION`
|
|
813
|
+
- `TOKEN_BAD_SIGNATURE`
|
|
814
|
+
- `TOKEN_SCOPE_MISMATCH`
|
|
815
|
+
- `TOKEN_UNKNOWN_NODE`
|
|
816
|
+
- `TOKEN_WORKFLOW_HASH_MISMATCH`
|
|
817
|
+
- `TOKEN_SESSION_LOCKED`
|
|
818
|
+
|
|
819
|
+
---
|
|
820
|
+
|
|
821
|
+
## 1.3) Export/import bundle (resumable) (initial v2 schema, locked)
|
|
822
|
+
Export/import exists to share and resume durable truth across machines without relying on runtime tokens (which are handles only).
|
|
823
|
+
|
|
824
|
+
### Bundle format (locked)
|
|
825
|
+
- Export is a **single JSON bundle** with a versioned envelope.
|
|
826
|
+
- Tokens (`stateToken`, `ackToken`) are not included and are not portable.
|
|
827
|
+
- On import, WorkRail re-mints fresh runtime tokens from stored nodes/snapshots.
|
|
828
|
+
|
|
829
|
+
### Bundle envelope (initial v2 schema)
|
|
830
|
+
Required top-level fields:
|
|
831
|
+
- `bundleSchemaVersion: 1`
|
|
832
|
+
- `bundleId` (stable identifier)
|
|
833
|
+
- `exportedAt` (informational only; never used for ordering)
|
|
834
|
+
- `producer` (informational):
  - `appVersion`
  - `appliedConfigHash?`
|
|
837
|
+
- `integrity` (required; see below)
|
|
838
|
+
- `session` (required; see below)
|
|
839
|
+
|
|
840
|
+
### Session contents (required)
|
|
841
|
+
The bundle MUST include:
|
|
842
|
+
- `sessionId`
|
|
843
|
+
- `events`: ordered list of `SessionEvent` in ascending `eventIndex`
|
|
844
|
+
- `manifest`: ordered list of `SessionManifestRecord` in ascending `manifestIndex`
|
|
845
|
+
- `snapshots`: embedded CAS map keyed by `snapshotRef` containing `ExecutionSnapshotFile` entries
|
|
846
|
+
- `pinnedWorkflows`: embedded map keyed by `workflowHash` containing compiled workflow snapshots required for deterministic resume
|
|
847
|
+
|
|
848
|
+
### Integrity (required)
|
|
849
|
+
The bundle MUST include an integrity manifest that allows import to fail fast on corruption.
|
|
850
|
+
|
|
851
|
+
Initial integrity kind:
|
|
852
|
+
- `sha256_manifest_v1`
|
|
853
|
+
|
|
854
|
+
#### Deterministic hashing rule (locked)
|
|
855
|
+
Each integrity entry’s `sha256` is computed over the UTF-8 bytes of **RFC 8785 (JCS)** canonical JSON serialization of the referenced value (arrays preserve their required deterministic ordering, e.g., `events` by `eventIndex`, `manifest` by `manifestIndex`).
|
|
856
|
+
|
|
857
|
+
Formatting:
|
|
858
|
+
- integrity digests use `sha256:<hex>` string form.
|
|
859
|
+
|
|
860
|
+
Minimum integrity entries (illustrative paths):
|
|
861
|
+
- `session/events`
|
|
862
|
+
- `session/manifest`
|
|
863
|
+
- `session/snapshots/<snapshotRef>`
|
|
864
|
+
- `session/pinnedWorkflows/<workflowHash>`
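
As a rough illustration of the hashing rule, the sketch below computes one integrity entry with Node's built-in `crypto` module. The `canonicalize` helper is a simplified stand-in for RFC 8785 (it sorts keys and defers to `JSON.stringify` for primitives, without validating I-JSON constraints), and treating `bytes` as the length of the canonical UTF-8 serialization is an assumption based on the skeleton example, not something the text states.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified JCS-style canonicalization (illustrative, not a conformant RFC 8785 implementation).
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const keys = Object.keys(value).sort(); // default sort compares UTF-16 code units
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

// Hypothetical helper: builds one entry of the integrity manifest.
function integrityEntry(path: string, value: Json): { path: string; sha256: string; bytes: number } {
  const utf8 = Buffer.from(canonicalize(value), "utf8");
  const hex = createHash("sha256").update(utf8).digest("hex");
  // Assumption: `bytes` is the length of the canonical serialization.
  return { path, sha256: `sha256:${hex}`, bytes: utf8.length };
}
```

Key order in the input does not affect the digest, while array order does, which is why arrays such as `events` must already be in their required deterministic order before hashing.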
|
|
865
|
+
|
|
866
|
+
### Import semantics (locked)
|
|
867
|
+
- Import defaults to **import-as-new** on session ID collision (no implicit merges).
|
|
868
|
+
- Import validates integrity and ordering before storing durable truth.
|
|
869
|
+
|
|
870
|
+
### Import failure errors (errors as data, initial closed set)
|
|
871
|
+
- `BUNDLE_INVALID_FORMAT`
|
|
872
|
+
- `BUNDLE_UNSUPPORTED_VERSION`
|
|
873
|
+
- `BUNDLE_INTEGRITY_FAILED`
|
|
874
|
+
- `BUNDLE_MISSING_SNAPSHOT`
|
|
875
|
+
- `BUNDLE_MISSING_PINNED_WORKFLOW`
|
|
876
|
+
- `BUNDLE_EVENT_ORDER_INVALID`
|
|
877
|
+
- `BUNDLE_MANIFEST_ORDER_INVALID`
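
A partial sketch of the pre-store validation, covering only the version and ordering checks, might look like the following. `BundleLike` and `checkBundleOrdering` are hypothetical names, and interpreting "ascending" as strictly ascending (no duplicate indices) is an assumption.

```typescript
// Hypothetical minimal view of a bundle; real bundles carry more fields.
type BundleLike = {
  bundleSchemaVersion: number;
  session: {
    events: { eventIndex: number }[];
    manifest: { manifestIndex: number }[];
  };
};

type ImportCheck =
  | { ok: true }
  | { ok: false; code: "BUNDLE_UNSUPPORTED_VERSION" | "BUNDLE_EVENT_ORDER_INVALID" | "BUNDLE_MANIFEST_ORDER_INVALID" };

// Assumption: "ascending" means strictly ascending.
function isStrictlyAscending(indices: number[]): boolean {
  return indices.every((value, i) => i === 0 || value > indices[i - 1]);
}

function checkBundleOrdering(bundle: BundleLike): ImportCheck {
  if (bundle.bundleSchemaVersion !== 1) return { ok: false, code: "BUNDLE_UNSUPPORTED_VERSION" };
  if (!isStrictlyAscending(bundle.session.events.map((e) => e.eventIndex))) {
    return { ok: false, code: "BUNDLE_EVENT_ORDER_INVALID" };
  }
  if (!isStrictlyAscending(bundle.session.manifest.map((m) => m.manifestIndex))) {
    return { ok: false, code: "BUNDLE_MANIFEST_ORDER_INVALID" };
  }
  return { ok: true }; // only now would durable truth be stored
}
```

Returning a closed-set code instead of throwing keeps the failure usable as data by the caller.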
|
|
878
|
+
|
|
879
|
+
Example (skeleton):
|
|
880
|
+
|
|
881
|
+
```json
{
  "bundleSchemaVersion": 1,
  "bundleId": "bundle_01JH...",
  "exportedAt": "2025-12-19T18:01:02.123Z",
  "producer": { "appVersion": "x.y.z", "appliedConfigHash": "sha256:..." },
  "integrity": {
    "kind": "sha256_manifest_v1",
    "entries": [
      { "path": "session/events", "sha256": "sha256:...", "bytes": 12345 }
    ]
  },
  "session": {
    "sessionId": "sess_01JH...",
    "events": [],
    "manifest": [],
    "snapshots": {},
    "pinnedWorkflows": {}
  }
}
```
|
|
902
|
+
|
|
903
|
+
|
|
904
|
+
#### Snapshot identity + provenance (locked)
|
|
905
|
+
- **`SnapshotRef` is content-addressed** (e.g., `sha256:<digest>`).
|
|
906
|
+
- Each referencing event also records **`createdByEventId`** for provenance/debugging (content hash for integrity/dedupe; event linkage for explainability).
|
|
907
|
+
|
|
908
|
+
#### Snapshot storage layout (locked)
|
|
909
|
+
- Use a **global content-addressed snapshot store (CAS)** keyed by `SnapshotRef`.
|
|
910
|
+
- Each session maintains an **append-only pin list** of referenced `SnapshotRef`s (for export/import and GC safety) rather than copying snapshot files per session.
|
|
911
|
+
|
|
912
|
+
#### Snapshot payload scope (locked)
|
|
913
|
+
- Snapshot payloads are **rehydration-only** (no cached projections like recap text; those remain events/projections).
|
|
914
|
+
- This keeps snapshots from becoming a parallel event log and reduces drift risk.
|
|
915
|
+
|
|
916
|
+
---
|
|
917
|
+
|
|
918
|
+
## 1.1) Runs are DAGs; branches are projections (locked)
|
|
919
|
+
- A run’s lineage is a **DAG of nodes** connected by edges.
|
|
920
|
+
- “Branch” is a **projection concept derived from edges/leaves**, not an additional authoritative identifier (`branchId` is not part of durable truth).
|
|
921
|
+
|
|
922
|
+
---
|
|
923
|
+
|
|
924
|
+
## 2) Preferred tip policy (deterministic)
|
|
925
|
+
|
|
926
|
+
Preferred tip is defined **per run** (not per session).
|
|
927
|
+
|
|
928
|
+
Selection:
|
|
929
|
+
1) identify leaf nodes (no children)
|
|
930
|
+
2) compute each leaf’s last-activity as max `EventIndex` among events that touch the node’s reachable history (node/output/gap/capability/prefs/divergence/edge)
|
|
931
|
+
3) choose highest last-activity
|
|
932
|
+
4) tie-breakers: `node_created` index, then lexical `NodeId`
|
|
933
|
+
|
|
934
|
+
Never use wall-clock timestamps for tie-breaking.
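
The selection steps above could be sketched as below. The sketch assumes each node's last-activity index has already been computed (step 2), and it makes two assumptions about tie-breaker direction that the text leaves open: the higher `node_created` index wins, then the lexically smaller `NodeId`. `RunNode`, `Edge`, and `preferredTip` are hypothetical names.

```typescript
// Hypothetical shapes; `lastActivityIndex` is assumed to be precomputed per node.
type RunNode = { nodeId: string; nodeCreatedIndex: number; lastActivityIndex: number };
type Edge = { from: string; to: string };

// Code-unit comparison keeps ordering independent of locale.
const lex = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

function preferredTip(nodes: RunNode[], edges: Edge[]): RunNode | undefined {
  const hasChildren = new Set(edges.map((e) => e.from));
  const leaves = nodes.filter((n) => !hasChildren.has(n.nodeId)); // step 1
  leaves.sort(
    (a, b) =>
      b.lastActivityIndex - a.lastActivityIndex || // step 3: highest last-activity
      b.nodeCreatedIndex - a.nodeCreatedIndex || // step 4a (assumed: later creation wins)
      lex(a.nodeId, b.nodeId), // step 4b (assumed: smaller NodeId wins)
  );
  return leaves[0];
}
```

Every comparison uses event indices or identifiers, so the result is the same on any machine that holds the same durable truth.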
|
|
935
|
+
|
|
936
|
+
---
|
|
937
|
+
|
|
938
|
+
## 3) Gaps + user-only dependencies (closed sets + mode behavior)
|
|
939
|
+
|
|
940
|
+
### User-only dependencies: closed reasons
|
|
941
|
+
`UserOnlyDependencyReason` (initial closed set):
|
|
942
|
+
- `needs_user_secret_or_token`
|
|
943
|
+
- `needs_user_account_access`
|
|
944
|
+
- `needs_user_artifact`
|
|
945
|
+
- `needs_user_choice`
|
|
946
|
+
- `needs_user_approval`
|
|
947
|
+
- `needs_user_environment_action`
|
|
948
|
+
|
|
949
|
+
Special rule:
|
|
950
|
+
- `needs_user_choice` is only emitted when the workflow explicitly marks the choice as **non-assumable** using `NonAssumableChoiceKind`.
|
|
951
|
+
|
|
952
|
+
`NonAssumableChoiceKind` (closed set):
|
|
953
|
+
- `preference_tradeoff`
|
|
954
|
+
- `scope_boundary`
|
|
955
|
+
- `irreversible_action`
|
|
956
|
+
- `external_side_effect`
|
|
957
|
+
- `policy_or_compliance`
|
|
958
|
+
|
|
959
|
+
### Gaps: the never-stop disclosure primitive
|
|
960
|
+
Gaps are append-only durable disclosures. They are never mutated; “resolution” is represented by append-only linkage (e.g., `resolvesGapId`).
|
|
961
|
+
|
|
962
|
+
Projection rule (recommended):
|
|
963
|
+
- a gap is considered “resolved” if any later gap record references it via `resolvesGapId` (or equivalent linkage)
|
|
964
|
+
- history remains visible; “current state” is a projection
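
The projection rule can be read as a pure function over the append-only gap records. In this sketch, `GapRecord` and `isGapResolved` are hypothetical, and "later" is interpreted as a strictly greater `eventIndex`.

```typescript
// Hypothetical record shape; only the fields needed for the projection are shown.
type GapRecord = { gapId: string; eventIndex: number; resolvesGapId?: string };

// A gap counts as resolved if any later record points at it via `resolvesGapId`.
function isGapResolved(gap: GapRecord, records: GapRecord[]): boolean {
  return records.some((r) => r.resolvesGapId === gap.gapId && r.eventIndex > gap.eventIndex);
}
```

Nothing is mutated: the original record stays in history, and "resolved" is derived each time the projection runs.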
|
|
965
|
+
|
|
966
|
+
### Mode behavior (core)
|
|
967
|
+
- In `guided` and `full_auto_stop_on_user_deps`: user-only dependencies can return `blocked`.
|
|
968
|
+
- In `full_auto_never_stop`: never `blocked`; record critical gaps and proceed with explicit durable disclosure.
|
|
969
|
+
|
|
970
|
+
### Unified reason model (blocked ↔ gaps) (initial v2, locked)
|
|
971
|
+
To prevent semantic drift between “blocked” (UX/control-flow) and “gaps” (durable disclosure), v2 locks a single underlying closed-set reason model.
|
|
972
|
+
|
|
973
|
+
Locks:
|
|
974
|
+
- Define a single closed-set `ReasonCode` used as the semantic source of truth for both:
  - `BlockerReport` (returned when the run is blocked in blocking modes)
  - `GapReason` (recorded durably in never-stop and for auditability)
- Blocking vs never-stop changes **control flow**, not meaning:
  - In blocking modes, a `ReasonCode` may produce `blocked` plus durable accounting.
  - In never-stop, the same `ReasonCode` MUST produce a `gap_recorded` (severity per mapping) and execution continues.
- Mapping is deterministic and table-driven (no ad-hoc conversions).
|
|
981
|
+
|
|
982
|
+
`ReasonCode` (closed set, initial):
|
|
983
|
+
- `user_only_dependency:<UserOnlyDependencyReason>`
|
|
984
|
+
- `contract_violation:missing_required_output`
|
|
985
|
+
- `contract_violation:invalid_required_output`
|
|
986
|
+
- `capability_missing:required_capability_unknown`
|
|
987
|
+
- `capability_missing:required_capability_unavailable`
|
|
988
|
+
- `unexpected:invariant_violation`
|
|
989
|
+
- `unexpected:storage_corruption_detected`
|
|
990
|
+
|
|
991
|
+
Deterministic mapping (locked intent):
|
|
992
|
+
- `ReasonCode.user_only_dependency:*`:
  - blocking modes → `blocked`
  - never-stop → `gap_recorded(severity=critical, category=user_only_dependency, detail=<reason>)`
- `ReasonCode.contract_violation:*`:
  - blocking modes → `blocked`
  - never-stop → `gap_recorded(severity=critical, category=contract_violation, detail=<reason>)`
- `ReasonCode.capability_missing:*`:
  - blocking modes → `blocked` iff the capability is required by the compiled workflow
  - never-stop → `gap_recorded(severity=critical, category=capability_missing, detail=<reason>)`
- `ReasonCode.unexpected:*`:
  - always → `gap_recorded(severity=critical, category=unexpected, detail=<reason>)`
  - and the protocol path must fail fast where correctness would be compromised (e.g., corruption on advancement).
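
One way to picture the mapping is as a single function from reason and autonomy mode to outcome. The types and `mapReason` below are hypothetical. The sketch also makes one assumption the text does not settle: in blocking modes, a missing capability that is not required by the compiled workflow is recorded as a gap rather than blocking.

```typescript
type Autonomy = "guided" | "full_auto_stop_on_user_deps" | "full_auto_never_stop";
type Category = "user_only_dependency" | "contract_violation" | "capability_missing" | "unexpected";

// Hypothetical shapes for illustration.
type Reason = { category: Category; detail: string; requiredByWorkflow?: boolean };
type Outcome =
  | { kind: "blocked"; reason: Reason }
  | { kind: "gap_recorded"; severity: "critical"; category: Category; detail: string };

function mapReason(reason: Reason, autonomy: Autonomy): Outcome {
  const gap: Outcome = {
    kind: "gap_recorded",
    severity: "critical",
    category: reason.category,
    detail: reason.detail,
  };
  // `unexpected:*` always records a gap; the caller must additionally fail fast
  // wherever correctness would be compromised.
  if (reason.category === "unexpected") return gap;
  if (autonomy === "full_auto_never_stop") return gap;
  // Assumption: a non-required capability does not block, even in blocking modes.
  if (reason.category === "capability_missing" && !reason.requiredByWorkflow) return gap;
  return { kind: "blocked", reason };
}
```

The same `Reason` value feeds both branches, so switching modes changes control flow without changing what the reason means.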
|
|
1004
|
+
|
|
1005
|
+
---
|
|
1006
|
+
|
|
1007
|
+
## 4) Preferences + modes (minimal closed set)
|
|
1008
|
+
|
|
1009
|
+
### Preferences (v2 minimal)
|
|
1010
|
+
- `autonomy`: `guided | full_auto_stop_on_user_deps | full_auto_never_stop`
|
|
1011
|
+
- `riskPolicy`: `conservative | balanced | aggressive`
|
|
1012
|
+
|
|
1013
|
+
`riskPolicy` guardrails:
|
|
1014
|
+
- allowed: warning thresholds + default selection between correct paths
|
|
1015
|
+
- disallowed: bypassing contracts/capabilities, changing fork/token semantics, suppressing disclosure, redefining user-only deps
|
|
1016
|
+
|
|
1017
|
+
### Invariants (not preferences)
|
|
1018
|
+
Disclosure is mandatory: assumptions/skips/missing required data must be recorded durably (via outputs and/or gaps).
|
|
1019
|
+
|
|
1020
|
+
### Durability + precedence
|
|
1021
|
+
Effective preference snapshots are node-attached (rewind-safe, export/import safe).
|
|
1022
|
+
Precedence: node-attached → session baseline → global defaults.
|
|
1023
|
+
|
|
1024
|
+
Mode presets (recommended v2 baseline):
|
|
1025
|
+
- Guided: `autonomy=guided`, `riskPolicy=conservative`
|
|
1026
|
+
- Full-auto (stop on user deps): `autonomy=full_auto_stop_on_user_deps`, `riskPolicy=balanced`
|
|
1027
|
+
- Full-auto (never stop): `autonomy=full_auto_never_stop`, `riskPolicy=conservative`
|
|
1028
|
+
|
|
1029
|
+
---
|
|
1030
|
+
|
|
1031
|
+
## 5) Workflow recommendations + warnings (pinned, no hard blocks)
|
|
1032
|
+
|
|
1033
|
+
Recommendations are part of the compiled workflow snapshot (included in `workflowHash`).
|
|
1034
|
+
|
|
1035
|
+
### Compiled workflow snapshot + `workflowHash` canonicalization (initial v2, locked)
|
|
1036
|
+
`workflowHash` is computed from the fully expanded **compiled** workflow snapshot, not raw source JSON. This is the determinism anchor for runs, export/import, and “pinned vs source drift” explainability.
|
|
1037
|
+
|
|
1038
|
+
Locks:
|
|
1039
|
+
- WorkRail persists compiled workflow snapshots keyed by `workflowHash` as durable truth (required for long-lived runs and resumable import/export).
|
|
1040
|
+
- `workflowHash = sha256(JCS(compiledSnapshotV1))` where `JCS` is RFC 8785 JSON Canonicalization Scheme.
|
|
1041
|
+
- The compiled snapshot is versioned; new versions require explicit migration logic (do not silently reinterpret old snapshots).
|
|
1042
|
+
|
|
1043
|
+
#### `CompiledWorkflowSnapshotV1` (locked, high-level shape)
|
|
1044
|
+
The compiled snapshot MUST contain enough information to:
|
|
1045
|
+
- render the exact `pending.prompt` text deterministically
|
|
1046
|
+
- validate required outputs (contract packs resolved to schemas)
|
|
1047
|
+
- execute capability probing/fallback paths deterministically
|
|
1048
|
+
- explain provenance (what was authored vs injected)
|
|
1049
|
+
|
|
1050
|
+
#### Function definitions in rewind/resumption context (locked)
|
|
1051
|
+
Some workflows use `functionDefinitions` + `functionReferences` to reduce repeated instructions (define once, reference many times). Under chat rewinds and brand-new chat resumption, the agent may lose “what does `foo()` mean?” context unless WorkRail rehydrates it deterministically.
|
|
1052
|
+
|
|
1053
|
+
Locks:
|
|
1054
|
+
- Any function definitions and reference wiring that affect the agent-visible instructions MUST be included in the pinned compiled workflow snapshot (and therefore in `workflowHash`). No reliance on transcript memory or external files is permitted for correctness.
|
|
1055
|
+
- `continue_workflow` **rehydrate-only** responses MUST include the relevant function definitions as part of the bounded recovery context by expanding them into the rendered `pending.prompt` text (preferred) rather than introducing separate agent-facing metadata fields.
|
|
1056
|
+
- Inclusion is deterministic and byte-budgeted:
  - **Tip node (resume)**: include all function definitions referenced by the pending step instance (including any workflow/loop/step scoped definitions visible to that step).
  - **Non-tip node (rewind/fork)**: include pending step functions **plus** any function definitions referenced by the branch-focused recovery context returned (preferred tip downstream recap + any included child-branch summaries).
  - **Priority**: pending-step referenced functions first, then downstream recap referenced functions, then other branch summaries.
  - **Ordering**: deterministic by `(scope precedence: step → loop → workflow, functionName lex)`.
  - **Truncation**: if function definitions would exceed the response budget, deterministically truncate by bytes (UTF-8) and append the canonical truncation marker `\n\n[TRUNCATED]`, plus a short deterministic omission note (e.g., “Omitted N function definitions”).
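
The ordering and truncation rules might be sketched as follows. This is a simplification under stated assumptions: it ignores the priority tiers, measures only the definitions (not the marker or note) against the budget, and drops whole definitions that do not fit instead of cutting mid-definition. `FunctionDef`, `utf8Length`, and `renderFunctionDefs` are hypothetical names.

```typescript
type Scope = "step" | "loop" | "workflow";
type FunctionDef = { scope: Scope; functionName: string; text: string };

const SCOPE_RANK: Record<Scope, number> = { step: 0, loop: 1, workflow: 2 };
const TRUNCATION_MARKER = "\n\n[TRUNCATED]";

// Counts UTF-8 bytes per code point without relying on platform encoders.
function utf8Length(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) as number;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function renderFunctionDefs(defs: FunctionDef[], budgetBytes: number): string {
  const ordered = [...defs].sort(
    (a, b) =>
      SCOPE_RANK[a.scope] - SCOPE_RANK[b.scope] ||
      (a.functionName < b.functionName ? -1 : a.functionName > b.functionName ? 1 : 0),
  );
  const kept: string[] = [];
  let used = 0;
  for (const def of ordered) {
    const cost = utf8Length(def.text) + 1; // +1 for the joining newline
    if (used + cost > budgetBytes) break; // assumption: truncate at definition boundaries
    kept.push(def.text);
    used += cost;
  }
  const omitted = ordered.length - kept.length;
  const body = kept.join("\n");
  return omitted === 0 ? body : `${body}${TRUNCATION_MARKER}\nOmitted ${omitted} function definitions`;
}
```

Because the sort key and byte count depend only on the definitions themselves, the same pinned snapshot always yields the same rendered text.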
|
|
1062
|
+
|
|
1063
|
+
Conceptual fields (exact schema is code-canonical and generated):
|
|
1064
|
+
- `schemaVersion`
|
|
1065
|
+
- `workflowId`, `name?`, `description?` (identity/explainability; included in the hash to avoid “same content, different identity” ambiguity)
|
|
1066
|
+
- `agentRole?` (post-merge effective role stance text)
|
|
1067
|
+
- `capabilities` (desired requirements: required/preferred/disabled)
|
|
1068
|
+
- `features` (resolved + ordered; includes typed configs)
|
|
1069
|
+
- `contracts` (resolved contract pack definitions/schemas used by steps/templates)
|
|
1070
|
+
- `steps[]` (fully expanded step list):
  - `stepId`, `title`, `requireConfirmation`
  - resolved `promptBlocks` and rendered `pending.prompt` text
  - `output.contractRef?` resolved to a contract schema reference
  - provenance: `{ source: authored|template_injected|feature_injected, originId? }`
|
|
1075
|
+
- `conditions[]` (resolved closed-set conditions referenced by loops/control structures)
|
|
1076
|
+
- `loops[]` (resolved loop definitions with stable body ordering / indices)
|
|
1077
|
+
- `compiledWarnings[]` (closed set; no timestamps)
|
|
1078
|
+
|
|
1079
|
+
#### Deterministic compilation ordering rules (locked)
|
|
1080
|
+
To keep `workflowHash` stable and avoid “order by accident” drift, compilation MUST normalize ordering as follows:
|
|
1081
|
+
|
|
1082
|
+
- Feature application order:
  - Resolve `features[]` to a deduped list keyed by `featureId`.
  - Apply features in ascending lexical `featureId` order (config is part of the hash; list order in source does not affect determinism).

- Template expansion order:
  - Expand `template_call` steps **in-place** (replace the call with the template’s expanded step list).
  - Template-expanded steps preserve the template-defined order.
  - If multiple templates are injected at the same anchor, order by `(originFeatureId, templateId, injectionIndex)` where:
    - `originFeatureId` is lexical (or empty for authored template calls)
    - `injectionIndex` is a stable ordinal assigned during compilation (no timestamps).

- Step list order (`steps[]` in the compiled snapshot):
  - Preserve authored `steps[]` order.
  - For each `template_call`, substitute its expansion in place (no reordering across siblings).
  - Feature-injected steps are inserted at deterministic anchor points with stable ordering by `(originFeatureId, originId, insertionIndex)`.

- Conditions:
  - Resolve to a closed-set typed list and sort by `conditionId` lex.

- Loops:
  - Loops are resolved from authored `type:"loop"` steps.
  - `loops[]` in the compiled snapshot is sorted by `loopId` lex.
  - Each loop’s `body[]` ordering is authoritative (this ordering defines `bodyIndex`).

- Contracts:
  - Resolve contract packs referenced by steps/templates/features into a deduped list keyed by `contractRef`.
  - Sort resolved contracts by `contractRef` lex.
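
Several of the rules above reduce to "dedupe by key, then sort lexically" or "sort by a tuple". A small sketch of both, with hypothetical helper names, is shown below. Keeping the first occurrence when keys collide is an assumption, since the text does not say which duplicate wins.

```typescript
// Code-unit comparison: unlike localeCompare, the result does not vary with the runtime locale.
const lex = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Dedupe by key (assumption: first occurrence wins), then sort by key.
function dedupeAndSort<T>(items: T[], keyOf: (item: T) => string): T[] {
  const byKey = new Map<string, T>();
  for (const item of items) {
    if (!byKey.has(keyOf(item))) byKey.set(keyOf(item), item);
  }
  return Array.from(byKey.values()).sort((a, b) => lex(keyOf(a), keyOf(b)));
}

// Tuple ordering for templates injected at the same anchor.
type Injection = { originFeatureId: string; templateId: string; injectionIndex: number };

function compareInjections(a: Injection, b: Injection): number {
  return (
    lex(a.originFeatureId, b.originFeatureId) || // empty (authored) sorts first
    lex(a.templateId, b.templateId) ||
    a.injectionIndex - b.injectionIndex
  );
}
```

With these helpers, reordering `features[]` in the source leaves the compiled order, and therefore `workflowHash`, unchanged.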
|
|
1109
|
+
|
|
1110
|
+
#### What is explicitly excluded from `workflowHash` (locked)
|
|
1111
|
+
- runtime tokens (`stateToken`, `ackToken`)
|
|
1112
|
+
- session/run/node identifiers
|
|
1113
|
+
- any timestamps
|
|
1114
|
+
- any environment observations (git branch/SHA, workspace paths)
|
|
1115
|
+
|
|
1116
|
+
#### Embedded schema canonicalization (locked)
|
|
1117
|
+
Compiled workflow snapshots may embed contract schemas (and other structured definitions) that are used for validation and Studio inspection.
|
|
1118
|
+
|
|
1119
|
+
Locks:
|
|
1120
|
+
- Embedded schemas MUST be represented as **typed canonical data** within `CompiledWorkflowSnapshotV1` (not as raw JSON strings/blobs).
|
|
1121
|
+
- Canonicalization and hashing MUST be performed over the single JCS serialization of the compiled snapshot (no secondary ad-hoc stringification).
|
|
1122
|
+
- Any ordering within embedded schema structures that is semantically irrelevant MUST be normalized deterministically during compilation (e.g., sort object keys by JCS; sort lists where order is not semantically meaningful).
|
|
1123
|
+
|
|
1124
|
+
Closed-set recommendation targets:
|
|
1125
|
+
- `recommendedAutonomy` (same closed set as `autonomy`)
|
|
1126
|
+
- `recommendedRiskPolicy` (same closed set as `riskPolicy`)
|
|
1127
|
+
|
|
1128
|
+
Warnings:
|
|
1129
|
+
- emitted when effective preferences exceed recommendation (by closed partial orders):
  - `guided` < `full_auto_stop_on_user_deps` < `full_auto_never_stop`
  - `conservative` < `balanced` < `aggressive`
- structured + text-first
- recorded durably on the node (event or artifact)
- never hard-block user choice
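
A sketch of the warning check, assuming the two orders above are encoded as arrays and that warnings carry hypothetical `kind` values, could look like this. The function only reports; it never rejects the user's choice.

```typescript
const AUTONOMY_ORDER = ["guided", "full_auto_stop_on_user_deps", "full_auto_never_stop"] as const;
const RISK_ORDER = ["conservative", "balanced", "aggressive"] as const;
type Autonomy = (typeof AUTONOMY_ORDER)[number];
type RiskPolicy = (typeof RISK_ORDER)[number];

// Hypothetical warning shape; the `kind` values are illustrative.
type PreferenceWarning = {
  kind: "autonomy_exceeds_recommendation" | "risk_policy_exceeds_recommendation";
  effective: string;
  recommended: string;
};

function preferenceWarnings(
  effective: { autonomy: Autonomy; riskPolicy: RiskPolicy },
  recommended: { recommendedAutonomy?: Autonomy; recommendedRiskPolicy?: RiskPolicy },
): PreferenceWarning[] {
  const warnings: PreferenceWarning[] = [];
  const { recommendedAutonomy, recommendedRiskPolicy } = recommended;
  if (
    recommendedAutonomy !== undefined &&
    AUTONOMY_ORDER.indexOf(effective.autonomy) > AUTONOMY_ORDER.indexOf(recommendedAutonomy)
  ) {
    warnings.push({ kind: "autonomy_exceeds_recommendation", effective: effective.autonomy, recommended: recommendedAutonomy });
  }
  if (
    recommendedRiskPolicy !== undefined &&
    RISK_ORDER.indexOf(effective.riskPolicy) > RISK_ORDER.indexOf(recommendedRiskPolicy)
  ) {
    warnings.push({ kind: "risk_policy_exceeds_recommendation", effective: effective.riskPolicy, recommended: recommendedRiskPolicy });
  }
  return warnings; // callers record these durably; they never block
}
```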
|
|
1135
|
+
|
|
1136
|
+
---
|
|
1137
|
+
|
|
1138
|
+
## Appendix: capability observation provenance (guardrail)
|
|
1139
|
+
|
|
1140
|
+
Capability observations must be durable and self-correcting. To prevent “agent said so” from becoming enforcement-grade truth:
|
|
1141
|
+
- Capability observations must include a closed-set provenance.
|
|
1142
|
+
- Only “strong” provenance (e.g., a WorkRail-injected probe step) is treated as enforcement-grade.
|
|
1143
|
+
- “Weak” provenance (manual claim) may inform UX but must not unlock required capability paths.
|
|
1144
|
+
|
|
1145
|
+
---
|
|
1146
|
+
|
|
1147
|
+
## 6) Console architecture locks (control plane, not execution plane)
|
|
1148
|
+
|
|
1149
|
+
The Console is a WorkRail control plane and observability UI. It must not become an alternate execution truth.
|
|
1150
|
+
|
|
1151
|
+
### Projections API is internal-only (locked)
|
|
1152
|
+
Read-only projections (session/run/node summaries) MUST be implemented as an internal, code-canonical module used by:
|
|
1153
|
+
- Studio/Console UI
|
|
1154
|
+
- CLI commands and exports
|
|
1155
|
+
|
|
1156
|
+
Lock:
|
|
1157
|
+
- Do **not** add MCP tools for projections. The agent-facing MCP surface remains minimal; `resume_session` + export/import cover agent needs without expanding tool discovery.
|
|
1158
|
+
|
|
1159
|
+
Projection invariants:
|
|
1160
|
+
- deterministic given durable truth
|
|
1161
|
+
- bounded payloads (budgeted truncation with canonical marker)
|
|
1162
|
+
- salvage-aware (clearly labeled; never used for execution advancement)
|
|
1163
|
+
|
|
1164
|
+
### Desired vs applied (restart-first UX)
|
|
1165
|
+
|
|
1166
|
+
Because tool discovery is bounded at initialization, the Console must model config as:
|
|
1167
|
+
- **desired**: what the user wants
|
|
1168
|
+
- **applied**: what is currently active in the running MCP server
|
|
1169
|
+
|
|
1170
|
+
Lock:
|
|
1171
|
+
- On server start, WorkRail computes and records an **`appliedConfigHash`** for the config that is actually in effect.
|
|
1172
|
+
- The Console must always show **desired vs applied** and the **restart requirement** when they differ.
|
|
1173
|
+
|
|
1174
|
+
### Restart-required triggers (closed set)
|
|
1175
|
+
|
|
1176
|
+
A config change is **restart-required** if it changes any of:
|
|
1177
|
+
- the MCP **tool set** (tools added/removed)
|
|
1178
|
+
- any MCP tool **schema** (inputs/outputs)
|
|
1179
|
+
- workflow source registration that impacts discovery/catalog (adding/removing sources, enabling/disabling sources)
|
|
1180
|
+
- feature flags that gate tools or tool schemas
|
|
1181
|
+
|
|
1182
|
+
A config change is **runtime-safe** if it only changes:
|
|
1183
|
+
- read-only presentation settings (UI-only)
|
|
1184
|
+
- data retention settings for projections (must not affect correctness of existing run graphs)
|
|
1185
|
+
|
|
1186
|
+
### Workflow editing (edit source only)
|
|
1187
|
+
|
|
1188
|
+
Lock:
|
|
1189
|
+
- The Console edits **source workflows**, never compiled snapshots.
|
|
1190
|
+
- Compiled workflows are derived artifacts used for pinning (`workflowHash`) and must not be user-editable.
|
|
1191
|
+
|
|
1192
|
+
### Bundled namespace protections
|
|
1193
|
+
|
|
1194
|
+
Lock:
|
|
1195
|
+
- `wr.*` is reserved for bundled/core workflows and is **read-only**.
|
|
1196
|
+
- Console must provide **fork/copy-to-editable-namespace** for any changes, rather than allowing overrides/shadowing of `wr.*`.
|
|
1197
|
+
|
|
1198
|
+
### Source vs compiled inspection + pinned drift warnings
|
|
1199
|
+
|
|
1200
|
+
Lock:
|
|
1201
|
+
- Console must support inspecting:
  - the **source** workflow
  - the **compiled** workflow snapshot
  - the **pinned** snapshot for a given run (`workflowHash`)
- If the on-disk source differs from the pinned snapshot for a run, Console must surface a **pinned drift warning** as structured data (for explainability).
|
|
1206
|
+
|
|
1207
|
+
---
|
|
1208
|
+
|
|
1209
|
+
## 7) Workflow ID namespaces + migration locks
|
|
1210
|
+
|
|
1211
|
+
### Namespaced ID format (normative)
|
|
1212
|
+
|
|
1213
|
+
Lock:
|
|
1214
|
+
- Workflow IDs use `namespace.name` with **exactly one dot**.
|
|
1215
|
+
- Allowed pattern per segment: `[a-z][a-z0-9_-]*`.
|
|
1216
|
+
- Reserved namespace: **`wr.*`** is reserved exclusively for bundled/core workflows.
|
|
1217
|
+
|
|
1218
|
+
### Enforcement rules (normative)
|
|
1219
|
+
|
|
1220
|
+
Lock:
|
|
1221
|
+
- Any non-core source attempting to define a workflow whose ID starts with `wr.` is **rejected at load/validate time** with an actionable error.
|
|
1222
|
+
- **No shadowing**: bundled/core (`wr.*`) workflows cannot be overridden by priority order or source precedence.
|
|
1223
|
+
|
|
1224
|
+
### Legacy IDs (no dot)
|
|
1225
|
+
|
|
1226
|
+
Lock:
|
|
1227
|
+
- Legacy IDs remain runnable for backward compatibility.
|
|
1228
|
+
- Creating/saving new workflows with legacy IDs is **rejected** (authoring-time enforcement).
|
|
1229
|
+
- Loading existing legacy workflows is **warn-only** (do not break existing installs).
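
The format, reservation, and legacy rules above can be combined into one load-time check. The sketch below is illustrative: `checkWorkflowId` and both error codes are hypothetical, and authoring-time rejection of legacy IDs is left out.

```typescript
const SEGMENT = "[a-z][a-z0-9_-]*";
const NAMESPACED_ID = new RegExp(`^${SEGMENT}\\.${SEGMENT}$`); // exactly one dot

type SourceKind = "bundled" | "user" | "project" | "remote" | "plugin";

// Hypothetical result shape and error codes.
type IdCheck =
  | { ok: true; idStatus: "namespaced" | "legacy" }
  | { ok: false; code: "WORKFLOW_ID_RESERVED_NAMESPACE" | "WORKFLOW_ID_INVALID_FORMAT"; message: string };

function checkWorkflowIdOnLoad(id: string, sourceKind: SourceKind): IdCheck {
  if (id.startsWith("wr.") && sourceKind !== "bundled") {
    return {
      ok: false,
      code: "WORKFLOW_ID_RESERVED_NAMESPACE",
      message: `"${id}" uses the reserved wr.* namespace; rename it to a non-reserved namespace.`,
    };
  }
  if (NAMESPACED_ID.test(id)) return { ok: true, idStatus: "namespaced" };
  // Legacy IDs (no dot) stay loadable; the caller emits a warning.
  if (!id.includes(".")) return { ok: true, idStatus: "legacy" };
  return {
    ok: false,
    code: "WORKFLOW_ID_INVALID_FORMAT",
    message: `"${id}" must match namespace.name with segments of the form ${SEGMENT}.`,
  };
}
```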
|
|
1230
|
+
|
|
1231
|
+
### Deterministic rename suggestions
|
|
1232
|
+
|
|
1233
|
+
Lock:
|
|
1234
|
+
- Rename suggestions are deterministic and based on workflow **source**, not user choice:
  - user dir → `user.<name>`
  - project dir → `project.<name>`
  - git/remote/plugin → `repo.<name>` (or `team.<name>` only when explicitly configured)
- The `<name>` segment is the legacy ID normalized (lowercase; hyphens → underscores).
- If the suggested ID collides, append a short deterministic suffix (e.g., `_<sourceHash4>`). Never use timestamps.
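
A sketch of the suggestion rule follows. `suggestWorkflowId` is a hypothetical name, the `team.<name>` option is omitted, and the four-character source hash is passed in by the caller because the text does not specify how it is derived.

```typescript
type RenameSource = "user" | "project" | "git" | "remote" | "plugin";

function suggestWorkflowId(
  legacyId: string,
  source: RenameSource,
  takenIds: Set<string>,
  sourceHash4: string, // assumption: computed elsewhere from the workflow source
): string {
  const namespace = source === "user" ? "user" : source === "project" ? "project" : "repo";
  const name = legacyId.toLowerCase().replace(/-/g, "_"); // lowercase; hyphens to underscores
  const candidate = `${namespace}.${name}`;
  return takenIds.has(candidate) ? `${candidate}_${sourceHash4}` : candidate;
}
```

Note that normalization alone does not guarantee the result matches the segment pattern (for example, a legacy ID that starts with a digit), so the suggestion would still need to pass ID validation.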
|
|
1240
|
+
|
|
1241
|
+
### Bundled ID rename timing + aliasing
|
|
1242
|
+
|
|
1243
|
+
Lock:
|
|
1244
|
+
- Bundled workflows should be renamed to `wr.*` **before** v2 pinned snapshots are widely created.
|
|
1245
|
+
- Keep read-only aliases (legacy bundled id → canonical `wr.*`) for backward compatibility, emitting structured warnings.
|
|
1246
|
+
|
|
1247
|
+
### Relationship to pinning
|
|
1248
|
+
|
|
1249
|
+
Lock:
|
|
1250
|
+
- `workflowId` is part of the compiled workflow snapshot that is hashed into `workflowHash`. This avoids “same content, different identity” ambiguity and keeps export/import and Console inspection explainable.
|
|
1251
|
+
|
|
1252
|
+
### Discovery output to support migration UX
|
|
1253
|
+
|
|
1254
|
+
Lock:
|
|
1255
|
+
- Discovery returns explicit migration fields:
  - `idStatus: namespaced | legacy`
  - `canonicalId?` (when an alias is used)
  - `suggestedId?` (deterministic)
  - `sourceKind` (closed set: bundled | user | project | remote | plugin)
|
|
1260
|
+
|
|
1261
|
+
---
|
|
1262
|
+
|
|
1263
|
+
## 8) Tools vs docs alignment locks (drift prevention)
|
|
1264
|
+
|
|
1265
|
+
v1 suffered from schema/description/documentation drift. v2 must treat this as a first-class failure mode.
|
|
1266
|
+
|
|
1267
|
+
### Single canonical source of truth
|
|
1268
|
+
|
|
1269
|
+
Lock:
|
|
1270
|
+
- MCP tool **schemas** and **descriptions** must be generated from the same canonical source (code), not maintained in parallel.
|
|
1271
|
+
- Any docs that restate tool schemas are **derived artifacts** and must not be hand-edited.
|
|
1272
|
+
|
|
1273
|
+
### Generation + verification
|
|
1274
|
+
|
|
1275
|
+
Lock:
|
|
1276
|
+
- Provide a deterministic generator that produces:
  - tool catalog (names, titles, schemas)
  - mode-specific descriptions (all supported modes)
  - any human-facing “tool reference” docs
- CI (or precommit) must fail if generated outputs are out of date relative to the canonical source.
|
|
1281
|
+
|
|
1282
|
+
### Editing rule
|
|
1283
|
+
|
|
1284
|
+
Lock:
|
|
1285
|
+
- If a schema or description needs to change, the change is made **only** in the canonical definitions; regenerated outputs follow.
|
|
1286
|
+
- This prevents “fix docs but forget schema” and “fix schema but forget docs” classes of bugs.
|
|
1287
|
+
|
|
1288
|
+
### Generation + verification pipeline (locked)
|
|
1289
|
+
To prevent drift, WorkRail v2 treats TypeScript domain types + Zod schemas as the **single canonical source** for:
|
|
1290
|
+
- MCP tool schemas and descriptions
|
|
1291
|
+
- builtin registries (templates/features/contract packs/capabilities)
|
|
1292
|
+
- durable store schemas (event log, manifest, snapshots, export bundle)
|
|
1293
|
+
|
|
1294
|
+
Generated outputs (derived artifacts):
|
|
1295
|
+
- JSON Schemas for tool I/O and durable store schemas
|
|
1296
|
+
- Studio-ready builtins registry metadata
|
|
1297
|
+
- optional: a single generated “schema reference” doc that links to generated JSON schemas (never hand-edited)
|
|
1298
|
+
|
|
1299
|
+
Verification (locked intent):
|
|
1300
|
+
- Provide a deterministic generator and a verifier.
|
|
1301
|
+
- The verifier regenerates into a temp location, diffs against the committed generated artifacts, and fails fast with an actionable error if out of date.
|
|
1302
|
+
- Determinism requirements:
  - stable ordering (sorted keys, stable arrays)
  - no timestamps in generated content
  - stable formatting
|
|
1306
|
+
|
|
1307
|
+
---
|
|
1308
|
+
|
|
1309
|
+
## 11) Canonical JSON + hashing standard (initial v2, locked)
|
|
1310
|
+
To prevent cross-transport drift, all hashing in v2 (workflow pinning, bundle integrity, etc.) MUST use a single canonical JSON standard.
|
|
1311
|
+
|
|
1312
|
+
Lock:
|
|
1313
|
+
- Use **RFC 8785 (JSON Canonicalization Scheme, JCS)** for canonical JSON serialization.
|
|
1314
|
+
- Hash algorithm is SHA-256; digest strings use `sha256:<hex>`.
|
|
1315
|
+
|
|
1316
|
+
Applies to:
|
|
1317
|
+
- `workflowHash` computation (compiled workflow snapshot)
|
|
1318
|
+
- export/import bundle integrity entries
|
|
1319
|
+
- any future content-addressed references stored as `sha256:*`
|
|
1320
|
+
|
|
1321
|
+
---
|
|
1322
|
+
|
|
1323
|
+
## 12) Unified error envelope (initial v2, locked)
|
|
1324
|
+
To prevent cross-tool drift (MCP vs CLI vs Studio), all surfaced errors MUST use a single envelope shape and closed-set codes per domain.
|
|
1325
|
+
|
|
1326
|
+
Envelope shape (conceptual):
|
|
1327
|
+
- `code` (closed set)
|
|
1328
|
+
- `message` (human-readable, concise)
|
|
1329
|
+
- `retry` (closed set; drives deterministic client behavior without parsing prose):
  - `{ kind: "not_retryable" }`
  - `{ kind: "retryable_immediate" }`
  - `{ kind: "retryable_after_ms", afterMs: number }`
|
|
1333
|
+
- `details?` (bounded, structured; never required for correctness)
|
|
1334
|
+
|
|
1335
|
+
Lock:
|
|
1336
|
+
- Never throw errors across MCP boundaries; map to structured error envelopes.
|
|
1337
|
+
- Retry guidance MUST be conveyed via `retry` (not `message` or other free-form strings-as-data).
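
The envelope shape maps naturally onto a discriminated union. In the sketch below, the `Retry` variants come from the text, while `ErrorEnvelope`, `retryDelayMs`, and the sample pairing of `TOKEN_SESSION_LOCKED` with a 250 ms delay are assumptions for illustration.

```typescript
type Retry =
  | { kind: "not_retryable" }
  | { kind: "retryable_immediate" }
  | { kind: "retryable_after_ms"; afterMs: number };

// Hypothetical generic envelope; `Code` would be a closed-set union per domain.
type ErrorEnvelope<Code extends string> = {
  code: Code;
  message: string;
  retry: Retry;
  details?: Record<string, unknown>; // bounded, never required for correctness
};

// Clients decide what to do from `retry` alone, never by parsing `message`.
function retryDelayMs(error: ErrorEnvelope<string>): number | null {
  const retry = error.retry;
  switch (retry.kind) {
    case "not_retryable":
      return null;
    case "retryable_immediate":
      return 0;
    case "retryable_after_ms":
      return retry.afterMs;
  }
}

// Illustrative value only; the retry kind chosen for this code is an assumption.
const sampleError: ErrorEnvelope<"TOKEN_SESSION_LOCKED"> = {
  code: "TOKEN_SESSION_LOCKED",
  message: "Session is locked by another writer.",
  retry: { kind: "retryable_after_ms", afterMs: 250 },
};
```

Since `Retry` is a closed union, the compiler flags any client `switch` that forgets a variant when a new one is added.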
|
|
1338
|
+
|
|
1339
|
+
### Agent-first, self-correcting error messages (locked)
|
|
1340
|
+
v2 must optimize for an honest-but-buggy agent caller. Errors must be actionable without requiring the agent to guess or reverse-engineer schemas.
|
|
1341
|
+
|
|
1342
|
+
Locks:
|
|
1343
|
+
- Errors MUST be **specific**: state what is wrong, where it applies (which tool/input), and why it matters.
|
|
1344
|
+
- Errors MUST be **self-correcting**: include a `suggestion` that tells the agent exactly what to do next.
|
|
1345
|
+
- Errors SHOULD include structured `details` that are JSON-safe and deterministic (no file paths, no timestamps).
|
|
1346
|
+
- For input-size/budget violations (e.g., `context` budget): the error MUST include the measured size, the max size, and the measurement method (e.g., JCS UTF-8 bytes), plus concrete reduction guidance ("remove blobs; pass references").
|
|
1347
|
+
|
|
1348
|
+
### Error code domains + boundary rule (locked intent)
|
|
1349
|
+
To keep errors type-safe and prevent “guess which layer failed” behavior:
|
|
1350
|
+
- Token-driven MCP execution tools (`start_workflow`, `continue_workflow`, `checkpoint_workflow`) MUST return only `TOKEN_*` codes for token/session locking and token validation failures.
|
|
1351
|
+
- Storage/projection/Console/CLI operations MUST return only non-token domains (e.g., `SESSION_*`, `STORE_*`, `BUNDLE_*`) for non-token failures.
|
|
1352
|
+
|
|
1353
|
+
---
|
|
1354
|
+
|
|
1355
|
+
## 13) Local data directory layout (initial v2, locked)
|
|
1356
|
+
To avoid path drift and scattered state, WorkRail persists all durable truth and derived caches under a single WorkRail-owned local data directory.
|
|
1357
|
+
|
|
1358
|
+
Locks:
|
|
1359
|
+
- The data directory is **WorkRail-owned** (not inside workflow source directories).
|
|
1360
|
+
- All paths stored in manifests/bundles are **relative** to the session root or bundle root (no absolute paths).
|
|
1361
|
+
|
|
1362
|
+
Implementation note:
|
|
1363
|
+
- The root may be configurable for dev/testing (e.g., via an env var like `WORKRAIL_DATA_DIR`), but **relative-path-only** storage remains a hard invariant.
|
|
1364
|
+
|
|
1365
|
+
Conceptual layout (authoritative intent; exact root resolution is platform-specific):
|
|
1366
|
+
- `data/`
|
|
1367
|
+
- `sessions/<sessionId>/`
|
|
1368
|
+
- `events/` (JSONL segments)
|
|
1369
|
+
- `manifest.jsonl` (segment attestation + snapshot pins)
|
|
1370
|
+
- `cache/` (derived, rebuildable projections; safe to delete)
|
|
1371
|
+
- `snapshots/` (global CAS, keyed by `snapshotRef`)
|
|
1372
|
+
- `workflows/`
|
|
1373
|
+
- `pinned/` (compiled workflow snapshots keyed by `workflowHash`)
|
|
1374
|
+
- `keys/`
|
|
1375
|
+
- `keyring.json` (current + previous signing keys)
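
The relative-path-only invariant can be checked with a small pure function before any path is written into a manifest or bundle. The sketch below is a simplified assumption (string checks only, no filesystem access); `isSafeRelativePath` is a hypothetical helper name.

```typescript
// Returns true only for non-empty paths that are relative and never step outside their root.
function isSafeRelativePath(candidate: string): boolean {
  if (candidate.length === 0) return false;
  // POSIX absolute paths and UNC-style or root-relative Windows paths.
  if (candidate.startsWith("/") || candidate.startsWith("\\")) return false;
  // Windows drive prefixes such as "C:".
  if (/^[A-Za-z]:/.test(candidate)) return false;
  // Reject parent-directory traversal in any segment.
  return !candidate.split(/[\\/]/).some((segment) => segment === "..");
}
```

Because the check is pure, it can sit in the core layer and run both when a record is written and when a bundle is imported.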
|
|
1376
|
+
|
|
1377
|
+
---
|
|
1378
|
+
|
|
1379
|
+
## 14) Schema versioning policy (initial v2, locked)
|
|
1380
|
+
Schema versions exist to preserve determinism and avoid silent reinterpretation of durable truth.
|
|
1381
|
+
|
|
1382
|
+
Locks:
|
|
1383
|
+
- Every durable artifact type is versioned:
|
|
1384
|
+
- session events (`v`)
|
|
1385
|
+
- manifest records (`v`)
|
|
1386
|
+
- snapshots (outer `v` and inner `enginePayload` version)
|
|
1387
|
+
- compiled workflow snapshots (`schemaVersion`)
|
|
1388
|
+
- export/import bundles (`bundleSchemaVersion`)
|
|
1389
|
+
- **Additive-only within a version**: adding optional fields is allowed; changing meaning or required fields is not.
|
|
1390
|
+
- **Breaking changes require a version bump** and explicit migration logic (or fail-fast import with actionable error).
|
|
1391
|
+
- **Unknown versions fail fast** (do not guess).
|
|
1392
|
+
- **Unknown fields are ignored** only when the version is known and the fields are explicitly optional; otherwise fail-fast with a structured error.
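
A rough sketch of how these locks might look for a session event with version field `v`. It reads the last lock strictly: fields outside a declared optional set are rejected. The error codes (`SCHEMA_*`), the optional field `note`, and the function name are hypothetical, and a real implementation would presumably use the Zod schemas mentioned elsewhere in this document.

```typescript
type VersionedParse<T> =
  | { ok: true; value: T }
  | { ok: false; error: { code: string; message: string; suggestion: string } };

interface SessionEventV1 {
  readonly v: 1;
  readonly kind: string;
  readonly note?: string; // hypothetical optional field
}

const ALLOWED_FIELDS_V1 = ["v", "kind", "note"];

function reject<T>(code: string, message: string, suggestion: string): VersionedParse<T> {
  return { ok: false, error: { code, message, suggestion } };
}

function parseSessionEvent(raw: unknown): VersionedParse<SessionEventV1> {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return reject("SCHEMA_INVALID", "Event must be a JSON object.", "Pass one event object per record.");
  }
  const record = raw as { [key: string]: unknown };
  // Unknown versions fail fast: never guess how to interpret durable truth.
  if (record["v"] !== 1) {
    return reject("SCHEMA_UNKNOWN_VERSION", "Unsupported event version.", "Upgrade the reader or migrate the data.");
  }
  const unknownField = Object.keys(record).find((key) => ALLOWED_FIELDS_V1.indexOf(key) < 0);
  if (unknownField !== undefined) {
    return reject("SCHEMA_UNKNOWN_FIELD", `Field "${unknownField}" is not part of event v1.`, "Remove the field or bump the version.");
  }
  const kind = record["kind"];
  const note = record["note"];
  if (typeof kind !== "string" || (note !== undefined && typeof note !== "string")) {
    return reject("SCHEMA_INVALID", "Event v1 requires a string `kind` and an optional string `note`.", "Fix the field types.");
  }
  return { ok: true, value: note === undefined ? { v: 1, kind } : { v: 1, kind, note } };
}
```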
|
|
1393
|
+
|
|
1394
|
+
---
|
|
1395
|
+
|
|
1396
|
+
## 15) Single-writer enforcement (initial v2, locked)
|
|
1397
|
+
WorkRail must enforce a single writer per session to keep append-only ordering and idempotency deterministic.
|
|
1398
|
+
|
|
1399
|
+
Locks:
|
|
1400
|
+
- Use an OS-level exclusive file lock on a session-scoped lockfile, e.g.:
|
|
1401
|
+
- `sessions/<sessionId>/.lock`
|
|
1402
|
+
- The lock must be held for the duration of any append sequence that mutates durable truth for that session (event segments and/or `manifest.jsonl`).
|
|
1403
|
+
- If the lock cannot be acquired, fail fast with a retryable structured error:
|
|
1404
|
+
- `TOKEN_SESSION_LOCKED` (for token/advance flows)
|
|
1405
|
+
- `SESSION_LOCKED` (for storage/projection operations that are not token-derived)
|
|
1406
|
+
- Retry guidance is explicit and bounded (e.g., “retry in a few seconds; if this persists, ensure no other WorkRail process is running”).
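
The fail-fast behaviour can be sketched independently of the actual locking mechanism. The example below assumes a hypothetical `SessionLockPort` whose `tryAcquire` returns a release function or `null`; an in-memory port stands in for the OS-level file lock, which is not shown. The `retry` shape and the 1000 ms delay are illustrative assumptions.

```typescript
type Retry = { kind: "retryable_after_ms"; afterMs: number } | { kind: "not_retryable" };
type LockFlow = "token" | "storage";

interface LockedError {
  readonly code: "TOKEN_SESSION_LOCKED" | "SESSION_LOCKED";
  readonly message: string;
  readonly suggestion: string;
  readonly retry: Retry;
}

type Locked<T> = { ok: true; value: T } | { ok: false; error: LockedError };

// Hypothetical port: a real adapter would take an exclusive lock on `sessions/<sessionId>/.lock`.
interface SessionLockPort {
  tryAcquire(sessionId: string): (() => void) | null;
}

function withSessionLock<T>(port: SessionLockPort, sessionId: string, flow: LockFlow, body: () => T): Locked<T> {
  const release = port.tryAcquire(sessionId);
  if (release === null) {
    return {
      ok: false,
      error: {
        code: flow === "token" ? "TOKEN_SESSION_LOCKED" : "SESSION_LOCKED",
        message: "Another writer currently holds the lock for this session.",
        suggestion: "Retry in a few seconds; if this persists, ensure no other WorkRail process is running.",
        retry: { kind: "retryable_after_ms", afterMs: 1000 },
      },
    };
  }
  try {
    // The lock is held for the whole append sequence.
    return { ok: true, value: body() };
  } finally {
    release();
  }
}

// In-memory stand-in used only to exercise the control flow.
function inMemoryLockPort(): SessionLockPort {
  const held = new Set<string>();
  return {
    tryAcquire(sessionId: string): (() => void) | null {
      if (held.has(sessionId)) return null;
      held.add(sessionId);
      return () => {
        held.delete(sessionId);
      };
    },
  };
}
```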
|
|
1407
|
+
|
|
1408
|
+
---
|
|
1409
|
+
|
|
1410
|
+
## 16) Implementation sequencing (locked)
|
|
1411
|
+
To prevent drift and rework, WorkRail v2 implementation MUST follow a “type-first, contract-frozen” sequence:
|
|
1412
|
+
|
|
1413
|
+
1) **Canonical models + hashing (no I/O)**:
|
|
1414
|
+
- branded ID types + discriminated unions
|
|
1415
|
+
- Zod schemas for all durable artifacts (events/manifest/snapshots/compiled snapshots/bundles)
|
|
1416
|
+
- RFC 8785 (JCS) canonicalization + SHA-256 helpers
|
|
1417
|
+
- generated JSON Schemas + verification (anti-drift)
|
|
1418
|
+
2) **Pure projections** (deterministic, bounded):
|
|
1419
|
+
- preferred tip, run status, current outputs, unresolved gaps, resume ranking
|
|
1420
|
+
3) **Storage substrate** (ports/adapters):
|
|
1421
|
+
- event segments + manifest, CAS snapshots, pinned workflows, locks, recovery/salvage
|
|
1422
|
+
4) **Protocol orchestration**:
|
|
1423
|
+
- token mint/validate, ack attempts, start/continue, export/import
|
|
1424
|
+
5) **Determinism suite**:
|
|
1425
|
+
- golden hash fixtures, replay/idempotency tests, export/import roundtrip tests
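
As a rough illustration of step 1, the sketch below canonicalizes JSON-safe data and hashes the result. It is a simplified approximation of RFC 8785, not a conformance-tested implementation: it relies on `JSON.stringify` for primitives and on the default string sort (UTF-16 code unit order) for keys, and it does not handle edge cases such as lone surrogates. The function names are hypothetical. Under the layer discipline described later in this document, the digest call would presumably sit behind a port rather than be imported directly into the pure core.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified JCS-style canonicalization: sorted keys, no insignificant whitespace.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new Error("non-finite numbers cannot be canonicalized");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return "{" + keys.map((key) => JSON.stringify(key) + ":" + canonicalize(value[key] as Json)).join(",") + "}";
}

// SHA-256 over the UTF-8 bytes of the input, as lowercase hex.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function hashCanonical(value: Json): string {
  return sha256Hex(canonicalize(value));
}
```

Two objects that differ only in key order produce the same canonical string and therefore the same digest, which is the property golden hash fixtures are meant to pin down.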
|
|
1426
|
+
|
|
1427
|
+
## 16.1) Implementation blueprint (where to look) (locked intent)
|
|
1428
|
+
When implementing WorkRail v2, treat the following documents as the authoritative “blueprint” set. This is intentionally a short list to prevent drift.
|
|
1429
|
+
|
|
1430
|
+
1) **Primary authority (locks):**
|
|
1431
|
+
- `docs/design/v2-core-design-locks.md` (this document)
|
|
1432
|
+
|
|
1433
|
+
2) **Normative MCP boundary contract:**
|
|
1434
|
+
- `docs/reference/workflow-execution-contract.md`
|
|
1435
|
+
|
|
1436
|
+
3) **Hard platform constraints:**
|
|
1437
|
+
- `docs/reference/mcp-platform-constraints.md`
|
|
1438
|
+
|
|
1439
|
+
4) **Accepted decision records (rationale for core choices):**
|
|
1440
|
+
- `docs/adrs/005-agent-first-workflow-execution-tokens.md`
|
|
1441
|
+
- `docs/adrs/006-append-only-session-run-event-log.md`
|
|
1442
|
+
- `docs/adrs/007-resume-and-checkpoint-only-sessions.md`
|
|
1443
|
+
|
|
1444
|
+
5) **Authoring/compilation shape (compiler must match):**
|
|
1445
|
+
- `docs/design/workflow-authoring-v2.md`
|
|
1446
|
+
|
|
1447
|
+
6) **Console/Studio UX constraints (UI must not become truth):**
|
|
1448
|
+
- `docs/design/studio.md`
|
|
1449
|
+
|
|
1450
|
+
7) **High-level summary (non-normative):**
|
|
1451
|
+
- `docs/plans/workflow-v2-roadmap.md`
|
|
1452
|
+
|
|
1453
|
+
## 16.2) Generation + verifier contract (anti-drift) (locked intent)
|
|
1454
|
+
To prevent v1-style schema/description drift, v2 requires a deterministic generator + verifier pipeline.
|
|
1455
|
+
|
|
1456
|
+
Locks:
|
|
1457
|
+
- Canonical source of truth is **code-canonical** (TypeScript domain types + Zod schemas).
|
|
1458
|
+
- Any “tool reference” docs or registries are **generated artifacts**; they must not be hand-edited.
|
|
1459
|
+
- The verifier regenerates to a temp location and diffs byte-for-byte; it fails fast with an actionable error when outputs are out of date.
|
|
1460
|
+
- Generated outputs MUST be deterministic:
|
|
1461
|
+
- stable ordering
|
|
1462
|
+
- no timestamps
|
|
1463
|
+
- stable formatting
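
The verifier half of the pipeline can be sketched as a pure comparison between freshly generated outputs and the committed ones. This sketch compares decoded strings held in maps keyed by relative path; a real verifier would regenerate to a temp location and compare raw bytes as the lock requires. `findDrift` and the `Drift` shape are hypothetical names.

```typescript
type DriftReason = "missing" | "stale" | "unexpected";

interface Drift {
  readonly path: string;
  readonly reason: DriftReason;
}

// Returns every path whose committed content differs from what the generator produces now.
function findDrift(generated: ReadonlyMap<string, string>, committed: ReadonlyMap<string, string>): Drift[] {
  const drift: Drift[] = [];
  generated.forEach((content, path) => {
    const existing = committed.get(path);
    if (existing === undefined) drift.push({ path, reason: "missing" });
    else if (existing !== content) drift.push({ path, reason: "stale" });
  });
  committed.forEach((_content, path) => {
    if (!generated.has(path)) drift.push({ path, reason: "unexpected" });
  });
  // Stable ordering so the failure report is itself deterministic.
  return drift.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}
```

An empty result means the committed artifacts are up to date; anything else can be turned into an actionable error listing the affected paths.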
|
|
1464
|
+
|
|
1465
|
+
## 16.3) Closed sets index (v2 minimal) (locked intent)
|
|
1466
|
+
This section is a convenience index for the closed sets already defined elsewhere in this document. It exists to prevent “string bag” drift during implementation.
|
|
1467
|
+
|
|
1468
|
+
- **Preferences**
|
|
1469
|
+
- `autonomy`: `guided | full_auto_stop_on_user_deps | full_auto_never_stop`
|
|
1470
|
+
- `riskPolicy`: `conservative | balanced | aggressive`
|
|
1471
|
+
- **Next intent (boundary discipline)**
|
|
1472
|
+
- `nextIntent`: `perform_pending_then_continue | await_user_confirmation | rehydrate_only | complete`
|
|
1473
|
+
- **Capabilities**: `delegation | web_browsing`
|
|
1474
|
+
- **Edge kind**: `acked_step | checkpoint`
|
|
1475
|
+
- `cause.kind`: `idempotent_replay | intentional_fork | non_tip_advance | checkpoint_created`
|
|
1476
|
+
- **ReasonCode** (semantic source for blockers/gaps): see “Unified reason model (blocked ↔ gaps)”
|
|
1477
|
+
- **Blocker codes**: `USER_ONLY_DEPENDENCY | MISSING_REQUIRED_OUTPUT | INVALID_REQUIRED_OUTPUT | REQUIRED_CAPABILITY_UNKNOWN | REQUIRED_CAPABILITY_UNAVAILABLE | INVARIANT_VIOLATION | STORAGE_CORRUPTION_DETECTED`
|
|
1478
|
+
- **Token payload kinds**: `state | ack | checkpoint`
|
|
1479
|
+
- **Token validation errors**: `TOKEN_INVALID_FORMAT | TOKEN_UNSUPPORTED_VERSION | TOKEN_BAD_SIGNATURE | TOKEN_SCOPE_MISMATCH | TOKEN_UNKNOWN_NODE | TOKEN_WORKFLOW_HASH_MISMATCH | TOKEN_SESSION_LOCKED`
|
|
1480
|
+
- **Manifest record kinds**: `segment_closed | snapshot_pinned`
|
|
1481
|
+
- **Run status**: `in_progress | blocked | complete | complete_with_gaps`
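
In TypeScript, a closed set of this kind is usually expressed as a union derived from a constant array, with an exhaustiveness check so that adding a member forces every `switch` to be revisited. The sketch below uses the run status set from the index above; the helper names and the description strings are hypothetical.

```typescript
const RUN_STATUSES = ["in_progress", "blocked", "complete", "complete_with_gaps"] as const;
type RunStatus = (typeof RUN_STATUSES)[number];

// Parses untrusted input into the closed set, or returns null instead of guessing.
function parseRunStatus(raw: string): RunStatus | null {
  return (RUN_STATUSES as readonly string[]).indexOf(raw) >= 0 ? (raw as RunStatus) : null;
}

function assertNever(value: never): never {
  throw new Error(`unhandled closed-set member: ${String(value)}`);
}

function describeRunStatus(status: RunStatus): string {
  switch (status) {
    case "in_progress":
      return "Run is in progress";
    case "blocked":
      return "Run is blocked";
    case "complete":
      return "Run is complete";
    case "complete_with_gaps":
      return "Run is complete, with gaps";
    default:
      // Compile-time error here if a new member is added without a case.
      return assertNever(status);
  }
}
```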
|
|
1482
|
+
|
|
1483
|
+
## 16.3.1) Context budget and schema discipline (locked)
|
|
1484
|
+
|
|
1485
|
+
`context` exists to carry **external inputs** (ticket IDs, repo paths, workflow parameters). It is not durable memory and must not be treated as a “payload bag” for large documents.
|
|
1486
|
+
|
|
1487
|
+
Locks:
|
|
1488
|
+
- **No echo**: execution responses MUST NOT echo the caller’s `context` back verbatim (avoid payload bloat and accidental “send it back” loops).
|
|
1489
|
+
- **No durability**: `context` is not persisted as durable truth (use `output` for durable memory).
|
|
1490
|
+
- **JSON-only**: `context` must be JSON-serializable (objects/arrays/primitives only; no functions, symbols, `undefined`, circular refs).
|
|
1491
|
+
- **Byte budget (fail fast; no silent truncation)**:
|
|
1492
|
+
- WorkRail MUST compute context size as UTF-8 bytes of **RFC 8785 (JCS)** canonical JSON for the provided `context`.
|
|
1493
|
+
- If the canonicalization fails or size exceeds **256KB**, the tool MUST fail fast with errors-as-data.
|
|
1494
|
+
- On MCP tool calls (`start_workflow`, `continue_workflow`), WorkRail MUST fail fast with a **tool error** (code `VALIDATION_ERROR`) that includes: measured bytes, max bytes (256KB), and measurement method (RFC 8785 / JCS UTF-8 bytes), plus concrete reduction guidance (remove blobs; pass references).
|
|
1495
|
+
- **Schema discipline**:
|
|
1496
|
+
- v2 does not support workflow-authored arbitrary context schemas.
|
|
1497
|
+
- If a workflow needs required inputs, it must express this via step instructions + mode behavior (block in blocking modes; assume/skip+disclose in never-stop) rather than relying on implicit context echoing.
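
The byte-budget lock can be sketched as a pure check that takes the canonicalizer as a parameter. Two assumptions are made here and should be confirmed against the implementation: "256KB" is read as 256 × 1024 bytes, and the error field names (`message`, `suggestion`, `details`) follow the envelope described earlier in this document. `checkContextBudget` is a hypothetical name.

```typescript
const MAX_CONTEXT_BYTES = 256 * 1024; // assumption: "256KB" interpreted as 262144 bytes

interface BudgetError {
  readonly code: "VALIDATION_ERROR";
  readonly message: string;
  readonly suggestion: string;
  readonly details: { readonly [key: string]: string | number };
}

type BudgetResult = { ok: true; measuredBytes: number } | { ok: false; error: BudgetError };

function checkContextBudget(
  context: unknown,
  canonicalize: (value: unknown) => string, // expected to be an RFC 8785 (JCS) canonicalizer
  maxBytes: number = MAX_CONTEXT_BYTES,
): BudgetResult {
  const method = "RFC 8785 (JCS) UTF-8 bytes";
  let canonical: string;
  try {
    canonical = canonicalize(context);
  } catch {
    return {
      ok: false,
      error: {
        code: "VALIDATION_ERROR",
        message: "context could not be canonicalized as JSON.",
        suggestion: "Pass only JSON-serializable values (no functions, symbols, undefined, or circular refs).",
        details: { maxBytes, method },
      },
    };
  }
  const measuredBytes = new TextEncoder().encode(canonical).length;
  if (measuredBytes > maxBytes) {
    return {
      ok: false,
      error: {
        code: "VALIDATION_ERROR",
        message: `context is ${measuredBytes} bytes; the maximum is ${maxBytes} bytes.`,
        suggestion: "Remove blobs from context and pass references instead.",
        details: { measuredBytes, maxBytes, method },
      },
    };
  }
  return { ok: true, measuredBytes };
}
```

There is no truncation path: the input either fits or the call fails with the measured size, the limit, and the measurement method.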
|
|
1498
|
+
|
|
1499
|
+
## 16.4) Implementation playbook (how to execute safely) (locked intent)
|
|
1500
|
+
This section records execution guidance for large v2 refactors so we keep the implementation aligned with the locks and avoid mid-project drift.
|
|
1501
|
+
|
|
1502
|
+
### Before starting (what to consider)
|
|
1503
|
+
- Prioritize determinism-critical substrate work first (schemas, hashing, idempotency, storage ordering). Do not build features on unstable foundations.
|
|
1504
|
+
- Maintain strict bounded context boundaries (`src/v2/`) to prevent v2 truth from leaking into v1 mutable session paths.
|
|
1505
|
+
- Keep closed sets explicit and versioned; do not introduce “bags” (strings/booleans) where an enum/union applies.
|
|
1506
|
+
- Treat failure modes (crash/retry/rewind) as first-class requirements; design them upfront.
|
|
1507
|
+
- Enforce anti-drift (generator + verifier) early so docs/schemas cannot diverge.
|
|
1508
|
+
|
|
1509
|
+
### How to split the work (recommended: vertical slices with layer discipline)
|
|
1510
|
+
Build **thin end-to-end paths first**, then expand primitives incrementally. This front-loads integration risk and gives working feedback early while maintaining strict layer boundaries.
|
|
1511
|
+
|
|
1512
|
+
**Layer discipline (enforced throughout)**:
|
|
1513
|
+
- `v2/durable-core/**` stays pure (no Node I/O)
|
|
1514
|
+
- `v2/ports/**` are interfaces only
|
|
1515
|
+
- `v2/infra/**` is the only place Node I/O exists
|
|
1516
|
+
|
|
1517
|
+
**Vertical slices (recommended sequencing)**:
|
|
1518
|
+
|
|
1519
|
+
**Slice 1 — Minimal read-only flow (proves hashing + pinning)**:
|
|
1520
|
+
- Goal: `list_workflows` + `inspect_workflow` work end-to-end from pinned compiled snapshots
|
|
1521
|
+
- Build:
|
|
1522
|
+
- minimal `CompiledWorkflowSnapshotV1` schema (id/name/description only)
|
|
1523
|
+
- JCS canonicalization + `workflowHash` computation (pure)
|
|
1524
|
+
- pinned workflow CAS store (port + adapter)
|
|
1525
|
+
- minimal MCP handlers (read-only)
|
|
1526
|
+
- Why first: validates hashing/pinning compose correctly before storage complexity
|
|
1527
|
+
|
|
1528
|
+
**Slice 2 — Append-only substrate + projections (proves durable truth)**:
|
|
1529
|
+
- Goal: session event log segments + `manifest.jsonl` + pure projections working end-to-end
|
|
1530
|
+
- Build:
|
|
1531
|
+
- session event log (write + load): `session_created`, `run_started`, minimal `node_created`/etc.
|
|
1532
|
+
- minimal segment + manifest with pin-after-close (single-event segments initially)
|
|
1533
|
+
- typed event schema union (locked closed set)
|
|
1534
|
+
- pure deterministic projections (run DAG, session health, outputs, capabilities, gaps, advance outcomes, preferences)
|
|
1535
|
+
- Why second: validates append-only truth + corruption gating + projection correctness before tokens
|
|
1536
|
+
|
|
1537
|
+
**Slice 2.5 — Execution safety boundaries (prep for Slice 3)**:
|
|
1538
|
+
- Goal: gate+witness + readonly/append separation + corruption union so Slice 3 cannot violate purity
|
|
1539
|
+
- Build:
|
|
1540
|
+
- `ExecutionSessionGateV2` (lock+health choke-point)
|
|
1541
|
+
- `WithHealthySessionLock` (opaque branded witness; append requires proof)
|
|
1542
|
+
- readonly vs append port split
|
|
1543
|
+
- `SessionHealthV2` union (`healthy | corrupt_tail | corrupt_head | unknown_version`) with manifest-attested reasons
|
|
1544
|
+
- typed snapshot pin enforcement (no `any`)
|
|
1545
|
+
- Why 2.5: makes rehydrate-purity + replay-no-recompute architecturally enforceable before orchestration
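
The gate-plus-witness idea can be illustrated with a branded type: the append function demands a witness value that only the gate can produce. `WithHealthySessionLock` and the `SessionHealthV2` members are named in this document; the shapes below, the `SESSION_NOT_HEALTHY` code, and the function names are hypothetical, and locking is omitted to keep the sketch short.

```typescript
// The brand symbol would stay private to the gate's module, so callers cannot forge a witness.
declare const healthyLockBrand: unique symbol;

type WithHealthySessionLock = {
  readonly sessionId: string;
  readonly [healthyLockBrand]: true;
};

type SessionHealthV2 = "healthy" | "corrupt_tail" | "corrupt_head" | "unknown_version";

type GateResult<T> =
  | { ok: true; value: T }
  | { ok: false; code: "SESSION_NOT_HEALTHY"; health: SessionHealthV2 };

// Single choke-point: the only place a witness is minted.
function withHealthySession<T>(
  sessionId: string,
  health: SessionHealthV2,
  body: (witness: WithHealthySessionLock) => T,
): GateResult<T> {
  if (health !== "healthy") return { ok: false, code: "SESSION_NOT_HEALTHY", health };
  const witness = { sessionId } as unknown as WithHealthySessionLock;
  return { ok: true, value: body(witness) };
}

// Append-capable operations take the witness as proof; read-only ones would not.
function appendEvent(witness: WithHealthySessionLock, log: string[], event: string): void {
  log.push(`${witness.sessionId}:${event}`);
}
```

A rehydrate path that is never handed a witness then cannot call `appendEvent` without a type error, which is the property Gate 2.5 checks.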
|
|
1546
|
+
|
|
1547
|
+
**Slice 3 — Token orchestration (start/continue/rehydrate/replay)**:
|
|
1548
|
+
- Goal: `start_workflow` + `continue_workflow` working end-to-end; rehydrate is pure; replay is idempotent
|
|
1549
|
+
- Prerequisites (code-locked before starting Slice 3; see readiness audit below):
|
|
1550
|
+
- execution snapshot schema (`ExecutionSnapshotFileV1`, `EnginePayloadV1`)
|
|
1551
|
+
- snapshot CAS port + adapter skeleton
|
|
1552
|
+
- token payload codec (pure, no signer yet)
|
|
1553
|
+
- event union audit (confirm completeness)
|
|
1554
|
+
- snapshot-state helpers (`deriveIsComplete`, `derivePendingStep`)
|
|
1555
|
+
- Build:
|
|
1556
|
+
- token signing + validation (HMAC-SHA256; keyring port + adapter)
|
|
1557
|
+
- rehydrate use-case (readonly-only; pure)
|
|
1558
|
+
- advance use-case (append-capable; requires witness)
|
|
1559
|
+
- replay use-case (fact-returning; fail-closed)
|
|
1560
|
+
- MCP handlers (`start_workflow`, `continue_workflow`)
|
|
1561
|
+
- Why third: proves token/snapshot/replay correctness before modes/preferences/blockers
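
A minimal sketch of HMAC-SHA256 token signing and validation using Node's built-in `crypto` module. The wire format (`base64url(payload).base64url(signature)`) is an assumption made for illustration, not the format WorkRail uses; the two error codes are taken from the token validation set listed earlier in this document.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

type VerifyResult =
  | { ok: true; payloadJson: string }
  | { ok: false; code: "TOKEN_INVALID_FORMAT" | "TOKEN_BAD_SIGNATURE" };

// Hypothetical wire format: "<base64url payload>.<base64url HMAC-SHA256 of that payload text>".
function signToken(payloadJson: string, key: Buffer): string {
  const body = Buffer.from(payloadJson, "utf8").toString("base64url");
  const signature = createHmac("sha256", key).update(body).digest().toString("base64url");
  return `${body}.${signature}`;
}

function verifyToken(token: string, key: Buffer): VerifyResult {
  const parts = token.split(".");
  const body = parts[0];
  const signature = parts[1];
  if (parts.length !== 2 || !body || !signature) {
    return { ok: false, code: "TOKEN_INVALID_FORMAT" };
  }
  const expected = createHmac("sha256", key).update(body).digest();
  const given = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return { ok: false, code: "TOKEN_BAD_SIGNATURE" };
  }
  return { ok: true, payloadJson: Buffer.from(body, "base64url").toString("utf8") };
}
```

Validation returns a typed result rather than throwing, and the signature comparison is constant-time to avoid leaking how many bytes matched.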
|
|
1562
|
+
|
|
1563
|
+
**Slice 4+ — Blocked + gaps, export/import, resume**:
|
|
1564
|
+
To keep scope coherent and reduce drift risk, treat Slice 4+ as three sub-slices with explicit gates:
|
|
1565
|
+
|
|
1566
|
+
- **Slice 4a — Semantics lockdown (blocked ↔ gaps, prefs/modes, contracts)**:
|
|
1567
|
+
- build modes/preferences + output contracts + blocked/gap behavior (table-driven; no parallel “reason models”)
|
|
1568
|
+
- reconcile canonical docs so the contract points to these locks (avoid “open items” drift)
|
|
1569
|
+
- **Lock `notesMarkdown` accumulation semantics** (see Section 18.1 — must decide: per-step fresh vs cumulative)
|
|
1570
|
+
- **Slice 4b — Portability (Gate 4)**:
|
|
1571
|
+
- build export bundle integrity + import validation + token re-minting
|
|
1572
|
+
- add export→import equivalence tests over projections (excluding runtime-only tokens/timestamps)
|
|
1573
|
+
- **Slice 4c — Resumption + checkpoints**:
|
|
1574
|
+
- build `resume_session` with locked ranking/matching + budgets (healthy-only)
|
|
1575
|
+
- build `checkpoint_workflow` (idempotent via `checkpointToken`)
|
|
1576
|
+
|
|
1577
|
+
**Alternative (horizontal phases, safer for inexperienced teams)**:
|
|
1578
|
+
If vertical slices feel too risky, use the original horizontal sequencing:
|
|
1579
|
+
1. Phase 0 (canonical core, no I/O)
|
|
1580
|
+
2. Phase 1 (pure projections)
|
|
1581
|
+
3. Phase 2 (storage substrate)
|
|
1582
|
+
4. Phase 3 (protocol orchestration)
|
|
1583
|
+
5. Phase 4 (determinism suite)
|
|
1584
|
+
|
|
1585
|
+
Trade-off: slower feedback but less risk of "cut corners to integrate early."
|
|
1586
|
+
|
|
1587
|
+
### Slice N+1 readiness audit (locked; required before complex integration slices)
|
|
1588
|
+
Before starting a complex integration slice (like Slice 3: orchestration), run this explicit checklist to verify prerequisite boundary schemas/ports/codecs exist. Failing this audit forces mid-slice refactors and risks drift.
|
|
1589
|
+
|
|
1590
|
+
**Checklist** (customize per slice):
|
|
1591
|
+
- [ ] All required schemas exist in code (Zod + TS types) and are locked/versioned.
|
|
1592
|
+
- [ ] All required ports exist (interfaces) and are well-typed (no base-type bags).
|
|
1593
|
+
- [ ] Minimal adapters/skeletons exist for critical I/O paths.
|
|
1594
|
+
- [ ] Golden fixtures exist for canonicalization/hashing (if applicable).
|
|
1595
|
+
- [ ] Pure helpers/projections needed for orchestration exist and are testable.
|
|
1596
|
+
|
|
1597
|
+
**Example: Slice 3 readiness audit** (per resumption pack):
|
|
1598
|
+
- [ ] Execution snapshot schema (`ExecutionSnapshotFileV1`, `EnginePayloadV1`) exists + golden JCS fixtures.
|
|
1599
|
+
- [ ] Snapshot CAS port (`SnapshotStorePortV2`) exists.
|
|
1600
|
+
- [ ] Snapshot CAS adapter skeleton exists (`put`/`get` working).
|
|
1601
|
+
- [ ] Token payload codec exists (pure encode/decode; signing is later).
|
|
1602
|
+
- [ ] Event union audited (all locked kinds for Slice 3 are present and typed).
|
|
1603
|
+
- [ ] Snapshot-state helpers exist (`deriveIsComplete`, `derivePendingStep`).
|
|
1604
|
+
|
|
1605
|
+
### Quality gates (how we know each phase is “done”)
|
|
1606
|
+
- **Gate 0 (schemas + hashing)**: all artifacts validate; generator/verifier passes; golden hash fixtures stable.
|
|
1607
|
+
- **Gate 1 (projections)**: projections are pure and deterministic; ordering/truncation rules are tested.
|
|
1608
|
+
- **Gate 2 (storage)**: crash-safety invariants hold; single-writer enforcement works; no salvage guessing.
|
|
1609
|
+
- **Gate 2.5 (execution safety)**: append requires witness; rehydrate cannot access append ports; health gating works.
|
|
1610
|
+
- **Gate 3 (protocol)**: rehydrate is read-only; ack advancement is idempotent; replay is fact-returning; forks behave as locked.
|
|
1611
|
+
- **Gate 4 (portability)**: export/import integrity passes; tokens re-mint deterministically; projections are equivalent post-import.
|
|
1612
|
+
|
|
1613
|
+
### Handling issues mid-implementation
|
|
1614
|
+
- If an invariant mismatch appears, stop and fix the model/lock explicitly; do not add compatibility patches that expand surface area.
|
|
1615
|
+
- Keep commits small and layered (avoid mixing schema changes with storage changes in one commit).
|
|
1616
|
+
- Prefer recording bounded, typed trace data (events) over ad-hoc debugging paths when explainability is required.
|
|
1617
|
+
- When a lock conflicts with reality, make an explicit decision to amend the lock or change approach; never silently diverge.
|
|
1618
|
+
|
|
1619
|
+
## 16.5) Polish & Hardening Phase (locked; required before "v2 production-ready")
|
|
1620
|
+
|
|
1621
|
+
This phase is **cross-cutting quality work** (not a functional slice). It touches all prior slices to raise code quality, maintainability, and anti-drift enforcement before declaring v2 production-ready.
|
|
1622
|
+
|
|
1623
|
+
### When to run this phase
|
|
1624
|
+
- After all functional slices ship (Slices 1–6+).
|
|
1625
|
+
- Before v2 unflag / public rollout.
|
|
1626
|
+
- Can be done in sub-phases (separate PRs) or as one cleanup pass.
|
|
1627
|
+
|
|
1628
|
+
### Sub-phase A: Extract constants + remove dead code
|
|
1629
|
+
- **Magic constants → config/constants module**:
|
|
1630
|
+
- Hard-coded budgets/thresholds (e.g., `maxBlockers: 10`, `maxNotesBytes: 4096`, `maxTraceEntries: 25`, `defaultRetryAfterMs: 1000`) → `src/v2/durable-core/constants.ts`
|
|
1631
|
+
- Each constant must have a TSDoc comment explaining why the limit exists and referencing the lock doc section
|
|
1632
|
+
- **Repetitive error messages → builders/templates**:
|
|
1633
|
+
- Extract common error message patterns into pure helper functions
|
|
1634
|
+
- **Hard-coded regex → named constants**:
|
|
1635
|
+
- E.g., `STEP_ID_PATTERN`, `SHA256_DIGEST_PATTERN` with comments
|
|
1636
|
+
- **Remove dead code**:
|
|
1637
|
+
- Unused `as any` casts after schema tightening
|
|
1638
|
+
- Commented-out code blocks
|
|
1639
|
+
- Unused helper functions or types
|
|
1640
|
+
- Redundant type assertions
|
|
1641
|
+
|
|
1642
|
+
### Sub-phase B: Naming & organization consistency
|
|
1643
|
+
- **Naming conventions** (pick one and enforce):
|
|
1644
|
+
- Ensure all v2 classes/functions/types use consistent `V2` suffixing
|
|
1645
|
+
- Port naming: `*PortV2` (consistent across all ports)
|
|
1646
|
+
- Error types: `*ErrorV2` or `*Error` (pick one)
|
|
1647
|
+
- **File organization**:
|
|
1648
|
+
- Verify similar abstractions live in similar places (all ports in `ports/`, all adapters in `infra/local/`, all projections in `projections/`)
|
|
1649
|
+
- No "misc" or "utils" dumping grounds
|
|
1650
|
+
- **Import ordering**:
|
|
1651
|
+
- Alphabetize or group by layer (types → ports → infra → external)
|
|
1652
|
+
- Use consistent import style (named vs default)
|
|
1653
|
+
|
|
1654
|
+
### Sub-phase C: Documentation completeness (code-level TSDoc)
|
|
1655
|
+
- **Every port interface** has a TSDoc comment explaining:
|
|
1656
|
+
- Purpose and locked invariants
|
|
1657
|
+
- When/how it should be used
|
|
1658
|
+
- What guarantees it provides (e.g., "idempotent", "pure", "crash-safe")
|
|
1659
|
+
- **Every branded type** has a comment explaining:
|
|
1660
|
+
- What footgun it prevents
|
|
1661
|
+
- How to construct it safely
|
|
1662
|
+
- **Every closed-set enum/union** has a comment:
|
|
1663
|
+
- Referencing the lock doc section
|
|
1664
|
+
- Explaining why it's closed (what drift it prevents)
|
|
1665
|
+
- **Complex pure functions** have TSDoc comments with:
|
|
1666
|
+
- Examples or edge cases
|
|
1667
|
+
- Performance characteristics if relevant
|
|
1668
|
+
- References to lock doc sections
|
|
1669
|
+
|
|
1670
|
+
### Sub-phase D: Anti-drift enforcement (build-time guards)
|
|
1671
|
+
- **Forbidden import graph tests**:
|
|
1672
|
+
- `durable-core/**` must not import from `infra/**` or Node modules (`fs`, `crypto`, `path`)
|
|
1673
|
+
- `projections/**` must not import MCP wiring
|
|
1674
|
+
- MCP handlers must not import projections directly (only via use-cases)
|
|
1675
|
+
- **Exact MCP tool registry snapshot test**:
|
|
1676
|
+
- Assert the exposed tool set is exactly the locked list (core + flagged)
|
|
1677
|
+
- Prevent accidental "projection MCP tools" from being added
|
|
1678
|
+
- **Generator/verifier in CI** (three targets):
|
|
1679
|
+
- **MCP tool schemas/descriptions**: generate JSON Schemas from code-canonical Zod definitions; verifier diffs and fails if out of sync
|
|
1680
|
+
- **Builtins registry**: generate Studio-ready metadata (templates/features/contracts/capabilities/refs) from compiler canonical definitions
|
|
1681
|
+
- **Durable store schemas**: generate reference docs for events/manifest/snapshots/bundles from Zod schemas
|
|
1682
|
+
- All generators run on every commit; fail fast with actionable diff if out of sync
|
|
1683
|
+
- Enforce deterministic output (no timestamps, stable ordering)
|
|
1684
|
+
- **CI workflow validation includes v2 tools**:
|
|
1685
|
+
- Extend `scripts/validate-workflows.sh` or add separate v2 tool validation script
|
|
1686
|
+
- Validate v2 MCP tool schemas match code-canonical definitions (invoke generator in verify mode)
|
|
1687
|
+
- Validate v2 tool descriptions are non-empty and reference current tool names (not v1 `workflow_next`)
|
|
1688
|
+
- Run as part of CI `validate-workflows` job or as new CI job (`.github/workflows/ci.yml`)
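
The forbidden import graph tests listed at the start of this sub-phase could start from something as simple as scanning module specifiers in source text. The sketch below is an approximation under stated assumptions: it uses a regular expression rather than a parser, so it can be fooled by specifiers inside comments or strings, and the function names are hypothetical. It covers only the `durable-core/**` rule (no `infra/**`, no `fs`/`crypto`/`path`).

```typescript
// Collects module specifiers from `import ... from "x"`, `import "x"`, `import("x")`, and `require("x")`.
function listImportSpecifiers(source: string): string[] {
  const pattern = /\b(?:from|import|require)\s*\(?\s*["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(source);
  while (match !== null) {
    const specifier = match[1];
    if (specifier !== undefined) found.push(specifier);
    match = pattern.exec(source);
  }
  return found;
}

// Returns the specifiers that a durable-core file is not allowed to import.
function forbiddenInDurableCore(source: string): string[] {
  const nodeIo = /^(?:node:)?(?:fs|crypto|path)(?:\/|$)/;
  const infraLayer = /(?:^|\/)infra(?:\/|$)/;
  return listImportSpecifiers(source).filter((specifier) => nodeIo.test(specifier) || infraLayer.test(specifier));
}
```

A test would read each file under `durable-core/` and assert that `forbiddenInDurableCore` returns an empty list, printing the offending specifiers otherwise.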
|
|
1689
|
+
|
|
1690
|
+
### Sub-phase E: Test coverage gaps (non-functional but high-signal)
|
|
1691
|
+
- **Contract tests for v2 MCP tools** (add to `tests/contract/`):
|
|
1692
|
+
- `start_workflow` returns expected response shape with valid tokens
|
|
1693
|
+
- `continue_workflow` (rehydrate-only) is pure and idempotent
|
|
1694
|
+
- `continue_workflow` (with ackToken) advances and returns new tokens
|
|
1695
|
+
- `inspect_workflow` returns compiled snapshot with stable workflowHash
|
|
1696
|
+
- Response schemas match generated JSON Schemas (verifier ensures this)
|
|
1697
|
+
- **Property-based tests** for deterministic helpers:
|
|
1698
|
+
- JCS canonicalization produces stable bytes for equivalent objects
|
|
1699
|
+
- StepInstanceKey formatting roundtrips correctly
|
|
1700
|
+
- Token payload encode/decode is bijective
|
|
1701
|
+
- **Negative path coverage**:
|
|
1702
|
+
- Every error code in error unions has at least one test producing it
|
|
1703
|
+
- Every corruption reason has a test case
|
|
1704
|
+
- **Boundary value tests**:
|
|
1705
|
+
- Empty arrays, max budgets reached, edge cases for sorting/truncation/deduplication
|
|
1706
|
+
- **Idempotency/determinism stress tests** (the "no excuses" suite):
|
|
1707
|
+
- Replay harness: same operation replayed 100x yields byte-identical results
|
|
1708
|
+
- Fork harness: N different `attemptId`s from same node create N distinct branches
|
|
1709
|
+
- Export/import roundtrip: projections are equivalent post-import
|
|
1710
|
+
- Golden hash stability: workflowHash + token payloads + bundle integrity don't drift
|
|
1711
|
+
|
|
1712
|
+
### Sub-phase F: Error ergonomics polish
|
|
1713
|
+
- **Error message quality**:
|
|
1714
|
+
- Every error includes: what went wrong, why it's a problem, what to do next
|
|
1715
|
+
- Retry guidance is explicit and bounded (not vague)
|
|
1716
|
+
- Structured error payloads include actionable hints (e.g., example next-input for blocked states)
|
|
1717
|
+
- **Retry union correctness**:
|
|
1718
|
+
- Verify all `retry` unions are set correctly (not defaulting to `not_retryable` when retry is safe)
|
|
1719
|
+
- Lock-busy errors must include explicit retry timing and "if this persists" guidance
|
|
1720
|
+
|
|
1721
|
+
### Sub-phase G: Performance observability (not optimization)
|
|
1722
|
+
- **Optional trace/timing hooks** (off by default; enabled for debugging):
|
|
1723
|
+
- Add minimal hooks in hot paths (projections, event loading, I/O) so you can profile bottlenecks later
|
|
1724
|
+
- Use a typed `TracePort` (not `console.log`)
|
|
1725
|
+
- **Algorithm audit**:
|
|
1726
|
+
- Ensure no O(n²) in hot paths (projections, event loading)
|
|
1727
|
+
- No accidental repeated file reads (e.g., loading the same snapshot multiple times)
|
|
1728
|
+
- **Projection budget enforcement**:
|
|
1729
|
+
- Verify bounded projections (recap, truncation) actually respect budgets in worst-case scenarios
|
|
1730
|
+
|
|
1731
|
+
### Sub-phase H: v1/v2 firewall (deprecation clarity)
|
|
1732
|
+
- **Boundary tests**:
|
|
1733
|
+
- v2 code must not import from v1 session/dashboard paths
|
|
1734
|
+
- v2 must not leak into v1 mutable session world
|
|
1735
|
+
- **Deprecation markers**:
|
|
1736
|
+
- Add explicit warnings in v1 code pointing to v2 equivalents
|
|
1737
|
+
- Ensure v2 feature flag gating is clean (no "half v2" states where some tools are v2 and others are v1)
|
|
1738
|
+
|
|
1739
|
+
---
|
|
1740
|
+
|
|
1741
|
+
## 9) Authoring ergonomics locks (initial v2)
|
|
1742
|
+
These locks exist to keep authoring low-friction without compromising determinism, type-safety, or drift prevention.
|
|
1743
|
+
|
|
1744
|
+
### IDs and validation (locked)
|
|
1745
|
+
- Workflow IDs follow the namespaced format (see section 7).
|
|
1746
|
+
- Step and loop identifiers that participate in execution state MUST be delimiter-safe:
|
|
1747
|
+
- `step.id` and `loopId` MUST match `[a-z0-9_-]+`
|
|
1748
|
+
- disallow `@`, `/`, `:` to keep `StepInstanceKey` unambiguous without escaping logic.
|
|
1749
|
+
- Studio/CLI validation must fail fast with actionable errors and offer deterministic auto-fix suggestions.
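
A minimal sketch of delimiter-safe ID validation with a deterministic auto-fix suggestion. The pattern is the one locked above; the normalization rule (lowercase, replace each run of disallowed characters with `-`, trim leading and trailing `-`) and the function names are assumptions for illustration.

```typescript
const STEP_ID_PATTERN = /^[a-z0-9_-]+$/;

type IdCheck =
  | { ok: true }
  | { ok: false; message: string; suggestion: string | null };

// Deterministic normalization; returns null when nothing usable remains.
function suggestStepId(raw: string): string | null {
  const candidate = raw
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return candidate.length > 0 ? candidate : null;
}

function validateStepId(id: string): IdCheck {
  if (STEP_ID_PATTERN.test(id)) return { ok: true };
  return {
    ok: false,
    message: `"${id}" is not a valid step id: only a-z, 0-9, "_" and "-" are allowed.`,
    suggestion: suggestStepId(id),
  };
}
```

Because the suggestion is a pure function of the input, Studio and the CLI would offer the same fix for the same invalid ID.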
|
|
1750
|
+
|
|
1751
|
+
### Builtins discoverability (locked)
|
|
1752
|
+
- Templates/features/contract packs/capabilities are WorkRail-owned closed sets.
|
|
1753
|
+
- Studio’s Builtins Catalog is generated from the same canonical definitions used by the compiler (never hand-maintained).
|
|
1754
|
+
- Authoring UX must not require “secret menu knowledge”: autocomplete + insert actions are first-class.
|
|
1755
|
+
|
|
1756
|
+
### Prompt references / inline canonical injections (initial v2, locked)
|
|
1757
|
+
Workflows may reference WorkRail-owned canonical information *inline* in prompts (e.g., “WorkRail v2 definition”, “append-only truth”, “modes semantics”) without copy/paste.
|
|
1758
|
+
|
|
1759
|
+
Locks:
|
|
1760
|
+
- **Compile-time only**: reference resolution happens only during workflow compilation. Runtime does not “look up” refs.
|
|
1761
|
+
- **Closed set**: referenced snippets use WorkRail-owned IDs in the reserved namespace: `wr.refs.*` (no author-defined arbitrary include paths).
|
|
1762
|
+
- **No string templating**: do not support `{{ }}` interpolation, file-path includes, or URL includes. References must be typed and validated.
|
|
1763
|
+
- **Hashing (locked choice)**: the compiled workflow snapshot MUST embed the **fully resolved reference text** for every `wr.refs.*` usage (not just `{refId, refContentHash}`). This embedded text is part of the hashed compiled snapshot and therefore influences `workflowHash` (pinned determinism).
|
|
1764
|
+
- **Export/import implication**: because refs are embedded, resumable bundles do not need any additional “ref registry snapshot” concept; the pinned compiled workflow snapshot remains self-contained.
|
|
1765
|
+
- **Budgets**:
|
|
1766
|
+
- per referenced snippet max bytes (compiler enforced)
|
|
1767
|
+
- per step injected bytes cap (compiler enforced)
|
|
1768
|
+
- on violation: fail validation with errors-as-data (no silent truncation).
|
|
1769
|
+
- **Allowed placements** (to keep prompts instruction-first):
|
|
1770
|
+
- references are allowed only within structured prompt sections (`promptBlocks.constraints`, `promptBlocks.procedure`, `promptBlocks.verify`, and optionally `promptBlocks.goal`).
|
|
1771
|
+
- references are disallowed inside arbitrary free-form prose fields where they would become unreadable walls of text.
|
|
1772
|
+
- **Provenance**: compiled steps must retain provenance for injected content (at minimum: `refId` + `refContentHash` + byte counts) so Studio can render Source vs Compiled and explain “where this text came from”.
|
|
1773
|
+
|
|
1774
|
+
Design intent:
|
|
1775
|
+
- Treat `wr.refs.*` as a builtin kind (like templates/features/contracts) and power it via a generated registry from canonical compiler definitions (anti-drift).
|
|
1776
|
+
|
|
1777
|
+
### Capabilities + contracts ergonomics (locked intent)
|
|
1778
|
+
- Capability probes for `required` capabilities should be compiler-injected (collapsed by default) and recorded durably with strong provenance.
|
|
1779
|
+
- Structured outputs require explicit contracts (no inline schema authoring); templates may imply `contractRef` to reduce author burden.
|
|
1780
|
+
|
|
1781
|
+
#### Workflow-authored output schemas (rejected for v2; locked)
|
|
1782
|
+
Workflows often want workflow-specific structured artifacts (tables, findings, comment sets) beyond free-form notes. v2 intentionally **does not** allow workflows to author arbitrary inline JSON schemas (or project-local schema refs) for required outputs.
|
|
1783
|
+
|
|
1784
|
+
Locks:
|
|
1785
|
+
- Output schema authoring is **WorkRail-owned**:
|
|
1786
|
+
- Steps declare requirements only by referencing a WorkRail-owned contract pack via `output.contractRef` (`wr.contracts.*`).
|
|
1787
|
+
- The set of contract packs is a closed set generated from code-canonical definitions (anti-drift).
|
|
1788
|
+
- No workflow-authored schema sources:
|
|
1789
|
+
- Disallow inline schema definitions in workflow JSON.
|
|
1790
|
+
- Disallow “schemaRef” fields that point to project files, git URLs, or external registries.
|
|
1791
|
+
- If a workflow needs richer structured artifacts:
|
|
1792
|
+
- Expand the WorkRail-owned contract pack catalog (preferred), or
|
|
1793
|
+
- fall back to `output.notesMarkdown` (generic durability) until an appropriate pack exists.
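The locks above could be enforced at validation time roughly as follows. This is a minimal sketch, not WorkRail's validator: `validateStepOutput`, the error codes, and the `schema` field name used to detect inline schemas are all hypothetical. The only pack ID taken from this document is `wr.contracts.loop_control`.

```typescript
// Hypothetical sketch: names and error codes are illustrative, not WorkRail's API.
type OutputError =
  | { code: "INLINE_SCHEMA_FORBIDDEN" }
  | { code: "SCHEMA_REF_FORBIDDEN" }
  | { code: "UNKNOWN_CONTRACT_REF"; contractRef: string };

// Closed set of contract packs. Only loop_control is named in the design doc.
const KNOWN_CONTRACT_PACKS: ReadonlySet<string> = new Set(["wr.contracts.loop_control"]);

function validateStepOutput(output: Record<string, unknown>): OutputError[] {
  const errors: OutputError[] = [];
  // Assumption: an inline schema would appear under a `schema` key.
  if ("schema" in output) errors.push({ code: "INLINE_SCHEMA_FORBIDDEN" });
  if ("schemaRef" in output) errors.push({ code: "SCHEMA_REF_FORBIDDEN" });
  const ref = output["contractRef"];
  if (typeof ref === "string" && !KNOWN_CONTRACT_PACKS.has(ref)) {
    errors.push({ code: "UNKNOWN_CONTRACT_REF", contractRef: ref });
  }
  // Errors-as-data: an empty array means the declaration is acceptable.
  return errors;
}
```

A declaration with no `contractRef` passes here, which matches the `output.notesMarkdown` fallback described above.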
|
|
1794
|
+
|
|
1795
|
+
Rationale:
|
|
1796
|
+
- Prevents schema drift and validation inconsistency across MCP/CLI/Studio.
|
|
1797
|
+
- Keeps compilation deterministic and `workflowHash` stable.
|
|
1798
|
+
- Preserves Studio rendering determinism and avoids “arbitrary schema” footguns.
|
|
1799
|
+
|
|
1800
|
+
### Contract pack registry + pinning (locked intent)
|
|
1801
|
+
- Contract packs are WorkRail-owned and generated from canonical definitions (code).
|
|
1802
|
+
- The compiled workflow snapshot MUST embed the resolved contract pack schemas/examples actually used (as part of the hash inputs), so:
|
|
1803
|
+
- long-lived runs remain deterministic even if packs evolve on disk
|
|
1804
|
+
- export/import bundles are self-contained
|
|
1805
|
+
|
|
1806
|
+
### Loops authoring (locked intent)
|
|
1807
|
+
- Loops are authored explicitly as `type: "loop"` steps with:
|
|
1808
|
+
- unique delimiter-safe `loopId`
|
|
1809
|
+
- explicit ordered `body[]` (authoritative for `bodyIndex`)
|
|
1810
|
+
- required `maxIterations` (no defaults)
|
|
1811
|
+
- `while` as `{ kind:"condition_ref", conditionId }` referencing a closed-set condition definition
|
|
1812
|
+
- Condition definitions are a closed set; do not introduce arbitrary expression strings.
|
|
1813
|
+
- Prefer a contract-validated `loop_control` condition kind for real loops, rather than reading arbitrary `context` keys.
|
|
1814
|
+
- `wr.contracts.loop_control` is the initial contract pack for loop exit control:
|
|
1815
|
+
- validates an `output.artifacts[]` entry with `kind="wr.loop_control"` and fields `{ loopId, decision, summary? }`
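As an illustration of the authoring shape listed above, the sketch below types a loop step and checks the stated constraints. The field names come from the bullets; the regular expression used for "delimiter-safe", the positive-integer rule, and the empty-body check are assumptions made for this example.

```typescript
// Illustrative types only; anything not named in the bullets above is assumed.
type LoopStep = {
  type: "loop";
  loopId: string;
  body: readonly { id: string }[]; // order is authoritative for bodyIndex
  maxIterations: number; // required, no default
  while: { kind: "condition_ref"; conditionId: string };
};

// Assumption: "delimiter-safe" is approximated as lowercase letters, digits, "_" and "-".
const DELIMITER_SAFE = /^[a-z0-9_-]+$/;

function validateLoopSteps(steps: readonly LoopStep[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const step of steps) {
    if (!DELIMITER_SAFE.test(step.loopId)) errors.push(`loopId not delimiter-safe: ${step.loopId}`);
    if (seen.has(step.loopId)) errors.push(`duplicate loopId: ${step.loopId}`);
    seen.add(step.loopId);
    if (!Number.isInteger(step.maxIterations) || step.maxIterations < 1) {
      errors.push(`maxIterations must be a positive integer: ${step.loopId}`);
    }
    if (step.body.length === 0) errors.push(`empty body: ${step.loopId}`);
  }
  return errors;
}
```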
|
|
1816
|
+
|
|
1817
|
+
Additional locks (prevents silent loop no-op):
|
|
1818
|
+
- **Loop control source (locked):** `while` loop continuation MUST NOT be controlled by mutable ad-hoc `context` keys (e.g., `continuePlanning`) because missing/incorrect agent output can cause the loop body to be skipped without detection.
|
|
1819
|
+
- **Loop control contract (locked):** loop continuation MUST be derived from a contract-validated loop-control artifact/output (e.g., `wr.contracts.loop_control`), produced by an explicit loop decision step.
|
|
1820
|
+
- **Failure mode (locked):** missing/invalid loop-control output MUST NOT be treated as “exit loop” implicitly.
|
|
1821
|
+
- In blocking modes: return a typed blocker (`MISSING_REQUIRED_OUTPUT` / `INVALID_REQUIRED_OUTPUT`) referencing the loop-control contract.
|
|
1822
|
+
- In never-stop mode: record a `gap_recorded` (severity=critical, category=contract_violation) and proceed according to the mode’s semantics.
|
|
1823
|
+
- **Explainability (locked intent):** the loop decision step SHOULD include a bounded summary explaining why the loop continues or exits (stored as part of the loop-control artifact output).
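The failure-mode lock above can be pictured as a small decision function. This is a sketch under assumptions: the `decision` values (`"continue"` / `"stop"`), the mode names, and the `decideLoop` helper are hypothetical; the blocker codes and gap fields are the ones quoted above.

```typescript
type LoopDecision =
  | { kind: "continue" }
  | { kind: "exit" }
  | { kind: "blocked"; code: "MISSING_REQUIRED_OUTPUT" | "INVALID_REQUIRED_OUTPUT" }
  | { kind: "gap_recorded"; severity: "critical"; category: "contract_violation" };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function decideLoop(
  artifacts: readonly unknown[],
  loopId: string,
  mode: "blocking" | "never_stop",
): LoopDecision {
  const found = artifacts
    .filter(isRecord)
    .find((a) => a["kind"] === "wr.loop_control" && a["loopId"] === loopId);
  const decision = found?.["decision"];
  // Assumption: the contract allows exactly these two decision values.
  if (decision === "continue") return { kind: "continue" };
  if (decision === "stop") return { kind: "exit" };
  // Missing or invalid output is never treated as an implicit exit.
  if (mode === "never_stop") {
    return { kind: "gap_recorded", severity: "critical", category: "contract_violation" };
  }
  return {
    kind: "blocked",
    code: found === undefined ? "MISSING_REQUIRED_OUTPUT" : "INVALID_REQUIRED_OUTPUT",
  };
}
```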
|
|
1824
|
+
|
|
1825
|
+
Loop iteration semantics (locked) (prevents maxIterations/iteration conflicts):
|
|
1826
|
+
- **Iteration indexing (locked):** `loopStack[].iteration` is **0-based**. The first loop iteration is `iteration=0`.
|
|
1827
|
+
- **`maxIterations` meaning (locked):** `maxIterations` is a **count** of allowed loop iterations (not a max index).
|
|
1828
|
+
- Allowed iteration values are: `0..(maxIterations - 1)`.
|
|
1829
|
+
- A loop MUST NOT enter/execute a body iteration when `iteration >= maxIterations`.
|
|
1830
|
+
- **Iteration increment point (locked):** `iteration` increments **only when starting the next loop iteration** (i.e., after completing the loop body for the previous iteration and deciding to continue). It MUST NOT increment mid-body.
|
|
1831
|
+
- **Termination reason (locked intent):** loop termination MUST be attributable to exactly one of:
|
|
1832
|
+
- condition evaluated false, or
|
|
1833
|
+
- max iterations reached (the next would-be iteration would have `iteration == maxIterations`).
|
|
1834
|
+
- **Failure mode (locked):** attempting to continue a loop when `iteration >= maxIterations` MUST fail fast as errors-as-data (no silent stop, no “stuck”):
|
|
1835
|
+
- In blocking modes: return a typed blocker (a dedicated loop-limit code is preferred; `INVARIANT_VIOLATION` is acceptable if no dedicated code exists yet).
|
|
1836
|
+
- In never-stop mode: record a `gap_recorded` (severity=critical, category=unexpected, detail=`invariant_violation`) and proceed according to mode semantics.
|
|
1837
|
+
- The blocker/gap MUST include `loopId`, `iteration`, and `maxIterations` in bounded structured details.
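The iteration rules above (0-based index, `maxIterations` as a count, increment only when starting the next iteration) can be sketched as two pure functions. The helper names and result shapes are hypothetical. When the condition is false and the limit is also reached, this sketch attributes termination to the condition; the document does not say which reason wins.

```typescript
type LoopFrame = { loopId: string; iteration: number }; // iteration is 0-based

type NextIteration =
  | { kind: "enter"; frame: LoopFrame }
  | { kind: "terminated"; reason: "condition_false" | "max_iterations_reached" };

type LoopLimitViolation = {
  code: "INVARIANT_VIOLATION";
  details: { loopId: string; iteration: number; maxIterations: number };
};

// Guard run before executing a body iteration: iteration must be in 0..(maxIterations - 1).
function checkLoopLimit(frame: LoopFrame, maxIterations: number): LoopLimitViolation | null {
  if (frame.iteration < maxIterations) return null;
  return {
    code: "INVARIANT_VIOLATION",
    details: { loopId: frame.loopId, iteration: frame.iteration, maxIterations },
  };
}

// Called only after the body for `frame.iteration` has completed.
function startNextIteration(
  frame: LoopFrame,
  maxIterations: number,
  conditionHolds: boolean,
): NextIteration {
  if (!conditionHolds) return { kind: "terminated", reason: "condition_false" };
  const next = frame.iteration + 1;
  if (next >= maxIterations) return { kind: "terminated", reason: "max_iterations_reached" };
  return { kind: "enter", frame: { ...frame, iteration: next } };
}
```

With `maxIterations = 3`, the body runs for `iteration` 0, 1, and 2, and the attempt to start a fourth iteration terminates with `max_iterations_reached`.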
|
|
1838
|
+
|
|
1839
|
+
### Source vs compiled clarity (locked)
|
|
1840
|
+
- Studio must clearly distinguish:
|
|
1841
|
+
- source workflow
|
|
1842
|
+
- compiled workflow (what is hashed)
|
|
1843
|
+
- pinned snapshot used by a run
|
|
1844
|
+
- Compiled view must show injection provenance and “hash inputs” at a glance to reduce surprise and drift.
|
|
1845
|
+
|
|
1846
|
+
---
|
|
1847
|
+
|
|
1848
|
+
## 10) Operational envelope locks (pre-implementation)
|
|
1849
|
+
These locks cover runtime failure modes, rollout posture, and determinism verification. They are intentionally “ops-shaped” but must remain deterministic and drift-proof.
|
|
1850
|
+
|
|
1851
|
+
### Corruption handling (locked)
|
|
1852
|
+
We distinguish between **execution correctness** and **read-only UX**:
|
|
1853
|
+
- Execution paths (token validation, rehydrate/advance, export integrity, etc.) are **strict fail-fast** on corruption relevant to the requested run/node.
|
|
1854
|
+
- Read-only views (Studio/Console inspection and export of valid prefix) may operate in **salvage mode**:
|
|
1855
|
+
- load and render only up to the last valid manifest entry
|
|
1856
|
+
- clearly banner “corrupt tail / partial data” and never claim correctness beyond the validated prefix
|
|
1857
|
+
|
|
1858
|
+
Salvage surface (locked intent):
|
|
1859
|
+
- Salvage mode is **read-only**:
|
|
1860
|
+
- allowed: inspect/export of validated prefix
|
|
1861
|
+
- disallowed: `continue_workflow` advancement from a salvaged tail
|
|
1862
|
+
- `resume_session` must only return candidates from fully validated sessions (no candidates that require corrupted tail data).
|
|
1863
|
+
- Corruption must be surfaced as structured warnings/errors with a closed set of codes (no silent fallback).
|
|
1864
|
+
|
|
1865
|
+
### Session health + tool gating (locked)
|
|
1866
|
+
To prevent accidentally using salvage reads as an execution correctness path, v2 defines a closed-set session health classification and gates tools accordingly.
|
|
1867
|
+
|
|
1868
|
+
`SessionHealth` (closed set, initial):
|
|
1869
|
+
- `healthy`
|
|
1870
|
+
- `corrupt_tail` (validated prefix available)
|
|
1871
|
+
- `corrupt_head` (no usable prefix)
|
|
1872
|
+
- `unknown_version`
|
|
1873
|
+
|
|
1874
|
+
Tool gating (locked intent):
|
|
1875
|
+
- Execution correctness tools MUST require `SessionHealth=healthy` for the target session/run/node:
|
|
1876
|
+
- `continue_workflow` advancement (with `ackToken`)
|
|
1877
|
+
- `checkpoint_workflow`
|
|
1878
|
+
- token minting/advancement paths that depend on durable truth for correctness
|
|
1879
|
+
- Read-only tooling (Studio/Console inspection and export) MAY operate on `SessionHealth=corrupt_tail` using only the validated prefix, but MUST:
|
|
1880
|
+
- set an explicit salvage/banner flag in responses/exports (no silent partial data)
|
|
1881
|
+
- forbid any advancement/mutation from salvaged sessions
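The gating rule above reduces to a small total function over the closed `SessionHealth` set. This is a sketch: the `ToolKind` split, the `gateTool` name, and the `SESSION_NOT_HEALTHY` code are hypothetical.

```typescript
type SessionHealth = "healthy" | "corrupt_tail" | "corrupt_head" | "unknown_version";
type ToolKind = "execution" | "read_only";

type GateResult =
  | { allowed: true; salvage: boolean }
  | { allowed: false; code: "SESSION_NOT_HEALTHY"; health: SessionHealth };

function gateTool(tool: ToolKind, health: SessionHealth): GateResult {
  if (health === "healthy") return { allowed: true, salvage: false };
  // Read-only tooling may use the validated prefix, but must flag salvage explicitly.
  if (tool === "read_only" && health === "corrupt_tail") return { allowed: true, salvage: true };
  // Everything else is refused with a structured error (no silent fallback).
  return { allowed: false, code: "SESSION_NOT_HEALTHY", health };
}
```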
|
|
1882
|
+
|
|
1883
|
+
### Token signing key management (locked intent)
|
|
1884
|
+
- Use a local **key ring file** containing a small active set (current + previous).
|
|
1885
|
+
- Rotation is explicit (manual / controlled), not time-driven.
|
|
1886
|
+
- Validation accepts tokens signed by any active key; tokens are not exported/imported.
|
|
1887
|
+
|
|
1888
|
+
Key ring storage (locked intent):
|
|
1889
|
+
- Store the key ring in the WorkRail local data directory (WorkRail-owned, not user-authored workflow dirs).
|
|
1890
|
+
- File must be readable/writable only by the current user (best-effort; platform-specific).
|
|
1891
|
+
- Keep exactly two active keys: `current` and `previous`.
|
|
1892
|
+
- Rotation semantics:
|
|
1893
|
+
- on rotation, `current` becomes `previous` and a fresh `current` key is generated.
|
|
1894
|
+
- tokens signed by `previous` remain valid until the next rotation.
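The rotation semantics above can be expressed as a pure function over the key ring, with key generation injected from outside. The `SigningKey` shape, `keyId` field, and helper names are assumptions for illustration.

```typescript
type SigningKey = { keyId: string; secret: Uint8Array };
type KeyRing = { current: SigningKey; previous: SigningKey };

// Pure rotation: the caller supplies the freshly generated key (a side effect at the edge).
function rotate(ring: KeyRing, fresh: SigningKey): KeyRing {
  return { current: fresh, previous: ring.current };
}

// A token is acceptable only if it was signed by one of the two active keys.
function isActiveKey(ring: KeyRing, keyId: string): boolean {
  return ring.current.keyId === keyId || ring.previous.keyId === keyId;
}
```

After one rotation the old `current` is still accepted as `previous`; after a second rotation it drops out of the ring.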
|
|
1895
|
+
|
|
1896
|
+
### v1 coexistence and deprecation posture (locked intent)
|
|
1897
|
+
- v2 is the correctness model; v1-style mutable session tools are not part of v2 truth.
|
|
1898
|
+
- If v1 session tools exist for legacy reasons, keep them behind an explicit feature flag during rollout.
|
|
1899
|
+
- No migration story is required: legacy sessions are not treated as durable truth.
|
|
1900
|
+
|
|
1901
|
+
### Projections + indexing (locked intent)
|
|
1902
|
+
- Studio-facing projections may be cached as a **derived, rebuildable** per-session index.
|
|
1903
|
+
- The cache is never truth: it is safe to delete and deterministically rebuild from the append-only store.
|
|
1904
|
+
|
|
1905
|
+
Projection cache invariants (locked intent):
|
|
1906
|
+
- Cache format is versioned and includes the last processed `EventIndex`/`ManifestIndex` so rebuild/incremental update is deterministic.
|
|
1907
|
+
- Any cache schema version mismatch or corruption causes the cache to be discarded and rebuilt (safe fallback).
|
|
1908
|
+
- Cache must not include any data that would change correctness (e.g., it must not override preferred tip policy).
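A sketch of the cache invariants above: the cache records its schema version and last processed event index, and any mismatch discards it in favor of a full rebuild. The `ProjectionCache` fields, the `node_created` event kind, and treating "index beyond the log tip" as corruption are assumptions for this example.

```typescript
type LogEvent = { eventIndex: number; kind: string };
type ProjectionCache = { schemaVersion: number; lastEventIndex: number; nodeCount: number };

const CACHE_SCHEMA_VERSION = 2;
const EMPTY_CACHE: ProjectionCache = {
  schemaVersion: CACHE_SCHEMA_VERSION,
  lastEventIndex: -1,
  nodeCount: 0,
};

// Pure reducer: applies only events newer than the cache's last processed index.
function applyEvents(base: ProjectionCache, events: readonly LogEvent[]): ProjectionCache {
  let { lastEventIndex, nodeCount } = base;
  for (const event of events) {
    if (event.eventIndex <= lastEventIndex) continue; // already processed
    if (event.kind === "node_created") nodeCount += 1; // hypothetical event kind
    lastEventIndex = event.eventIndex;
  }
  return { schemaVersion: CACHE_SCHEMA_VERSION, lastEventIndex, nodeCount };
}

function loadOrRebuild(cached: ProjectionCache | null, events: readonly LogEvent[]): ProjectionCache {
  const logTip = events.reduce((max, e) => Math.max(max, e.eventIndex), -1);
  if (
    cached === null ||
    cached.schemaVersion !== CACHE_SCHEMA_VERSION ||
    cached.lastEventIndex > logTip
  ) {
    return applyEvents(EMPTY_CACHE, events); // discard and rebuild from the log
  }
  return applyEvents(cached, events); // incremental update
}
```

The property worth testing is that an incremental update and a full rebuild over the same log produce identical caches.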
|
|
1909
|
+
|
|
1910
|
+
### Determinism verification suite (locked intent)
|
|
1911
|
+
Provide a minimal “no excuses” suite that asserts v2 guarantees:
|
|
1912
|
+
- golden fixtures for `workflowHash` (compiled snapshot → JCS bytes → sha256)
|
|
1913
|
+
- replay harness for idempotency and branching (replay tool calls / ack attempts yields identical durable truth)
|
|
1914
|
+
- export/import roundtrip tests (bundle integrity + token re-mint + projection equivalence)
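To illustrate the kind of assertion a golden fixture makes, the sketch below canonicalizes JSON by sorting object keys and checks that key order does not change the output. It is a simplified stand-in for RFC 8785 (JCS), not a compliant implementation, and it omits the SHA-256 step; `canonicalize` is a hypothetical name.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical form: recursively sorts object keys by UTF-16 code units.
// Not a full JCS implementation (for example, it does not reject non-finite numbers).
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
}
```

A golden fixture would then pin the hash of these canonical bytes for a known compiled snapshot and fail whenever it changes.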
|
|
1915
|
+
|
|
1916
|
+
### Rollout flags: unflag criteria (locked intent)
|
|
1917
|
+
Some tools begin feature-flagged to reduce rollout risk. Unflagging MUST be evidence-based and driven by deterministic quality gates (not “it feels stable”).
|
|
1918
|
+
|
|
1919
|
+
#### `resume_session` unflag gates
|
|
1920
|
+
- Session health gating is implemented: only `SessionHealth=healthy` sessions are eligible candidates.
|
|
1921
|
+
- Deterministic ranking/matching is implemented exactly as locked (tiers + normalization + bounded snippets).
|
|
1922
|
+
- Export/import roundtrip preserves resume results (post-import candidate set and ordering is equivalent for the same store contents).
|
|
1923
|
+
- Corruption/salvage behavior is correct: corrupt sessions do not appear as candidates; errors are structured.
|
|
1924
|
+
|
|
1925
|
+
#### `checkpoint_workflow` unflag gates
|
|
1926
|
+
- Idempotency is enforced via `checkpointToken` (replay-safe; no duplicate checkpoint nodes/edges/outputs).
|
|
1927
|
+
- Checkpoint edges are recorded (`edgeKind=checkpoint`, `cause.kind=checkpoint_created`) and visible in projections.
|
|
1928
|
+
- Rehydrate-only remains side-effect-free and cannot accidentally create checkpoint artifacts.
|
|
1929
|
+
- Export/import roundtrip preserves checkpoint nodes/edges/outputs and re-mints `checkpointToken` deterministically.
|
|
1930
|
+
|
|
1931
|
+
### Notable refinement ideas (deferred; non-blocking)
|
|
1932
|
+
These emerged during Slice 2.5 risk analysis and Polish & Hardening ideation but are not required for v2 correctness. They may be valuable later enhancements:
|
|
1933
|
+
|
|
1934
|
+
- **Linear append transaction primitive**: make `AppendPlan` one-shot/consumed (prevents accidental partial appends or stale plan reuse). This would require wrapping the plan in a linear capability token that is invalidated after append.
|
|
1935
|
+
- **Pure deterministic renderer with fixtures**: for response text (recap/pending prompts). Keeps replay deterministic without storing full responses in durable truth. Renderer version is internal-only (not exposed to MCP).
|
|
1936
|
+
- **Determinism diff tool**: for debugging; given two runs/bundles, compute canonical JCS diffs of key artifacts (events, snapshots, manifests) to localize drift. Useful for troubleshooting "why did replay produce different bytes?"
|
|
1937
|
+
|
|
1938
|
+
---
|
|
1939
|
+
|
|
1940
|
+
## 17) Implementation architecture map (locked)
|
|
1941
|
+
WorkRail v2 implementation MUST use a coherent, compositional architecture so components integrate without drift.
|
|
1942
|
+
|
|
1943
|
+
### Unifying style (locked)
|
|
1944
|
+
- **Functional core / imperative shell**:
|
|
1945
|
+
- pure functions for determinism-critical logic (schemas/normalization, JCS+hashing, compilation, projections, idempotency decisions)
|
|
1946
|
+
- small adapters for side effects (filesystem, locks, keyring IO, crypto primitives)
|
|
1947
|
+
- **Ports & adapters** (Clean Architecture):
|
|
1948
|
+
- use-cases orchestrate ports; MCP handlers remain thin mappers
|
|
1949
|
+
- **Errors as data**:
|
|
1950
|
+
- typed, closed-set error codes; `Result`-style flow; no throwing across MCP boundaries
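The "errors as data" and "thin mapper" points above might look like this in miniature. The local `Result` type is a stand-in for the `neverthrow` types the document names elsewhere; the error codes, `advance`, and `handleAdvanceTool` are hypothetical.

```typescript
// Local stand-in for a Result type, kept dependency-free for this sketch.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Hypothetical closed set of error codes for one use case.
type AdvanceError =
  | { code: "TOKEN_INVALID" }
  | { code: "SESSION_LOCKED"; retryAfterMs: number };

// Use case: returns errors as values instead of throwing.
function advance(token: string): Result<{ nextStepId: string }, AdvanceError> {
  if (token.length === 0) return { ok: false, error: { code: "TOKEN_INVALID" } };
  return { ok: true, value: { nextStepId: "step-2" } };
}

// Exhaustive mapping: adding a new code without handling it fails to compile.
function describeError(error: AdvanceError): string {
  switch (error.code) {
    case "TOKEN_INVALID":
      return "token failed validation";
    case "SESSION_LOCKED":
      return `session locked; retry in ${error.retryAfterMs}ms`;
    default: {
      const unreachable: never = error;
      return unreachable;
    }
  }
}

// Thin handler: maps the use-case result to a response shape and nothing else.
function handleAdvanceTool(token: string): { isError: boolean; text: string } {
  const result = advance(token);
  return result.ok
    ? { isError: false, text: `next: ${result.value.nextStepId}` }
    : { isError: true, text: describeError(result.error) };
}
```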
|
|
1951
|
+
|
|
1952
|
+
### Side effects live at the edges (locked)
|
|
1953
|
+
All non-pure operations MUST be isolated behind ports/adapters and kept out of the functional core:
|
|
1954
|
+
- filesystem reads/writes (segments, manifest, CAS snapshots, pinned workflows, bundles)
|
|
1955
|
+
- file locks
|
|
1956
|
+
- keyring IO
|
|
1957
|
+
- crypto primitives (HMAC/sign/verify)
|
|
1958
|
+
- clocks (timestamps are informational only; never used for ordering or tie-breaking)
|
|
1959
|
+
|
|
1960
|
+
### Canonical “core” modules (must exist)
|
|
1961
|
+
- **Canonical models**: branded IDs + discriminated unions + Zod schemas (code-canonical source of truth)
|
|
1962
|
+
- **Canonicalization + hashing**: RFC 8785 (JCS) serializer + SHA-256 wrapper; hashing only accepts typed canonical artifacts
|
|
1963
|
+
- **Compiler**: pure pipeline producing `CompiledWorkflowSnapshotV1` with deterministic ordering + provenance
|
|
1964
|
+
- **Projections** (internal-only): pure read-model reducers/policies (preferred tip, run status, resume ranking, current outputs/gaps), bounded + salvage-aware
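As a sketch of the "branded IDs" idea in the canonical models bullet, the example below uses a phantom brand so that a `SessionId` and a `RunId` are not interchangeable at compile time. The ID formats (`sess_` / `run_` prefixes) and the constructor names are assumptions, not WorkRail's actual formats.

```typescript
type Brand<T, B extends string> = T & { readonly __brand: B };
type SessionId = Brand<string, "SessionId">;
type RunId = Brand<string, "RunId">;

type IdError = { code: "INVALID_ID"; input: string };
type IdResult<T> = { ok: true; value: T } | { ok: false; error: IdError };

// Smart constructors: the only place a raw string becomes a branded ID.
function asSessionId(raw: string): IdResult<SessionId> {
  return /^sess_[a-z0-9]+$/.test(raw)
    ? { ok: true, value: raw as SessionId }
    : { ok: false, error: { code: "INVALID_ID", input: raw } };
}

function asRunId(raw: string): IdResult<RunId> {
  return /^run_[a-z0-9]+$/.test(raw)
    ? { ok: true, value: raw as RunId }
    : { ok: false, error: { code: "INVALID_ID", input: raw } };
}

// Accepts only SessionId; passing a RunId or a plain string is a compile-time error.
function sessionLabel(id: SessionId): string {
  return `session:${id}`;
}
```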
|
|
1965
|
+
|
|
1966
|
+
### Persistence ports (must exist)
|
|
1967
|
+
- `SessionStorePort`: append/load domain events + manifest (session-scoped locking enforced here)
|
|
1968
|
+
- `SnapshotStorePort`: CAS get/put by `SnapshotRef` (immutable)
|
|
1969
|
+
- `PinnedWorkflowStorePort`: store/load `CompiledWorkflowSnapshotV1` keyed by `workflowHash`
|
|
1970
|
+
- `ProjectionCachePort`: derived, rebuildable cache (versioned; never truth)
|
|
1971
|
+
- `KeyRingPort`: current/previous keys + rotation
|
|
1972
|
+
|
|
1973
|
+
### Directory/package layout (Phase 0–2, locked)
|
|
1974
|
+
Concrete structure under `src/v2/` bounded context (updated v1.5.1):
|
|
1975
|
+
|
|
1976
|
+
```
|
|
1977
|
+
src/
|
|
1978
|
+
v2/
|
|
1979
|
+
durable-core/
|
|
1980
|
+
ids/
|
|
1981
|
+
index.ts # barrel: re-exports all ID modules
|
|
1982
|
+
session-ids.ts # SessionId, RunId, NodeId, AttemptId (v1.5.1)
|
|
1983
|
+
workflow-ids.ts # WorkflowId, WorkflowHash, WorkflowHashRef, Sha256Digest (v1.5.1)
|
|
1984
|
+
event-ids.ts # EventId, EventIndex, ManifestIndex, OutputId (v1.5.1)
|
|
1985
|
+
snapshot-ids.ts # SnapshotRef, CanonicalBytes (v1.5.1)
|
|
1986
|
+
token-ids.ts # TokenStringV1 (v1.5.1)
|
|
1987
|
+
lib/
|
|
1988
|
+
utf8-byte-length.ts # shared UTF-8 byte measurement utility (v1.5.1)
|
|
1989
|
+
errors/
|
|
1990
|
+
index.ts # exports unified error envelope + closed code unions
|
|
1991
|
+
canonical/
|
|
1992
|
+
jcs.ts # RFC 8785 canonicalizer (uses CryptoPort)
|
|
1993
|
+
hashing.ts # sha256 wrapper + typed hash helpers
|
|
1994
|
+
schemas/
|
|
1995
|
+
session/
|
|
1996
|
+
events.ts # DomainEventV1 discriminated union (v1.5.1)
|
|
1997
|
+
blockers.ts # Blocker schemas (extracted v1.5.1)
|
|
1998
|
+
outputs.ts # Output payload schemas (extracted v1.5.1)
|
|
1999
|
+
gaps.ts # Gap schemas (extracted v1.5.1)
|
|
2000
|
+
dag-topology.ts # Node/edge schemas (extracted v1.5.1)
|
|
2001
|
+
index.ts # barrel re-exports
|
|
2002
|
+
execution-snapshot/
|
|
2003
|
+
index.ts # ExecutionSnapshotFile + enginePayload.v1 Zod + types
|
|
2004
|
+
compiled-workflow/
|
|
2005
|
+
index.ts # CompiledWorkflowSnapshotV1 Zod + types
|
|
2006
|
+
export-bundle/
|
|
2007
|
+
index.ts # Bundle Zod + types
|
|
2008
|
+
projections/
|
|
2009
|
+
projection-error.ts # shared ProjectionError type (v1.5.1)
|
|
2010
|
+
run-dag.ts # projectRunDagV2 with extracted tip algorithm (v1.5.1)
|
|
2011
|
+
node-outputs.ts # projectNodeOutputsV2
|
|
2012
|
+
capabilities.ts # projectCapabilitiesV2
|
|
2013
|
+
gaps.ts # projectGapsV2
|
|
2014
|
+
preferences.ts # projectPreferencesV2
|
|
2015
|
+
artifacts.ts # projectArtifactsV2
|
|
2016
|
+
run-status-signals.ts # projectRunStatusSignalsV2
|
|
2017
|
+
resume-ranking.ts # deterministic resume ranking logic
|
|
2018
|
+
session-health.ts # projectSessionHealthV2
|
|
2019
|
+
domain/
|
|
2020
|
+
# Pure domain logic extracted from handlers (v1.5.1)
|
|
2021
|
+
ack-advance-append-plan.ts # event builders (decomposed v1.5.1)
|
|
2022
|
+
blocking-decision.ts
|
|
2023
|
+
reason-model.ts
|
|
2024
|
+
bundle-builder.ts
|
|
2025
|
+
bundle-validator.ts
|
|
2026
|
+
prompt-renderer.ts
|
|
2027
|
+
# ... other domain modules
|
|
2028
|
+
ports/
|
|
2029
|
+
fs.port.ts # segregated into 5 sub-interfaces (v1.5.1)
|
|
2030
|
+
data-dir.port.ts # uses branded types (v1.5.1)
|
|
2031
|
+
session-event-log-store.port.ts
|
|
2032
|
+
snapshot-store.port.ts
|
|
2033
|
+
pinned-workflow-store.port.ts
|
|
2034
|
+
directory-listing.port.ts # segregated from fs.port (v1.5.1)
|
|
2035
|
+
infra/
|
|
2036
|
+
local/
|
|
2037
|
+
data-dir/
|
|
2038
|
+
index.ts # DataDirPort file implementation
|
|
2039
|
+
session-lock/
|
|
2040
|
+
index.ts # SessionLockPort OS-level lock implementation
|
|
2041
|
+
session-store/
|
|
2042
|
+
index.ts # SessionStorePort file implementation (segments + manifest)
|
|
2043
|
+
snapshot-store/
|
|
2044
|
+
index.ts # SnapshotStorePort CAS file implementation
|
|
2045
|
+
pinned-workflow-store/
|
|
2046
|
+
index.ts # PinnedWorkflowStorePort file implementation
|
|
2047
|
+
directory-listing/
|
|
2048
|
+
index.ts # DirectoryListingPort implementation (v1.5.1)
|
|
2049
|
+
session-summary-provider/
|
|
2050
|
+
index.ts # SessionSummaryProviderPort implementation
|
|
2051
|
+
mcp/
|
|
2052
|
+
handlers/
|
|
2053
|
+
v2-execution/ # modularized (v1.5.1)
|
|
2054
|
+
index.ts # public API exports
|
|
2055
|
+
start.ts # executeStartWorkflow + helpers
|
|
2056
|
+
continue-rehydrate.ts # handleRehydrateIntent
|
|
2057
|
+
continue-advance.ts # handleAdvanceIntent
|
|
2058
|
+
replay.ts # replayFromRecordedAdvance + helpers
|
|
2059
|
+
advance.ts # advanceAndRecord
|
|
2060
|
+
v2-advance-core/ # modularized (v1.5.1)
|
|
2061
|
+
index.ts # executeAdvanceCore orchestrator
|
|
2062
|
+
input-validation.ts # validateAdvanceInputs
|
|
2063
|
+
outcome-blocked.ts # buildBlockedOutcome
|
|
2064
|
+
outcome-success.ts # buildSuccessOutcome
|
|
2065
|
+
event-builders.ts # buildAndAppendPlan + output builders
|
|
2066
|
+
v2-advance-events.ts # extracted event builders (v1.5.1)
|
|
2067
|
+
v2-execution.ts # barrel: re-exports from v2-execution/
|
|
2068
|
+
v2-advance-core.ts # barrel: re-exports from v2-advance-core/
|
|
2069
|
+
v2-checkpoint.ts
|
|
2070
|
+
v2-resume.ts
|
|
2071
|
+
v2-workflow.ts
|
|
2072
|
+
v2-token-ops.ts
|
|
2073
|
+
v2-execution-helpers.ts
|
|
2074
|
+
```
|
|
2075
|
+
|
|
2076
|
+
Lock:
|
|
2077
|
+
- Files under `durable-core/` export only pure functions and types.
|
|
2078
|
+
- Files under `ports/` export only TypeScript interfaces.
|
|
2079
|
+
- Files under `infra/` are the only places Node I/O is allowed.
|
|
2080
|
+
|
|
2081
|
+
### Integration rule (locked)
|
|
2082
|
+
- Studio/Console and CLI MUST consume the internal projections module; do not duplicate projection logic or expose new MCP tools for projections.
|
|
2083
|
+
|
|
2084
|
+
### Dependency layering (locked)
|
|
2085
|
+
To keep the functional core pure and prevent side-effect creep, module imports MUST follow these rules:
|
|
2086
|
+
|
|
2087
|
+
Pure core (no Node I/O or side effects):
|
|
2088
|
+
- `v2/durable-core/ids` → exports branded ID types + smart constructors
|
|
2089
|
+
- `v2/durable-core/errors` → exports unified error envelope + closed code unions
|
|
2090
|
+
- `v2/durable-core/canonical` → exports RFC 8785 (JCS) canonicalizer + SHA-256 helpers (uses `CryptoPort`)
|
|
2091
|
+
- `v2/durable-core/schemas/**` → exports Zod schemas + inferred TS types for all durable artifacts
|
|
2092
|
+
- `v2/durable-core/projections/**` → pure projection functions over typed events/snapshots
|
|
2093
|
+
|
|
2094
|
+
Ports (pure interfaces only):
|
|
2095
|
+
- `v2/ports/**` → TypeScript interfaces; no Node imports, no implementation
|
|
2096
|
+
|
|
2097
|
+
Infra (side effects only):
|
|
2098
|
+
- `v2/infra/**` → file I/O, locks, keyring; imports from `durable-core` and `ports`; provides port implementations
|
|
2099
|
+
|
|
2100
|
+
Lock:
|
|
2101
|
+
- `v2/durable-core/**` MUST NOT import from `v2/infra/**` or any Node I/O modules (fs, crypto, path).
|
|
2102
|
+
- `v2/ports/**` MUST NOT import from `v2/infra/**`.
|
|
2103
|
+
- Side effects are injected via ports; pure core consumes port interfaces only.
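The layering rule above means a core function receives its side-effecting dependency as a port argument rather than importing a Node module. In this sketch, `CryptoPort` follows the signature listed in this document, while `hashCanonicalBytes`, the `sha256:` prefix, and the test double are hypothetical.

```typescript
type Sha256Digest = string & { readonly __brand: "Sha256Digest" };

// Port: interface only, no implementation and no Node imports.
interface CryptoPort {
  sha256(bytes: Uint8Array): Sha256Digest;
}

// Pure core: all hashing goes through the injected port.
function hashCanonicalBytes(cryptoPort: CryptoPort, canonicalBytes: Uint8Array): string {
  // Assumption: the "sha256:" prefix is illustrative, not WorkRail's hash format.
  return `sha256:${cryptoPort.sha256(canonicalBytes)}`;
}

// Test double standing in for an infra adapter; a real adapter would live under infra/.
const recordedLengths: number[] = [];
const fakeCrypto: CryptoPort = {
  sha256: (bytes) => {
    recordedLengths.push(bytes.length);
    return "00ff" as Sha256Digest;
  },
};
```

Because the core depends only on the interface, it can be tested without touching the filesystem or a real crypto library.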
|
|
2104
|
+
|
|
2105
|
+
### Port interfaces (Phase 0–2, locked)
|
|
2106
|
+
These are the exact TypeScript interfaces required for Phase 0–2 (initial v2 schema). All return `Result` or `ResultAsync` from `neverthrow`.
|
|
2107
|
+
|
|
2108
|
+
**Phase 0 (minimal ports, for hashing/canonicalization)**
|
|
2109
|
+
- **`CryptoPort`**:
|
|
2110
|
+
- `sha256(bytes: Uint8Array): Sha256Digest`
|
|
2111
|
+
|
|
2112
|
+
- **`DataDirPort`**:
|
|
2113
|
+
- `getRoot(): AbsolutePath`
|
|
2114
|
+
- `sessionRoot(sessionId: SessionId): AbsolutePath`
|
|
2115
|
+
- `snapshotPath(snapshotRef: SnapshotRef): AbsolutePath`
|
|
2116
|
+
- `pinnedWorkflowPath(workflowHash: WorkflowHash): AbsolutePath`
|
|
2117
|
+
|
|
2118
|
+
**Phase 2 (storage + locking)**
|
|
2119
|
+
- **`FileLockPort`**:
|
|
2120
|
+
- `withSessionLock<T, E>(sessionId: SessionId, fn: () => ResultAsync<T, E>): ResultAsync<T, SessionLockedError | E>`
|
|
2121
|
+
|
|
2122
|
+
- **`SessionStorePort`**:
|
|
2123
|
+
- `append(sessionId: SessionId, plan: AppendPlan): ResultAsync<AppendResult, SessionStoreError>`
|
|
2124
|
+
- `load(sessionId: SessionId): ResultAsync<LoadedSession, SessionStoreError>`
|
|
2125
|
+
- `loadValidatedPrefix(sessionId: SessionId): ResultAsync<LoadedSessionPrefix, SessionStoreError>` (salvage)
|
|
2126
|
+
|
|
2127
|
+
- **`SnapshotStorePort`** (CAS):
|
|
2128
|
+
- `put(snapshot: ExecutionSnapshotFile): ResultAsync<SnapshotRef, SnapshotStoreError>`
|
|
2129
|
+
- `get(snapshotRef: SnapshotRef): ResultAsync<ExecutionSnapshotFile, SnapshotStoreError>`
|
|
2130
|
+
|
|
2131
|
+
- **`PinnedWorkflowStorePort`**:
|
|
2132
|
+
- `put(workflowHash: WorkflowHash, compiled: CompiledWorkflowSnapshotV1): ResultAsync<void, PinnedWorkflowError>`
|
|
2133
|
+
- `get(workflowHash: WorkflowHash): ResultAsync<CompiledWorkflowSnapshotV1, PinnedWorkflowError>`
|
|
2134
|
+
|
|
2135
|
+
- **`ProjectionCachePort`** (derived):
|
|
2136
|
+
- `get(sessionId: SessionId): ResultAsync<ProjectionCache | null, CacheError>`
|
|
2137
|
+
- `put(sessionId: SessionId, cache: ProjectionCache): ResultAsync<void, CacheError>`
|
|
2138
|
+
- `invalidate(sessionId: SessionId): ResultAsync<void, CacheError>`
|
|
2139
|
+
|
|
2140
|
+
Locks:
|
|
2141
|
+
- All methods return `Result` or `ResultAsync` with typed errors (no throws).
|
|
2142
|
+
- Branded types for IDs, hashes, and all payloads (no string/number soup).
|
|
2143
|
+
- `AppendPlan`, `LoadedSession`, etc. are defined in `v2/durable-core` and imported by ports (keep ports pure interfaces).
|
|
2144
|
+
|
|
2145
|
+
|
|
2146
|
+
---
|
|
2147
|
+
|
|
2148
|
+
## 18. Open Items (Slice 4a)
|
|
2149
|
+
|
|
2150
|
+
### 18.1 notesMarkdown Accumulation Semantics
|
|
2151
|
+
|
|
2152
|
+
**Issue**: Current tool description and contract docs don't explicitly state whether `output.notesMarkdown` in `continue_workflow` is:
|
|
2153
|
+
- **Per-step fresh** (agent provides summary of THIS step only)
|
|
2154
|
+
- **Cumulative** (agent appends to or includes previous notes)
|
|
2155
|
+
|
|
2156
|
+
**Impact**: Ambiguity invites agents to send cumulative notes that grow with every step, which makes deterministic truncation impossible and violates byte budgets.
|
|
2157
|
+
|
|
2158
|
+
**Current implementation** (Slice 3):
|
|
2159
|
+
- Notes are **stored** in event log as `node_output_appended` (channel=`recap`)
|
|
2160
|
+
- Notes are **not yet returned** in `continue_workflow` responses
|
|
2161
|
+
- No agent guidance on accumulation behavior
|
|
2162
|
+
|
|
2163
|
+
**Proposed lock** (for Slice 4a):
|
|
2164
|
+
|
|
2165
|
+
`notesMarkdown` is **per-step, not cumulative**:
|
|
2166
|
+
- Each `continue_workflow` call receives a **fresh summary** of work accomplished in that specific step
|
|
2167
|
+
- Agent MUST NOT append to or reference previous step notes in `notesMarkdown`
|
|
2168
|
+
- WorkRail aggregates notes across steps via the recap projection with deterministic budgeting
|
|
2169
|
+
- When recap is returned (Slice 4a/4b), it's WorkRail's responsibility to provide bounded context from previous steps
|
|
2170
|
+
|
|
2171
|
+
**Rationale**:
|
|
2172
|
+
1. **Deterministic truncation**: Per-step notes have predictable size; cumulative notes depend on entire chat history
|
|
2173
|
+
2. **Byte budget enforcement**: Fresh notes can be validated against max bytes (4096); cumulative notes violate budgets by construction
|
|
2174
|
+
3. **Rewind safety**: Each step's notes are independent; cumulative notes require reading all previous notes (breaks rehydrate purity)
|
|
2175
|
+
4. **Composability**: Projections can aggregate/filter/budget notes deterministically
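The byte-budget argument above can be made concrete with a per-step check. This is a sketch: the 4096-byte limit is the figure quoted in the rationale, while `validateNotes` and the `NOTES_TOO_LARGE` code are hypothetical. The byte counter is written by hand so the example has no platform dependencies.

```typescript
const MAX_NOTES_BYTES = 4096; // budget quoted in the rationale above

// Counts UTF-8 bytes by code point, without relying on platform encoders.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const codePoint = text.codePointAt(i) ?? 0;
    if (codePoint > 0xffff) i++; // skip the low surrogate of a pair
    bytes += codePoint <= 0x7f ? 1 : codePoint <= 0x7ff ? 2 : codePoint <= 0xffff ? 3 : 4;
  }
  return bytes;
}

type NotesCheck =
  | { ok: true; bytes: number }
  | { ok: false; code: "NOTES_TOO_LARGE"; bytes: number; maxBytes: number };

// Rejects oversized notes as data rather than truncating them silently.
function validateNotes(notesMarkdown: string): NotesCheck {
  const bytes = utf8ByteLength(notesMarkdown);
  return bytes <= MAX_NOTES_BYTES
    ? { ok: true, bytes }
    : { ok: false, code: "NOTES_TOO_LARGE", bytes, maxBytes: MAX_NOTES_BYTES };
}
```

Per-step notes make this check meaningful: each call is validated independently, so the result does not depend on how many steps came before.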
|
|
2176
|
+
|
|
2177
|
+
**Where to document** (when locked):
|
|
2178
|
+
1. Update `src/mcp/v2/tools.ts` schema description for `notesMarkdown`
|
|
2179
|
+
2. Add normative section to `docs/reference/workflow-execution-contract.md`
|
|
2180
|
+
3. Update `docs/plans/workflow-v2-design.md`
|
|
2181
|
+
|
|
2182
|
+
**Decision needed before Slice 4a ships.**
|
|
2183
|
+
|
|
2184
|
+
**DECISION (Slice 4a continuation)**: LOCKED as **per-step fresh**.
|
|
2185
|
+
- Tool schema updated: `src/mcp/v2/tools.ts` (describes per-step semantics)
|
|
2186
|
+
- Contract updated: `docs/reference/workflow-execution-contract.md` (normative section added)
|
|
2187
|
+
- Implementation: Slice 4a S6 (schema + docs)
|
|
2188
|
+
|
|
2189
|
+
**Quality expectations** (added post-lock): Notes are displayed to the user in a markdown viewer in the Console UI and serve as the durable record of agent work. Agent-facing instructions (tool schema, tool descriptions, system-injected prompt) guide agents to write 10–30 line recaps including: what they did and key decisions, what they produced (files, paths, numbers), and anything notable (risks, open questions, deliberate omissions). Markdown formatting is expected.
|
|
2190
|
+
|
|
2191
|
+
### 18.2 Context Persistence and Auto-Loading
|
|
2192
|
+
|
|
2193
|
+
**Issue**: Current design lock Section 16.3.1 states "context is not persisted as durable truth," but Slice 3 implementation reveals this creates fragility:
|
|
2194
|
+
- Agents must manually re-pass context on every `continue_workflow` call
|
|
2195
|
+
- Missing context keys cause cryptic `advance_next_failed` errors deep in the interpreter
|
|
2196
|
+
- Context is lost on rewinds, preventing effective resume
|
|
2197
|
+
- Long workflows (20+ steps) compound the cognitive load and failure probability
|
|
2198
|
+
- No way to recover context after agent restart or handoff
|
|
2199
|
+
|
|
2200
|
+
**Impact**: The stateless context design works for simple workflows but breaks down for:
|
|
2201
|
+
- Complex workflows with external dependencies (forEach over `context.slices`, feature flags, environment config)
|
|
2202
|
+
- Long-running workflows (>10 steps) where context re-passing becomes error-prone
|
|
2203
|
+
- Rewind scenarios where agent loses local context
|
|
2204
|
+
- Multi-agent handoffs or resume-after-delay scenarios
|
|
2205
|
+
|
|
2206
|
+
**Current state** (Slice 3):
|
|
2207
|
+
- Context is validated at MCP boundary (256KB limit, JSON-only, RFC 8785 JCS)
|
|
2208
|
+
- Context is passed to interpreter for condition evaluation
|
|
2209
|
+
- Context is **not stored** in events or snapshots
|
|
2210
|
+
- Agent must re-pass on every call or face errors
|
|
2211
|
+
|
|
2212
|
+
**Design tension identified:**
|
|
2213
|
+
The lock assumes context is **simple external inputs** (ticket IDs, paths), but real workflows use context for:
|
|
2214
|
+
- Loop iteration data (`context.slices` for forEach)
|
|
2215
|
+
- Conditional gating (`context.featureFlags` for if/else)
|
|
2216
|
+
- Workflow parameters that don't change but must be available throughout execution
|
|
2217
|
+
|
|
2218
|
+
**Three approaches validated via subagent analysis:**
|
|
2219
|
+
|
|
2220
|
+
#### Approach A: context_set Event (Append-Only)
|
|
2221
|
+
Store context as immutable events in the event log:
|
|
2222
|
+
```typescript
{
  kind: "context_set",
  scope: { runId },
  data: {
    contextId: string,
    context: Record<string, unknown>,
    source: "initial" | "agent_delta" | "merge"
  }
}
```
|
|
2233
|
+
|
|
2234
|
+
**Pros:**
|
|
2235
|
+
- Append-only compatible (events are immutable)
|
|
2236
|
+
- Single source of truth (everything in event log)
|
|
2237
|
+
- Auditable (full context history visible)
|
|
2238
|
+
- Proven pattern (mirrors `preferences_changed`)
|
|
2239
|
+
- Supports context evolution (new context_set events as needed)
|
|
2240
|
+
|
|
2241
|
+
**Cons:**
|
|
2242
|
+
- Violates Section 16.3.1 "no durability" lock (requires lock revision)
|
|
2243
|
+
- O(n) query without metadata cache (mitigated by metadata/context.jsonl pattern)
|
|
2244
|
+
- 18 existing tests break (assume stateless)
|
|
2245
|
+
- 17-22 new tests needed for edge cases
|
|
2246
|
+
|
|
2247
|
+
**Implementation:** ~11 hours + ~25 hours testing + security hardening
|
|
2248
|
+
|
|
2249
|
+
#### Approach B: Context in Token Payload
|
|
2250
|
+
Embed context directly in `stateToken` payload.
|
|
2251
|
+
|
|
2252
|
+
**Verdict:** ❌ REJECTED
|
|
2253
|
+
- Token size explosion (256KB context → 347KB token)
|
|
2254
|
+
- Breaks idempotency (signature depends on context)
|
|
2255
|
+
- Violates 4-5 token design locks
|
|
2256
|
+
|
|
2257
|
+
#### Approach C: Separate CAS Context Store
|
|
2258
|
+
Store context in parallel CAS store alongside snapshots.
|
|
2259
|
+
|
|
2260
|
+
**Verdict:** ⚠️ VIABLE but complex
|
|
2261
|
+
- Not in event log (dual sources of truth)
|
|
2262
|
+
- Requires new infrastructure
|
|
2263
|
+
- Garbage collection needed
|
|
2264
|
+
- Higher implementation cost
|
|
2265
|
+
|
|
2266
|
+
**Subagent validation results (Approach A):**
|
|
2267
|
+
- **Architect**: ❌ Rejects (violates locks, idempotency concerns)
|
|
2268
|
+
- **Developer**: ✅ Feasible (6-8 days, 195 LOC)
|
|
2269
|
+
- **QA**: ⚠️ 18 tests break, needs design clarifications
|
|
2270
|
+
- **Security**: 🔴 5 critical issues (encryption, sanitization, attribution)
|
|
2271
|
+
- **Performance**: ❌ Unacceptable without metadata cache (15s overhead at 1000 steps)
|
|
2272
|
+
|
|
2273
|
+
**Proposed lock revision** (for Slice 4a):
|
|
2274
|
+
|
|
2275
|
+
**Revise Section 16.3.1 to allow durable context storage:**
|
|
2276
|
+
|
|
2277
|
+
Context durability rules (revised):
|
|
2278
|
+
- **Initial context durability**: `context` provided to `start_workflow` IS persisted as a `context_set` event for the run
|
|
2279
|
+
- **Context evolution**: Additional `context_set` events record context deltas/changes throughout execution
|
|
2280
|
+
- **Auto-loading**: `continue_workflow` automatically loads the latest context from `context_set` events (agent does not re-pass)
|
|
2281
|
+
- **Optional delta**: Agent may provide `context` parameter to merge with stored context (delta overrides stored fields)
|
|
2282
|
+
- **No echo**: Responses still MUST NOT echo full context back (only contextId or summary if needed)
|
|
2283
|
+
- **Byte budget**: Still enforced (256KB max) at `start_workflow` and per `context_set` event
|
|
2284
|
+
- **Projection**: Latest `context_set` event for a runId defines current context
|
|
2285
|
+
- **Performance**: Use metadata cache pattern (`metadata/context.jsonl`) for O(1) lookup
|
|
2286
|
+
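The projection rule (the latest `context_set` event for a run defines its current context) can be sketched as a pure function over the event log. This is a minimal sketch, not the shipped code: `projectRunContext` is named in the design, but its signature, the `DomainEvent` union, and the `other_event` placeholder kind are assumptions made for illustration.

```typescript
// Event shape taken from the Approach A schema; field types are assumed.
type ContextSetEvent = {
  kind: "context_set";
  scope: { runId: string };
  data: {
    contextId: string;
    context: Record<string, unknown>;
    source: "initial" | "agent_delta" | "merge";
  };
};

// Hypothetical placeholder for every other event kind in the log.
type OtherEvent = { kind: "other_event"; scope: { runId: string } };

type DomainEvent = ContextSetEvent | OtherEvent;

// Scans the log from newest to oldest and returns the context carried by the
// latest `context_set` event for the run, or undefined if the run has none.
function projectRunContext(
  events: readonly DomainEvent[],
  runId: string,
): Record<string, unknown> | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (
      event !== undefined &&
      event.kind === "context_set" &&
      event.scope.runId === runId
    ) {
      return event.data.context;
    }
  }
  return undefined;
}
```

The scan is linear in the number of events, which is the O(n) cost the metadata cache is meant to avoid; a cache would only change where the latest event is read from, not the rule itself.
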
**Rationale for revision:**
1. **Empirical evidence**: Slice 3 manual testing revealed stateless context is fragile for real workflows
2. **Alignment with v2 philosophy**: Context artifacts are immutable (append-only compatible)
3. **Proven pattern**: Mirrors `preferences_changed` and `node_output_appended` event designs
4. **User experience**: Agents should execute steps, not manage distributed state
5. **Rewind safety**: Durable context enables true resume after rewinds

**Implementation requirements:**
1. Add `context_set` to DomainEventV1 discriminated union
2. Create context projection (`projectRunContext`)
3. Emit `context_set` in `start_workflow` handler
4. Auto-load context in `continue_workflow` handler
5. Support optional delta merging
6. Add metadata cache for performance (O(1) lookup)
7. Security: sanitize `__proto__`, `constructor`, `prototype` fields
8. Testing: 17-22 new test cases for context evolution, replay, forks

**DECISION (Slice 4a continuation)**: LOCKED as **Approach A (context_set event, run-scoped)**.
- Event schema: `context_set` added to `DomainEventV1Schema` with `scope: { runId }` (run-scoped, not node-scoped)
- Merge semantics: shallow merge; `null` values delete keys (tombstones); `undefined` ignored; arrays/objects replaced; reserved keys (`__proto__`, `constructor`, `prototype`) rejected
- Implementation: Slice 4a S8 (projection + merge + auto-load)

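The locked merge semantics are compact enough to illustrate directly. The sketch below assumes a hypothetical `mergeContext` helper (the name, signature, and error message are not from the source) and shows one way the five rules could fit together: shallow merge, `null` tombstones, ignored `undefined`, wholesale replacement of arrays and objects, and rejection of reserved keys.

```typescript
type Context = Record<string, unknown>;

// Reserved keys listed in the decision; rejecting them guards against prototype pollution.
const RESERVED_KEYS: ReadonlySet<string> = new Set(["__proto__", "constructor", "prototype"]);

// Hypothetical helper: returns a new context; neither input is mutated.
function mergeContext(stored: Context, delta: Context): Context {
  // Null-prototype accumulator so no key can reach Object.prototype while merging.
  const merged: Context = Object.create(null);
  for (const [key, value] of Object.entries(stored)) {
    merged[key] = value;
  }
  for (const [key, value] of Object.entries(delta)) {
    if (RESERVED_KEYS.has(key)) {
      throw new Error(`Reserved context key rejected: ${key}`);
    }
    if (value === undefined) {
      continue; // undefined is ignored: the stored value stays
    }
    if (value === null) {
      delete merged[key]; // null is a tombstone: the key is removed
      continue;
    }
    merged[key] = value; // shallow: arrays and objects are replaced, not deep-merged
  }
  return { ...merged };
}
```

For example, merging `{ a: 1, b: { x: 1 } }` with `{ a: null, b: { y: 2 } }` yields `{ b: { y: 2 } }`: `a` is deleted and `b` is replaced rather than combined. Whether the real implementation throws or returns a structured error on a reserved key is not stated in the source.
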
**Security considerations:**
- Context may contain sensitive data (API keys, tokens, credentials)
- Encryption at rest recommended (AES-256-GCM with OS keychain)
- Agent attribution in event metadata for audit
- Tamper detection via manifest signing

**Performance considerations:**
- Use metadata cache pattern to avoid O(n) event log scans
- Tail-read `metadata/context.jsonl` for latest context (O(1))
- Auto-rebuild on cache miss or corruption

**Migration path:**
- Feature-flagged rollout (`WORKRAIL_ENABLE_CONTEXT_PERSISTENCE`)
- Backward compatible (old sessions without context_set still work)
- Gradual adoption per workflow

**Decision authority:** Requires design review and lock revision approval before Slice 4a implementation.

**Open sub-questions for Slice 4a:**
1. Should context be per-run (one context for entire run) or per-node (context can change per step)?
2. Merge semantics: shallow spread vs deep merge vs explicit delete markers?
3. Context in rehydrate-only path: return contextId hint or omit entirely?
4. Context size accounting: count against node output budget or separate limit?

**Decision needed before Slice 4a ships.**

---

### 18.3 Agent Execution Guidance (locked - implemented)

**Status**: ✅ Implemented across Layers 1-3 + outputContract guidance (PR #61)

**Layer 1**: Tool descriptions rewritten (v1 behavioral clarity, nextIntent explanations)
**Layer 2**: Field descriptions enhanced (lifecycle clarity, WRONG/RIGHT examples)
**Layer 3**: Prompt-based requirement injection — two paths:
- **validationCriteria path** (legacy): Extracts requirements from contains/regex/length rules
- **outputContract path** (preferred): System-injected contract guidance from contract registry (e.g., loop control → required fields + enum values). Guidance is system-generated, not from authored prompts.

**Historical context**: Initial proposal was `agentInstructions` field in responses. After validation testing and subagent review, pivoted to prompt-based injection as simpler and more philosophically aligned. Extended in PR #61 to support `outputContract` with typed artifact guidance.

---

**Problem Solved**: Agents had 40% error rate with v2 tools due to:
1. Poor tool descriptions (minimal, technical-only) → Fixed by Layer 1
2. Unclear field semantics (stateToken vs ackToken confusion) → Fixed by Layer 2
3. Invisible workflow requirements (validationCriteria not visible) → Fixed by Layer 3

**Validation Results**:
- Post-L1+L2: 33% error rate (eliminated prediction, context re-passing, notes accumulation)
- Remaining errors: Workflow contract opacity (agents don't see validation requirements)

---

**Layer 3 Design (Prompt-Based Requirement Injection)**:

**Approach**: Enhance `renderPendingPrompt()` to extract and append validation requirements to prompt text.

**Implementation**:
1. Create `extractValidationRequirements(validationCriteria)` pure function
2. Modify `renderPendingPrompt()` to append requirements section when validationCriteria exists
3. Format: "\n\nOUTPUT REQUIREMENTS:\n- Must contain 'X'\n- Must match pattern Y\n- Length: Z"

**Rationale over _guide field alternative**:
- ✅ Proactive (agents see BEFORE working, not after failing)
- ✅ Simpler (50 lines vs 900 lines for _guide extraction + population)
- ✅ Philosophy-aligned (fail fast = prevent failures, not guide after failures)
- ✅ Architectural fix (makes invisible visible at point of use, not compensation layer)
- ✅ Lower maintenance burden (one injection point vs three population scenarios)

**Validation types supported** (covers 100% of empirical usage):
- `contains` rules (51% of all rules)
- `regex` rules (36% of all rules)
- `length` rules (13% of all rules)
- `and` compositions (most common)

**Payload impact**: Minimal - appended to prompt (no new field), capped at top 5 requirements.

**Files**:
- `src/v2/durable-core/domain/validation-requirements-extractor.ts` (new, pure function)
- `src/v2/durable-core/domain/prompt-renderer.ts` (modified to call extractor)

**Related**: §19 Evidence-Based Validation Design (prevents gameable validation when requirements visible)

**Lock Decision** (implemented):

Validation requirements are injected into `pending.prompt` text via `renderPendingPrompt()` enhancement. Two injection paths exist:

**Path 1 — validationCriteria (legacy)**: Extracts requirements from `validationCriteria` (contains/regex/length rules) and appends as bulleted OUTPUT REQUIREMENTS section.

**Path 2 — outputContract (preferred, added in PR #61)**: System-injected contract guidance from `outputContract.contractRef`. Uses a contract registry to render canonical instructions (e.g., loop control contract → required fields + enum values). This guidance is generated by the system, not from the authored prompt.

**Example output (validationCriteria path)**:
```typescript
{
  stateToken: "st1...",
  ackToken: "ack1...",
  pending: {
    stepId: "phase-1",
    title: "Planning",
    prompt: "Create an implementation plan...\n\n---\nOUTPUT REQUIREMENTS:\n- Must contain 'planningComplete = true'\n- Must match pattern: ^[a-z]+$\n- Length: ≥100 chars, ≤500 chars"
  },
  nextIntent: "perform_pending_then_continue",
  preferences: {...}
}
```

**Example output (outputContract path)**:
```typescript
{
  pending: {
    prompt: "Provide a loop control artifact...\n\n---\n**OUTPUT REQUIREMENTS (System):**\n- Artifact contract: wr.contracts.loop_control\n- Provide artifact with fields:\n  - kind: \"wr.loop_control\"\n  - loopId: <lowercase id>\n  - decision: \"continue\" | \"stop\""
  }
}
```

**How it works**:
1. `renderPendingPrompt()` calls `getStepById()` to access step definition
2. If step has `outputContract`, calls `formatOutputContractRequirements(contract)` — system-generated, contract-driven guidance
3. Else if step has `validationCriteria`, calls `extractValidationRequirements(criteria)` — requirement extraction from rules
4. Formats requirements as bulleted list and appends to prompt text
5. Caps at top 5 requirements to prevent prompt bloat

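The injection flow can be sketched in a few lines. This is an illustration under assumptions rather than the code in `prompt-renderer.ts`: `extractValidationRequirements` is named in the design, but the `Criteria` and `Step` types, the `appendRequirements` wrapper, and the `contractGuidance` callback (standing in for the contract registry lookup) are hypothetical.

```typescript
type Criteria =
  | { type: "contains"; value: string }
  | { type: "regex"; pattern: string }
  | { type: "length"; min?: number; max?: number }
  | { and: Criteria[] };

// Assumed step shape: only the fields this sketch needs.
type Step = {
  prompt: string;
  validationCriteria?: Criteria;
  outputContract?: { contractRef: string };
};

const MAX_REQUIREMENTS = 5; // cap stated in the design

// Pure function: turns validation rules into human-readable requirement lines.
function extractValidationRequirements(criteria: Criteria): string[] {
  if ("and" in criteria) {
    const lines: string[] = [];
    for (const child of criteria.and) {
      lines.push(...extractValidationRequirements(child));
    }
    return lines;
  }
  switch (criteria.type) {
    case "contains":
      return [`Must contain '${criteria.value}'`];
    case "regex":
      return [`Must match pattern: ${criteria.pattern}`];
    case "length": {
      const bounds: string[] = [];
      if (criteria.min !== undefined) bounds.push(`≥${criteria.min} chars`);
      if (criteria.max !== undefined) bounds.push(`≤${criteria.max} chars`);
      return bounds.length > 0 ? [`Length: ${bounds.join(", ")}`] : [];
    }
  }
}

// Hypothetical wrapper: outputContract wins over validationCriteria when both exist.
function appendRequirements(
  step: Step,
  contractGuidance: (contractRef: string) => string[],
): string {
  let requirements: string[] = [];
  if (step.outputContract !== undefined) {
    requirements = contractGuidance(step.outputContract.contractRef);
  } else if (step.validationCriteria !== undefined) {
    requirements = extractValidationRequirements(step.validationCriteria);
  }
  if (requirements.length === 0) return step.prompt;
  const bullets = requirements.slice(0, MAX_REQUIREMENTS).map((line) => `- ${line}`);
  return `${step.prompt}\n\n---\nOUTPUT REQUIREMENTS:\n${bullets.join("\n")}`;
}
```

Fed the rules from the validationCriteria example (a `contains`, a `regex`, and a `length` rule with min 100 and max 500), the sketch reproduces that example's prompt suffix. The real outputContract path uses a different header ("OUTPUT REQUIREMENTS (System)"), which this sketch does not model.
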
**Coverage**: validationCriteria handles contains (51%), regex (36%), length (13%) rules + and compositions = 100% of empirical validation usage. outputContract handles typed artifact contracts (loop control, extensible to future contracts).

**Priority**: `outputContract` takes precedence over `validationCriteria` when both are present on a step. This is the preferred path going forward.

**Historical note**: The `agentInstructions` response field approach was considered and rejected in favor of prompt-based injection (50 lines vs 900 lines, more philosophically aligned). See PR #57 discussion for details.
- Hybrid (structured with markdown values)?

4. **Payload budget**: What's acceptable overhead?
   - Full guidance (~500 bytes) acceptable at start?
   - Condensed version (~200 bytes) for rehydrate?
   - Zero bytes for normal advancement?

5. **Evolution strategy**: How to update guidance over time?
   - Version the guidance text?
   - Make it workflow-hash-specific (part of pinned snapshot)?
   - Keep it WorkRail-version-specific (independent of workflow)?

**Empirical Input from Slice 4a Manual Testing**:
- Agents **did** successfully execute workflows without external docs (when given test instructions)
- Agents **did** get confused about context persistence (multiple `advance_next_failed` errors during agentic workflow execution)
- Agents **did** correctly interpret `nextIntent` values after seeing them in responses (inferred meaning from pattern)
- Agents **did not** know how to retry after `blocked` without explicit test instructions

**Design Decision Points**:
1. Is this **normative** (required for contract compliance) or **recommended** (optional UX improvement)?
2. Should it be **static system guidance** or **workflow-specific instructions**?
3. Should it replace/augment the existing `pending.prompt` pattern or be purely additive?

**Decision authority**: Requires design review before Slice 4b implementation.

**Status**: Open item (document design, then implement in 4b or dedicated follow-up MR)

---

### 18.4 Idempotency vs Validation Bug Fixes (Agent UX Consideration)

**Issue**: When validation logic has a bug and gets fixed, agents cannot see the fix without rehydrating to get a fresh ackToken. The strict idempotency rule (replay returns cached outcome, never recomputes) means:
- Agent calls `continue_workflow(stateToken, ackToken, output)` with bad output
- Validation fails → blocked outcome recorded
- Validation bug fixed in deployment
- Agent retries with same ackToken → replays old blocked outcome (doesn't see fix)
- Agent must rehydrate (call without ackToken) to get fresh token with new attemptId

**Impact**: Agent confusion and friction. Error message says "retry with same ackToken" but this doesn't work when validation logic changed. Agents get stuck in retry loops or seek user help.

**Current Behavior** (by design):
- Idempotency keyed by `attemptId` embedded in ackToken
- `dedupeKey = "advance_recorded:<sessionId>:<nodeId>:<attemptId>"`
- Same attemptId → replay cached outcome (fact-returning, no recompute)
- Normative requirement: "MUST NOT re-run validation" (line 174, workflow-execution-contract.md)

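The replay behaviour can be made concrete with a small model. The sketch below is an assumption-laden illustration, not WorkRail's implementation: the `dedupeKey` format comes from the text, while `createAdvanceRecorder`, the `Outcome` type, and the in-memory `Map` (standing in for the durable event log) are hypothetical.

```typescript
type Outcome = { kind: "advanced" } | { kind: "blocked"; reason: string };
type Validator = (output: string) => Outcome;
type AttemptIds = { sessionId: string; nodeId: string; attemptId: string };

// Key format quoted from the design text.
function dedupeKey(ids: AttemptIds): string {
  return `advance_recorded:${ids.sessionId}:${ids.nodeId}:${ids.attemptId}`;
}

// Hypothetical recorder: the Map stands in for recorded advance events.
function createAdvanceRecorder() {
  const recorded = new Map<string, Outcome>();
  return function advance(ids: AttemptIds, output: string, validate: Validator): Outcome {
    const key = dedupeKey(ids);
    const previous = recorded.get(key);
    if (previous !== undefined) {
      // Replay: return the recorded fact; validation is NOT re-run,
      // even if the validator has been fixed since the first attempt.
      return previous;
    }
    const outcome = validate(output);
    recorded.set(key, outcome);
    return outcome;
  };
}
```

In this model, a call that was blocked under a buggy validator stays blocked when retried with the same `attemptId`, even if a corrected validator is supplied. Only a new `attemptId`, which the agent obtains by rehydrating, reaches the corrected validator.
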
**Design Philosophy** (why strict idempotency):
1. **Determinism**: Same token → same outcome across deployments/versions
2. **Audit trail**: Historical events show what ACTUALLY happened at that time
3. **Rewind safety**: Old tokens replay old behavior (temporal consistency)
4. **Export/import**: Sessions portable across versions

**Agent UX Friction**:
- Validation is a **pure function** of pinned inputs (workflow hash-locked, context snapshot-able)
- Validation bugs getting fixed is DESIRABLE (want corrections to apply)
- Current design treats validation checkpoints as immutable outcomes
- Agent must learn: "fix output, rehydrate for fresh token, retry" (3 steps instead of 1)

**Potential Future Consideration** (NOT changing current lock):

Four options, the current design plus three architectural alternatives, if agent friction becomes significant:

**Option A: Keep Strict Idempotency** (current, LOCKED)
- Fix: Improve error messages to teach rehydrate pattern
- Fix: Layer 3 guidance (`_guide` field) provides recovery steps
- Trade-off: Agent education burden for system correctness
- Status: ✅ Implemented via blocker message improvements

**Option B: Exempt Validation from Idempotency**
- Change: Re-run validation on replay if outcome was `blocked` with `INVALID_REQUIRED_OUTPUT`
- Rationale: Validation is pure, bug fixes should apply
- Trade-off: Outcomes change across deployments (violates audit immutability)
- Status: ❌ Would require reopening locked design

**Option C: Version-Aware Recompute**
- Record ValidationEngine version with blocked outcome
- Replay if same version, recompute if version changed
- Trade-off: Complex, requires version tracking
- Status: ❌ Over-engineered for current problem

**Option D: Make Blocked Non-Final**
- Don't record blocked as immutable outcome
- Allow re-validation with same ackToken
- Record only successful advances
- Trade-off: Blocked isn't auditable history
- Status: ❌ Violates "all outcomes recorded" principle

**Current Decision** (2026-01-09):
- **KEEP strict idempotency** (Option A)
- **Improve agent education** via error messages and Layer 3 guidance
- **Monitor agent friction** in production usage
- **Revisit only if** agent error rate remains >10% after Layer 1+2+3 improvements

**Rationale**:
- Design locks exist for good reasons (determinism, audit, rewind-safety)
- Agent UX issue is solvable via better messaging (non-breaking)
- Validation bug fixes are rare compared to validation failures from bad output
- The 3-step recovery pattern (fix, rehydrate, retry) is learnable

**Future trigger for revisiting**:
- If post-Layer 3 agent error rate >10% AND >50% of errors are "confused about idempotency"
- If validation bugs in production are frequent (>1 per month affecting workflows)
- If user feedback indicates rehydrate pattern is a major pain point

**Decision authority**: Would require design review and consensus to change (currently locked).

**Status**: Documented consideration; current lock maintained (strict idempotency for all outcomes including validation failures).

---

## 19. Agent Delegation Instructions (Workflow-Driven Subagent Execution)

**Issue**: Workflows request subagent delegation (e.g., "Spawn 3 WorkRail Executors SIMULTANEOUSLY using routine-ideation") but agents frequently misinterpret these instructions, leading to:
- Agents starting new workflow runs instead of spawning subagents via the Task tool
- Sequential execution when parallel delegation is requested
- Incorrect subagent type selection (using general-purpose instead of workrail-executor)
- Missing context propagation to subagents

**Impact**: Workflow execution fails or executes incorrectly; parallel delegation benefits are lost; THOROUGH rigor mode cannot achieve intended parallelism.

**Root Cause**: Workflow prompt instructions are not explicit enough about the distinction between:
1. **Starting a new workflow run** (via `mcp_workrail_start_workflow`) - creates a new session/run with its own state
2. **Spawning a subagent** (via Task tool with `subagent_type: workrail-executor`) - delegates work within the current session

**Proposed Lock** (for workflow authoring guidance):

When workflows request subagent delegation, prompts MUST use this explicit template:

```
**If subagents + rigorMode=THOROUGH:**

SPAWN SUBAGENTS (not workflows) - use the Task tool with subagent_type: workrail-executor

Spawn {N} subagents IN PARALLEL by making {N} Task tool calls in a single response:

**Subagent 1 — [Name]:**
- Use: Task tool with subagent_type: workrail-executor
- Prompt: "Execute {routine-name} with: {params}"
- Description: "{short description}"

**Subagent 2 — [Name]:**
- Use: Task tool with subagent_type: workrail-executor
- Prompt: "Execute {routine-name} with: {params}"
- Description: "{short description}"

DO NOT use mcp_workrail_start_workflow for delegation.
DO NOT call Task tools sequentially - make all {N} calls in one response block.
```

**Rationale**:
1. **Explicit tool naming**: "Use Task tool with subagent_type: workrail-executor" removes ambiguity
2. **Negative constraint**: "DO NOT use mcp_workrail_start_workflow" prevents wrong tool selection
3. **Parallelism guidance**: "IN PARALLEL... in one response block" ensures parallel execution
4. **Template consistency**: Standardized format across all workflows reduces learning overhead

**Implementation Requirement**:
- Update all workflows that use delegation (coding-task-workflow-agentic, mr-review-workflow-agentic, bug-investigation-agentic, etc.) to use this explicit template
- Add this guidance to workflow authoring documentation

**Verification**:
- Test that agents correctly spawn parallel subagents when instructed
- Audit existing workflows for ambiguous delegation instructions

**Decision needed**: Approve this template and plan workflow updates, or propose alternative phrasing.

---

## 19) Workflow Authoring: Evidence-Based Validation Design (locked)

**Lock**: Workflow validation criteria should validate **evidence of work completed**, not just **completion flags**.

### The Problem (Empirically Observed)

**Gameable validation** (anti-pattern):
```json
{
  "validationCriteria": {
    "type": "contains",
    "value": "analysisComplete = true"
  }
}
```

**Agent checkbox behavior**:
```
Output: "analysisComplete = true"
Result: ✅ Passes validation
Quality: ❌ No actual analysis done
```

When validation requirements become visible to agents (via prompt injection, _guide fields, or enhanced error messages), agents may optimize for **passing validation** rather than **doing the work**.

### Design Principle (Locked)

**Validate work artifacts, not completion signals.**

**Good validation** (evidence-based):
```json
{
  "validationCriteria": {
    "and": [
      {"type": "regex", "pattern": "Finding \\d+:", "message": "Must list numbered findings"},
      {"type": "contains", "value": "severity:", "message": "Each finding needs severity"},
      {"type": "length", "min": 200, "message": "Substantive analysis required"}
    ]
  }
}
```

**Why this works**:
- Can't fake "Finding 1: ... severity: high ..." without actual findings
- Length requirement ensures substance
- Multiple independent checks prevent minimal compliance

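To see the difference between the two criteria sets in practice, the sketch below evaluates both against the same checkbox-style output. It is a simplified stand-in for the real validation engine: `checkOutput`, the `Criteria` type, and the rule semantics (substring match, unanchored regex, length in UTF-16 code units) are assumptions made for illustration.

```typescript
type Criteria =
  | { type: "contains"; value: string; message?: string }
  | { type: "regex"; pattern: string; message?: string }
  | { type: "length"; min?: number; max?: number; message?: string }
  | { and: Criteria[] };

// Hypothetical evaluator: returns the messages of every failed rule;
// an empty array means the output passes.
function checkOutput(criteria: Criteria, output: string): string[] {
  if ("and" in criteria) {
    const failures: string[] = [];
    for (const child of criteria.and) {
      failures.push(...checkOutput(child, output));
    }
    return failures;
  }
  const failure = [criteria.message ?? `Failed ${criteria.type} rule`];
  switch (criteria.type) {
    case "contains":
      return output.includes(criteria.value) ? [] : failure;
    case "regex":
      return new RegExp(criteria.pattern).test(output) ? [] : failure;
    case "length": {
      const tooShort = criteria.min !== undefined && output.length < criteria.min;
      const tooLong = criteria.max !== undefined && output.length > criteria.max;
      return tooShort || tooLong ? failure : [];
    }
  }
}

// The two criteria sets from this section.
const flagOnly: Criteria = { type: "contains", value: "analysisComplete = true" };
const evidenceBased: Criteria = {
  and: [
    { type: "regex", pattern: "Finding \\d+:", message: "Must list numbered findings" },
    { type: "contains", value: "severity:", message: "Each finding needs severity" },
    { type: "length", min: 200, message: "Substantive analysis required" },
  ],
};
```

Under these assumptions, `checkOutput(flagOnly, "analysisComplete = true")` returns no failures, while the same output fails all three evidence-based rules.
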
### Authoring Guidelines (Normative)

**DO**:
- ✅ Require specific work products ("list 3 findings", "cite file:line", "explain rationale")
- ✅ Check for patterns that indicate work ("Finding N:", "file:line", "because ...")
- ✅ Use length requirements to ensure substance (but not arbitrarily high)
- ✅ Combine multiple criteria to cross-validate

**DON'T**:
- ❌ Validate single boolean flags ("complete = true")
- ❌ Check for magic phrases without context ("must contain 'done'")
- ❌ Rely on agent honesty without verification
- ❌ Make validation trivially satisfiable with minimal text

### Examples

#### Anti-Pattern: Flag-Only Validation
```json
{
  "prompt": "Create a security analysis",
  "validationCriteria": {"type": "contains", "value": "securityReviewComplete"}
}
```
**Gameable**: Agent writes "securityReviewComplete" without analysis.

#### Good Pattern: Evidence-Based Validation
```json
{
  "prompt": "Create a security analysis. List findings with severity and file locations.",
  "validationCriteria": {
    "and": [
      {"type": "regex", "pattern": "Finding \\d+:.*severity:(high|medium|low)", "message": "List findings with severity"},
      {"type": "contains", "value": "file:", "message": "Reference specific files"},
      {"type": "length", "min": 300, "max": 5000, "message": "Substantive but concise"}
    ]
  }
}
```
**Not gameable**: Requires structured findings with actual content.

### Rationale

**Why this matters now**:
- Layer 3 implementation (agent execution guidance) will make validation requirements visible to agents
- Whether via prompt injection, _guide fields, or enhanced errors, agents will see what's checked
- If validations are flag-based, agents will optimize for flags instead of work quality
- Evidence-based validation prevents this by requiring actual work artifacts

**Philosophy alignment**:
- "Fail fast with meaningful error messages": Evidence-based validation fails fast when work is missing
- "Validate at boundaries": Checking for evidence IS boundary validation
- "Errors as data": Can return structured "missing finding N" errors

### Evolution: outputContract (Typed Artifacts, PR #61)

**Status**: ✅ Implemented — `outputContract` is the preferred approach going forward.

The `validationCriteria` evidence-based approach (above) reduces gameability but is still fundamentally **prose-based** — agents can construct text that matches patterns without genuine work. The architectural fix is **typed artifacts via `outputContract`**.

**How it works**:
- Steps declare `outputContract: { contractRef: "wr.contracts.loop_control" }` instead of `validationCriteria`
- Agents must provide structured JSON artifacts (e.g., `{ kind: "wr.loop_control", decision: "continue" }`)
- Validation is schema-based (Zod), not substring/regex
- System-injected guidance tells agents exactly what's required (from contract metadata, not authored prompts)

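As an illustration of schema-based checking, here is a hand-rolled validator for a loop control artifact. The source states that the real validation uses Zod; this sketch deliberately swaps in plain TypeScript so it runs without dependencies. The `parseLoopControl` name, the `ParseResult` shape, the lowercase-id pattern, and the assumption that `loopId` is required are all hypothetical.

```typescript
// Field names and enum values follow the system-injected guidance example.
type LoopControlArtifact = {
  kind: "wr.loop_control";
  loopId: string;
  decision: "continue" | "stop";
};

type ParseResult =
  | { ok: true; artifact: LoopControlArtifact }
  | { ok: false; issues: string[] };

// Assumed definition of "lowercase id"; the real contract may differ.
const LOWERCASE_ID = /^[a-z][a-z0-9_-]*$/;

// Hypothetical stand-in for a Zod schema: checks structure, not prose.
function parseLoopControl(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, issues: ["artifact must be a JSON object"] };
  }
  const candidate = input as Record<string, unknown>;
  const kind = candidate["kind"];
  const loopId = candidate["loopId"];
  const decision = candidate["decision"];
  const issues: string[] = [];
  if (kind !== "wr.loop_control") {
    issues.push('kind must be "wr.loop_control"');
  }
  if (typeof loopId !== "string" || !LOWERCASE_ID.test(loopId)) {
    issues.push("loopId must be a lowercase id");
  }
  if (decision !== "continue" && decision !== "stop") {
    issues.push('decision must be "continue" or "stop"');
  }
  if (issues.length > 0) {
    return { ok: false, issues };
  }
  return {
    ok: true,
    artifact: {
      kind: "wr.loop_control",
      loopId: loopId as string, // checked above
      decision: decision as "continue" | "stop", // checked above
    },
  };
}
```

A prose answer such as the string "decision: continue" is rejected outright because it is not an object, which is the property that makes typed artifacts harder to game than substring or regex rules.
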
**Why this is better**:
- **Not gameable**: Can't fake a typed artifact with prose
- **Machine-checkable**: Schema validation is deterministic
- **Clear contract**: The interface between agent and engine is explicit
- **Extensible**: New artifact types can be added without changing the validation framework

**Backward compatibility**: Both `validationCriteria` (legacy) and `outputContract` (preferred) are supported. `outputContract` takes priority when both are present.

**For new workflows**: Use `outputContract` with typed artifact schemas. Fall back to `validationCriteria` only when evidence-based prose validation is genuinely the right tool.

### Migration Path

**Phase 1** (done): `outputContract` implemented alongside `validationCriteria`
**Phase 2** (pending): Migrate production workflows from `validationCriteria` to `outputContract`
**Phase 3** (future): Deprecate `validationCriteria` once all workflows migrated

**For existing workflows**:
1. Identify steps where typed artifacts can replace prose validation
2. Add `outputContract` with appropriate contract reference
3. Remove `validationCriteria` once artifact path is validated
4. For steps where prose validation is genuinely appropriate, keep `validationCriteria` with evidence-based patterns

### Related Design Locks

- §18.1: notesMarkdown semantics (per-step fresh)
- §18.2: Context persistence (context_set event)
- §18.3: Agent execution guidance (Layer 3 - prompt-based requirement injection + system-injected contract guidance)

---

Important notes from the user:
- Agents get confused by the tokens when multiple workflows are started in the same chat. A fix for this needs to be investigated.
|