@exaudeus/workrail 3.27.0 → 3.29.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/console/assets/{index-FtTaDku8.js → index-BZ6HkxGf.js} +1 -1
- package/dist/console/index.html +1 -1
- package/dist/manifest.json +3 -3
- package/docs/README.md +57 -0
- package/docs/adrs/001-hybrid-storage-backend.md +38 -0
- package/docs/adrs/002-four-layer-context-classification.md +38 -0
- package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
- package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
- package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
- package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
- package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
- package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
- package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
- package/docs/adrs/010-release-pipeline.md +89 -0
- package/docs/architecture/README.md +7 -0
- package/docs/architecture/refactor-audit.md +364 -0
- package/docs/authoring-v2.md +527 -0
- package/docs/authoring.md +873 -0
- package/docs/changelog-recent.md +201 -0
- package/docs/configuration.md +505 -0
- package/docs/ctc-mcp-proposal.md +518 -0
- package/docs/design/README.md +22 -0
- package/docs/design/agent-cascade-protocol.md +96 -0
- package/docs/design/autonomous-console-design-candidates.md +253 -0
- package/docs/design/autonomous-console-design-review.md +111 -0
- package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
- package/docs/design/claude-code-source-deep-dive.md +713 -0
- package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
- package/docs/design/console-execution-trace-candidates-final.md +160 -0
- package/docs/design/console-execution-trace-candidates.md +211 -0
- package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
- package/docs/design/console-execution-trace-design-review.md +74 -0
- package/docs/design/console-execution-trace-discovery.md +394 -0
- package/docs/design/console-execution-trace-final-review.md +77 -0
- package/docs/design/console-execution-trace-review.md +92 -0
- package/docs/design/console-performance-discovery.md +415 -0
- package/docs/design/console-ui-backlog.md +280 -0
- package/docs/design/daemon-architecture-discovery.md +853 -0
- package/docs/design/daemon-design-candidates.md +318 -0
- package/docs/design/daemon-design-review-findings.md +119 -0
- package/docs/design/daemon-engine-design-candidates.md +210 -0
- package/docs/design/daemon-engine-design-review.md +131 -0
- package/docs/design/daemon-execution-engine-discovery.md +280 -0
- package/docs/design/daemon-gap-analysis.md +554 -0
- package/docs/design/daemon-owns-console-plan.md +168 -0
- package/docs/design/daemon-owns-console-review.md +91 -0
- package/docs/design/daemon-owns-console.md +195 -0
- package/docs/design/data-model-erd.md +11 -0
- package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
- package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
- package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
- package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
- package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
- package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
- package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
- package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
- package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
- package/docs/design/list-workflows-latency-fix-plan.md +128 -0
- package/docs/design/list-workflows-latency-fix-review.md +55 -0
- package/docs/design/list-workflows-latency-fix.md +109 -0
- package/docs/design/native-context-management-api.md +11 -0
- package/docs/design/performance-sweep-2026-04.md +96 -0
- package/docs/design/routines-guide.md +219 -0
- package/docs/design/sequence-diagrams.md +11 -0
- package/docs/design/subagent-design-principles.md +220 -0
- package/docs/design/temporal-patterns-design-candidates.md +312 -0
- package/docs/design/temporal-patterns-design-review-findings.md +163 -0
- package/docs/design/test-isolation-from-config-file.md +335 -0
- package/docs/design/v2-core-design-locks.md +2746 -0
- package/docs/design/v2-lock-registry.json +734 -0
- package/docs/design/workflow-authoring-v2.md +1044 -0
- package/docs/design/workflow-docs-spec.md +218 -0
- package/docs/design/workflow-extension-points.md +687 -0
- package/docs/design/workrail-auto-trigger-system.md +359 -0
- package/docs/design/workrail-config-file-discovery.md +513 -0
- package/docs/docker.md +110 -0
- package/docs/generated/v2-lock-closure-plan.md +26 -0
- package/docs/generated/v2-lock-coverage.json +797 -0
- package/docs/generated/v2-lock-coverage.md +177 -0
- package/docs/ideas/backlog.md +3927 -0
- package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
- package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
- package/docs/ideas/implementation_plan.md +249 -0
- package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
- package/docs/implementation/02-architecture.md +316 -0
- package/docs/implementation/04-testing-strategy.md +124 -0
- package/docs/implementation/09-simple-workflow-guide.md +835 -0
- package/docs/implementation/13-advanced-validation-guide.md +874 -0
- package/docs/implementation/README.md +21 -0
- package/docs/integrations/claude-code.md +300 -0
- package/docs/integrations/firebender.md +315 -0
- package/docs/migration/v0.1.0.md +147 -0
- package/docs/naming-conventions.md +45 -0
- package/docs/planning/README.md +104 -0
- package/docs/planning/github-ticketing-playbook.md +195 -0
- package/docs/plans/README.md +24 -0
- package/docs/plans/agent-managed-ticketing-design.md +605 -0
- package/docs/plans/agentic-orchestration-roadmap.md +112 -0
- package/docs/plans/assessment-gates-engine-handoff.md +536 -0
- package/docs/plans/content-coherence-and-references.md +151 -0
- package/docs/plans/library-extraction-plan.md +340 -0
- package/docs/plans/mr-review-workflow-redesign.md +1451 -0
- package/docs/plans/native-context-management-epic.md +11 -0
- package/docs/plans/perf-fixes-design-candidates.md +225 -0
- package/docs/plans/perf-fixes-design-review-findings.md +61 -0
- package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
- package/docs/plans/perf-fixes-new-issues-review.md +110 -0
- package/docs/plans/prompt-fragments.md +53 -0
- package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
- package/docs/plans/ui-ux-workflow-discovery.md +100 -0
- package/docs/plans/ui-ux-workflow-review.md +48 -0
- package/docs/plans/v2-followup-enhancements.md +587 -0
- package/docs/plans/workflow-categories-candidates.md +105 -0
- package/docs/plans/workflow-categories-discovery.md +110 -0
- package/docs/plans/workflow-categories-review.md +51 -0
- package/docs/plans/workflow-discovery-model-candidates.md +94 -0
- package/docs/plans/workflow-discovery-model-discovery.md +74 -0
- package/docs/plans/workflow-discovery-model-review.md +48 -0
- package/docs/plans/workflow-source-setup-phase-1.md +245 -0
- package/docs/plans/workflow-source-setup-phase-2.md +361 -0
- package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
- package/docs/plans/workflow-staleness-detection-review.md +58 -0
- package/docs/plans/workflow-staleness-detection.md +80 -0
- package/docs/plans/workflow-v2-design.md +69 -0
- package/docs/plans/workflow-v2-roadmap.md +74 -0
- package/docs/plans/workflow-validation-design.md +98 -0
- package/docs/plans/workflow-validation-roadmap.md +108 -0
- package/docs/plans/workrail-platform-vision.md +420 -0
- package/docs/reference/agent-context-cleaner-snippet.md +94 -0
- package/docs/reference/agent-context-guidance.md +140 -0
- package/docs/reference/context-optimization.md +284 -0
- package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
- package/docs/reference/example-workflow-repository-template/README.md +268 -0
- package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
- package/docs/reference/external-workflow-repositories.md +916 -0
- package/docs/reference/feature-flags-architecture.md +472 -0
- package/docs/reference/feature-flags.md +349 -0
- package/docs/reference/god-tier-workflow-validation.md +272 -0
- package/docs/reference/loop-optimization.md +209 -0
- package/docs/reference/loop-validation.md +176 -0
- package/docs/reference/loops.md +465 -0
- package/docs/reference/mcp-platform-constraints.md +59 -0
- package/docs/reference/recovery.md +88 -0
- package/docs/reference/releases.md +177 -0
- package/docs/reference/troubleshooting.md +105 -0
- package/docs/reference/workflow-execution-contract.md +998 -0
- package/docs/roadmap/README.md +22 -0
- package/docs/roadmap/legacy-planning-status.md +103 -0
- package/docs/roadmap/now-next-later.md +70 -0
- package/docs/roadmap/open-work-inventory.md +389 -0
- package/docs/tickets/README.md +39 -0
- package/docs/tickets/next-up.md +76 -0
- package/docs/workflow-management.md +317 -0
- package/docs/workflow-templates.md +423 -0
- package/docs/workflow-validation.md +184 -0
- package/docs/workflows.md +254 -0
- package/package.json +3 -1
- package/spec/authoring-spec.json +61 -16
- package/workflows/workflow-for-workflows.json +252 -93
- package/workflows/workflow-for-workflows.v2.json +188 -77
# Temporal Patterns Design Candidates for WorkRail Auto

**Status:** Candidates generated from Temporal.io / Prefect / Dagster discovery (Apr 14, 2026)
**Discovery doc:** `docs/ideas/temporal-discovery.md`
**For:** WorkRail Auto daemon -- four pattern areas: durability, approval gates, versioning, trigger system

---

## Problem Understanding

### Core tensions

**T1: Temporal's determinism assumption vs AI non-determinism**
Temporal's entire event-sourcing replay model assumes workflow code is deterministic. WorkRail's domain (AI agent tool calls) is inherently non-deterministic, so WorkRail cannot adopt Temporal's replay model. WorkRail's step-level checkpoint token is the correct architecture for the domain -- but it means a crashed step restarts from the beginning of the step, not from the last completed tool call.

**T2: Portability (`npx -y`) vs durability infrastructure**
Temporal, Prefect, and Dagster all require a server (PostgreSQL/Cassandra + UI). WorkRail must run with zero infrastructure beyond a filesystem, so all durability must be file-based. Atomic writes (temp → fsync → rename) are the correct primitive.

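A minimal sketch of that primitive, assuming Node's synchronous `fs` APIs for brevity (a daemon would more likely use `fs/promises` and map failures into a `Result`). `atomicWriteFileSync` and the temp-file naming scheme are illustrative, not WorkRail's actual helper. The temp file sits next to the target because `rename` is only atomic within one filesystem; a fully crash-safe version would also fsync the parent directory after the rename, which is omitted here:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Example target from this doc (~/.workrail/daemon-state.json); the snippet never touches it.
const DAEMON_STATE_PATH = path.join(os.homedir(), '.workrail', 'daemon-state.json');

// temp → fsync → rename: readers see either the old file or the complete new one.
function atomicWriteFileSync(targetPath: string, contents: string): void {
  // Same directory as the target, so the rename never crosses filesystems.
  const tmpPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.tmp`,
  );
  const fd = fs.openSync(tmpPath, 'w');
  try {
    fs.writeFileSync(fd, contents);
    fs.fsyncSync(fd); // data is on disk before the file becomes visible under its real name
  } catch (e) {
    fs.closeSync(fd);
    fs.rmSync(tmpPath, { force: true }); // never leave a half-written temp file behind
    throw e;
  }
  fs.closeSync(fd);
  fs.renameSync(tmpPath, targetPath); // atomic replace on POSIX filesystems
}
```
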
**T3: Zero compute while waiting for human approval**
Temporal's `condition(fn, timeout)` is elegant: the workflow has no active Workflow Task (WFT) while it waits, and the server holds its state. WorkRail's daemon cannot hold a connection open indefinitely. Approach: persist the checkpoint token to disk; the daemon then exits or keeps looping, and a REST endpoint triggers resumption from the persisted token.

**T4: Multi-tenancy seams without current single-user complexity**
Adding `orgId` everywhere now is premature complexity; not adding it now means a breaking refactor later. The correct approach: design it as a DI-injected port (`OrgContext`), default to single-user, and make multi-tenancy an adapter.

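A minimal sketch of the port-plus-adapter idea, assuming `OrgContext` carries nothing but an optional org id. The shape, `singleUserOrgContext`, `orgContextFor`, and the `TriggerCursorPaths` consumer are hypothetical. Callers receive an `OrgContext` through DI and never decide tenancy themselves; only the consumer's key path changes, so a multi-tenant adapter can be swapped in later without touching call sites:

```typescript
// Hypothetical port: injected wherever org scoping might matter later.
interface OrgContext {
  readonly orgId: string | null; // null = single-user mode (today's default)
}

// Default adapter: single-user, no org scoping.
const singleUserOrgContext: OrgContext = { orgId: null };

// Future multi-tenant adapter: same port, different value.
function orgContextFor(orgId: string): OrgContext {
  return { orgId };
}

// Example consumer (hypothetical): only the key path changes with the org.
class TriggerCursorPaths {
  private readonly org: OrgContext;
  private readonly root: string;

  constructor(org: OrgContext, root: string) {
    this.org = org;
    this.root = root;
  }

  cursorPath(sourceId: string): string {
    const base = this.org.orgId === null ? this.root : `${this.root}/orgs/${this.org.orgId}`;
    return `${base}/triggers/${sourceId}.cursor`;
  }
}
```
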
### What makes this hard

WorkRail is a novel category (AI agent process governance). Every Temporal pattern must be adapted at the right abstraction level -- not too literally (adopting the replay model), not too abstractly (just 'be durable'). The correct level: take each *invariant* Temporal enforces and find the WorkRail-appropriate mechanism that enforces the same invariant without Temporal's infrastructure.

### Likely seam

The daemon loop (`src/daemon/runWorkflow()` -- not yet built) sits between trigger dispatch and engine calls. All four pattern areas land here:
- Durability: token persistence before each `continue_workflow` call
- Approval gate: polling loop entered after an `approvalGate` step is detected
- Versioning: pinned snapshot loaded at session start, passed through each step
- Trigger system: cursor committed after session start

### Key discovery

`PinnedWorkflowStorePortV2` already exists (`src/v2/ports/pinned-workflow-store.port.ts`). It stores compiled workflow snapshots by content-addressed `workflowHash`, and the `run_started` event already records `workflowHash`. The versioning "gap" may already be solved -- this requires verification before building new code.

---

## Philosophy Constraints

From `CLAUDE.md` and confirmed by the codebase:

- **Errors as data** -- `neverthrow` `Result`/`ResultAsync` throughout. No exceptions in business logic.
- **Make illegal states unrepresentable** -- `WithHealthySessionLock` as a capability token. `WAITING_FOR_APPROVAL` must be a typed domain state in the event log, not a context variable flag.
- **Explicit domain types** -- `SessionId`, `WorkflowHash`, `TriggerId` as branded types, not raw strings.
- **Validate at boundaries** -- Webhook payloads and approval REST calls are validated at the HTTP boundary; core logic trusts the validated result.
- **Immutability** -- The session event log is append-only. Pinned workflow snapshots are never mutated in place.
- **DI for all I/O** -- All new ports are injected via `V2Dependencies`. No global state.

**Philosophy conflicts in candidates:**
- An approval-gate REST endpoint that accepts unsigned approvals would violate 'validate at boundaries' -- it requires an HMAC-signed approval token.
- `TriggerId` as a raw string would violate 'explicit domain types' -- it must be a branded type.

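A minimal sketch of how 'explicit domain types' and 'validate at boundaries' combine for `TriggerId`. The brand technique shown, the ID format regex, `parseTriggerId`, and `cursorPathFor` are hypothetical illustrations, and a local `Result` type stands in for `neverthrow` so the snippet runs on its own. Rejecting `../` style IDs at the boundary also keeps them out of cursor file paths:

```typescript
// Minimal local Result type standing in for neverthrow's Result.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Type-only brand: at runtime a TriggerId is still a plain string.
declare const TriggerIdBrand: unique symbol;
type TriggerId = string & { readonly [TriggerIdBrand]: true };

type TriggerIdError = { code: 'INVALID_TRIGGER_ID'; raw: string };

// Assumed format (not from the source): lowercase letters, digits, dashes; 1-64 chars.
const TRIGGER_ID_PATTERN = /^[a-z0-9-]{1,64}$/;

// The only way to obtain a TriggerId: validate at the boundary, return errors as data.
function parseTriggerId(raw: string): Result<TriggerId, TriggerIdError> {
  return TRIGGER_ID_PATTERN.test(raw)
    ? { ok: true, value: raw as TriggerId }
    : { ok: false, error: { code: 'INVALID_TRIGGER_ID', raw } };
}

// Core logic accepts only the branded type, so passing a raw string is a compile error.
function cursorPathFor(id: TriggerId): string {
  return `triggers/${id}.cursor`;
}
```
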
---

## Impact Surface

- `src/mcp/handlers/v2-workflow.ts` -- `executeContinueWorkflow` path. The versioning candidate must verify that this path uses pinned snapshots before building new code.
- `src/v2/durable-core/schemas/session/events.ts` -- Adding the `step_approval_pending`, `step_approval_received`, and `step_approval_timeout` domain events requires extending the `DomainEventV1` discriminated union.
- `src/v2/ports/` -- Four new ports: `DaemonStateStore`, `ApprovalGatePort`, `TriggerSourcePortV2`, `TriggerCursorStore`.
- Console service (`src/v2/usecases/console-service.ts`) -- Must surface `WAITING_FOR_APPROVAL` sessions as a distinct state in the DAG.
- Daemon token persistence (`~/.workrail/daemon-state.json`) -- New file, new path. Does not interact with the existing `~/.workrail/config.json` or the session store.

---

## Candidates

### Candidate A: Daemon Durability (crash recovery)

**Summary:** Atomic token persistence to `~/.workrail/daemon-state.json` before each `continue_workflow` call. On daemon restart, read the file and call `continue_workflow(token)` to rehydrate. No new session-store machinery.

**Approach in concrete terms:**
```typescript
import type { ResultAsync } from 'neverthrow';
// SessionId (branded) and DaemonStateError are assumed project types.

// DaemonStateStore port
interface DaemonStateStore {
  persistContinueToken(sessionId: SessionId, token: string, stepIndex: number): ResultAsync<void, DaemonStateError>;
  loadContinueToken(sessionId: SessionId): ResultAsync<{ token: string; stepIndex: number } | null, DaemonStateError>;
}
// Implementation: atomic write to ~/.workrail/daemon-state.json
// Pattern: temp file + fsync + rename (same as session-store/index.ts)
```

**Tensions resolved:** T2 (no infra, just a file).
**Tensions accepted:** T1 -- tool-call-level durability is not addressed. A 30-tool-call step that crashes at call 25 restarts from call 1.

**Boundary:** `DaemonStateStore` port in `src/v2/ports/`. Implementation in `src/v2/infra/local/daemon-state-store/`. Called by `runWorkflow()` before every `continue_workflow`.

**Why this boundary is best-fit:** The checkpoint token is outside the session lock scope by design. The session store cannot hold the recovery token because acquiring the lock is part of the recovery process. These are two separate concerns.

**Failure mode:** A disk write fails after the token is received but before the state file is written -- the session is orphaned with no recovery token. Mitigation: atomic temp → rename write (same pattern as the session store). If the write fails, the daemon logs the error and exits cleanly; the operator retries from the last persisted token.

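The ordering that makes this recovery work can be sketched in memory: persist the token, then call `continue_workflow` with it, so a crash mid-step leaves that step's starting token on disk and the restart re-runs the step from its beginning (the accepted T1 tradeoff). `InMemoryDaemonStateStore`, `ContinueFn`, and this synchronous `runWorkflow` signature are hypothetical stand-ins; the real store would return `ResultAsync` and write the file atomically:

```typescript
type SessionId = string & { readonly __brand: 'SessionId' };

interface Checkpoint {
  readonly token: string;
  readonly stepIndex: number;
}

// Synchronous in-memory stand-in for DaemonStateStore.
class InMemoryDaemonStateStore {
  private readonly data = new Map<SessionId, Checkpoint>();

  persistContinueToken(id: SessionId, token: string, stepIndex: number): void {
    this.data.set(id, { token, stepIndex });
  }

  loadContinueToken(id: SessionId): Checkpoint | null {
    return this.data.get(id) ?? null;
  }
}

// Stand-in for continue_workflow: runs one step, returns the next token (null when done).
type ContinueFn = (token: string) => string | null;

function runWorkflow(id: SessionId, startToken: string, store: InMemoryDaemonStateStore, continueWorkflow: ContinueFn): void {
  // On restart, resume from the last persisted token rather than the start token.
  const resumed = store.loadContinueToken(id);
  let token: string | null = resumed?.token ?? startToken;
  let stepIndex = resumed?.stepIndex ?? 0;
  while (token !== null) {
    store.persistContinueToken(id, token, stepIndex); // 1. checkpoint first
    token = continueWorkflow(token);                  // 2. then advance (may crash mid-step)
    stepIndex += 1;
  }
  // A real daemon would also clear the session's entry once it completes.
}
```
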
**Repo-pattern relationship:** Directly follows the `session-store/index.ts` atomic write pattern and the `token-alias-store.port.ts` interface shape.

**Gains:** Zero new infra. Crash recovery for all daemon sessions. Easy to test (mock the file write).
**Loses:** Sub-step (tool-call-level) crash recovery -- an accepted tradeoff for v1.

**Scope judgment:** Best-fit. Does exactly what's needed for daemon crash recovery. Not too narrow (covers all daemon session types), not too broad (doesn't touch session engine internals).

**Philosophy fit:** Immutability (token file is write-once-per-step). Errors as data (`ResultAsync`). Explicit domain types (`SessionId` branded type, not a raw string key).

---

### Candidate B: Human Approval Gate

**Summary:** A new workflow step field `approvalGate: { notifyChannels: string[], timeoutMs: number }` causes the daemon to emit a typed `step_approval_pending` domain event, persist the checkpoint token, dispatch notifications, and enter a polling loop. A REST endpoint `POST /api/v2/sessions/:id/approve` with an HMAC-signed approval token appends a `step_approval_received` event and releases the gate.

**Approach in concrete terms:**

New domain events (extend the `DomainEventV1` discriminated union):
```typescript
// New DomainEventV1 members. ApprovalDomainEvent is an illustrative alias;
// SessionId and ISOString are assumed to be existing project types.
type ApprovalDomainEvent =
  | { kind: 'step_approval_pending'; sessionId: SessionId; stepId: string; timeoutAt: ISOString; notifyChannels: string[] }
  | { kind: 'step_approval_received'; sessionId: SessionId; stepId: string; approvedBy: string; approvedAt: ISOString }
  | { kind: 'step_approval_timeout'; sessionId: SessionId; stepId: string; timedOutAt: ISOString };
```

Daemon flow:
1. `continue_workflow` returns `{ awaitingApproval: true, stepId, timeoutMs }`
2. Daemon appends the `step_approval_pending` event
3. Persists the checkpoint token to `DaemonStateStore`
4. Dispatches notifications (best-effort -- the gate is not blocked by a notify failure)
5. Enters a 1s polling loop that checks the session event log for `step_approval_received` or `step_approval_timeout`
6. On approval: calls `continue_workflow` with the stored checkpoint token to resume
7. On timeout: appends `step_approval_timeout` and marks the session FAILED

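Steps 5-7 reduce to a pure decision over the session's event log, which is what lets a restarted daemon re-enter the loop from session store state. This is a hypothetical sketch: `checkApprovalGate`, `GateDecision`, and the `timeout_due` outcome (the daemon noticing that `timeoutAt` has passed, before it appends `step_approval_timeout`) are assumptions, and event fields are trimmed to those the check needs:

```typescript
type ISOString = string;

// Approval events as they appear in one session's log (sessionId omitted for brevity).
type ApprovalEvent =
  | { kind: 'step_approval_pending'; stepId: string; timeoutAt: ISOString; notifyChannels: string[] }
  | { kind: 'step_approval_received'; stepId: string; approvedBy: string; approvedAt: ISOString }
  | { kind: 'step_approval_timeout'; stepId: string; timedOutAt: ISOString };

type GateDecision =
  | { kind: 'approved'; approvedBy: string } // step 6: resume via continue_workflow
  | { kind: 'timed_out' }                    // timeout already recorded in the log
  | { kind: 'timeout_due' }                  // step 7: append step_approval_timeout, mark FAILED
  | { kind: 'still_waiting' };               // sleep 1s and check again

// Pure function over the log: safe to call again after a daemon restart.
function checkApprovalGate(events: readonly ApprovalEvent[], stepId: string, now: Date): GateDecision {
  let timeoutAt: ISOString | null = null;
  for (const e of events) {
    if (e.stepId !== stepId) continue;
    // The first terminal event wins; the log is append-only.
    if (e.kind === 'step_approval_received') return { kind: 'approved', approvedBy: e.approvedBy };
    if (e.kind === 'step_approval_timeout') return { kind: 'timed_out' };
    timeoutAt = e.timeoutAt; // step_approval_pending
  }
  if (timeoutAt !== null && now.getTime() >= Date.parse(timeoutAt)) return { kind: 'timeout_due' };
  return { kind: 'still_waiting' };
}
```
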
REST endpoint: `POST /api/v2/sessions/:id/approve` -- the body includes an HMAC-signed approval token (keyed to session + stepId). The handler appends a `step_approval_received` event; the signature prevents unauthorized advances.

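A sketch of signing and verifying the approval token with Node's `crypto` module. The token format (hex HMAC-SHA256 over a JSON-encoded `[sessionId, stepId]` pair), the function names, and secret handling are assumptions; a production token would likely also carry an expiry. `timingSafeEqual` avoids leaking how many leading bytes matched:

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

// JSON encoding keeps ("a:b", "c") and ("a", "b:c") from producing the same MAC input.
function signApproval(secret: string, sessionId: string, stepId: string): string {
  return createHmac('sha256', secret).update(JSON.stringify([sessionId, stepId])).digest('hex');
}

// Runs at the REST boundary, before step_approval_received is appended.
function verifyApproval(secret: string, sessionId: string, stepId: string, token: string): boolean {
  const expected = Buffer.from(signApproval(secret, sessionId, stepId), 'hex');
  const given = Buffer.from(token, 'hex');
  // timingSafeEqual throws if the lengths differ, so reject mismatched lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```
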
**Tensions resolved:** T3 (the process may exit while waiting -- the checkpoint token is persisted before the approval wait, so the daemon can restart and re-enter the polling loop from session store state).
**Tensions accepted:** T4 -- multi-tenancy (per-org notification routing) deferred to cloud tier.

**Boundary:** Daemon loop (`runWorkflow()`) + new REST route in the existing Express server + new domain events in the session event log.

**Why this boundary is best-fit:** The approval gate is a daemon concern, not an engine concern. The engine just needs to tell the daemon 'this step requires approval before advancing.' The engine already has a step-blocking mechanism (assessment gates), and the approval gate reuses the same blocking pattern.

**Failure mode:** Approval REST endpoint unreachable (daemon not running, firewall). Mitigation: the console UI always works (same REST server). If the daemon is not running, the approval write still succeeds -- the daemon picks it up on next start via DaemonStateStore recovery.

**Repo-pattern relationship:** New domain events follow the `events.ts` discriminated union pattern. The REST route follows the existing Express server pattern (`src/infrastructure/session/HttpServer.ts`).

**Gains:** Human approval is a first-class typed state in the session history. Survives daemon restarts. Extensible notification channels. The console can show a `[WAITING]` badge.
**Loses:** Polling the event log (1s interval) is less elegant than Temporal's `condition()`. For human-scale waits (minutes to hours), polling at 1s is completely acceptable.

**Scope judgment:** Best-fit. Scoped to the approval gate use case. The notification channel list is extensible without design changes.

**Philosophy fit:** Make illegal states unrepresentable -- `step_approval_pending` is a typed event, not a context variable flag. Errors as data -- notification failure returns `Result<void, NotifyError>` and does not fail the gate. Validate at boundaries -- the HMAC-signed approval token is validated at the REST boundary.

---

### Candidate C: Workflow Versioning for Daemon

**Summary:** First verify whether `executeContinueWorkflow` already uses the pinned workflow snapshot from `PinnedWorkflowStorePortV2`. If yes: document it and move on (no new code). If no: add a 5-line path that loads the pinned snapshot instead of re-resolving from the registry.

**Critical context:** `PinnedWorkflowStorePortV2` already exists and is designed exactly for this purpose (from the port docstring: "Enable deterministic execution even when source workflow changes"). The `run_started` event already records `workflowHash`. Export bundles already embed pinned workflows.

**Verification path:**
```bash
# Run from the repository root; -E makes the | alternation portable across GNU and BSD grep
grep -nE "pinnedWorkflow|PinnedWorkflow|workflowHash.*get|pinnedStore" \
  src/mcp/handlers/v2-workflow.ts
```

**If already solved (most likely):** The daemon just passes the token from the `continue_workflow` response to the next call. The engine resolves the pinned snapshot internally via the `workflowHash` in the token. No daemon-level versioning code is needed.

**If NOT solved:** Add to `executeContinueWorkflow`:
```typescript
// Sketch inside executeContinueWorkflow (`err` from neverthrow). Assumes get() resolves
// to the snapshot or null; the exact PinnedWorkflowStorePortV2 signature is not shown here.
const pinned = await pinnedWorkflowStore.get(session.workflowHash);
if (!pinned) return err({ code: 'PINNED_WORKFLOW_NOT_FOUND', workflowHash: session.workflowHash });
// Use `pinned` for step interpretation instead of registry lookup
```

**Tensions resolved:** T2 (no new infra -- existing store). Content-addressed pinning means workflow redeploys cannot affect in-flight sessions.
**Tensions accepted:** None new -- this is the designed behavior.

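Why content addressing makes redeploys harmless, in miniature: identical content always maps to the same key, a changed workflow gets a new key, and an existing snapshot is never overwritten, so an in-flight session holding the old hash keeps its old definition. This in-memory store and the bare SHA-256-over-JSON hash are assumptions for illustration; the real `PinnedWorkflowStorePortV2` interface and `workflowHash` scheme may differ:

```typescript
import { createHash } from 'node:crypto';

type WorkflowHash = string & { readonly __brand: 'WorkflowHash' };

// Assumed scheme: hex SHA-256 of the compiled workflow's JSON text.
function hashWorkflow(compiledJson: string): WorkflowHash {
  return createHash('sha256').update(compiledJson).digest('hex') as WorkflowHash;
}

// In-memory stand-in for a pinned-snapshot store.
class InMemoryPinnedStore {
  private readonly snapshots = new Map<WorkflowHash, string>();

  // Idempotent: pinning identical content returns the existing key and never overwrites.
  pin(compiledJson: string): WorkflowHash {
    const hash = hashWorkflow(compiledJson);
    if (!this.snapshots.has(hash)) this.snapshots.set(hash, compiledJson);
    return hash;
  }

  get(hash: WorkflowHash): string | null {
    return this.snapshots.get(hash) ?? null;
  }
}
```
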
**Boundary:** Verification of the existing engine path. Possible 5-line addition to `src/mcp/handlers/v2-workflow.ts`.

**Failure mode (if pinned snapshot missing):** `get(workflowHash)` returns null for a session started before the pinned store was implemented. Mitigation: the existing engine likely falls back to the registry -- this is the pre-pinning behavior and is safe for same-version redeploys.

**Repo-pattern relationship:** 100% follows the existing pattern. `PinnedWorkflowStore` is already a registered port in `V2Dependencies`.

**Gains:** Deploy-safe in-flight sessions with zero new code (if verification passes).
**Loses:** Nothing. This is a verification task masquerading as a design decision.

**Scope judgment:** Too narrow if verification fails. Best-fit if verification passes.

**Philosophy fit:** Determinism over cleverness -- same `workflowHash` = same compiled workflow content. Immutability -- pinned snapshots are never mutated.

---

### Candidate D: Trigger System Cursor Model

**Summary:** Each trigger source (GitLab webhook, Jira webhook, cron) implements a `TriggerSourcePortV2<TEvent, TCursor>` port with a typed cursor. `TriggerCursorStore` persists the cursor to `~/.workrail/triggers/<sourceId>.cursor`. On each poll, the dispatcher reads the stored cursor, starts sessions for new events, and commits the new cursor after those sessions start. Adapted from Dagster's sensor pattern.

**Approach in concrete terms:**

```typescript
import { err, type Result, type ResultAsync } from 'neverthrow';
// TriggerId, TriggerError, TriggerCursorError and the event/cursor types are assumed project types.

// Core port
interface TriggerSourcePortV2<TEvent, TCursor> {
  readonly sourceId: TriggerId; // branded type
  poll(cursor: TCursor | null): ResultAsync<TriggerPollResult<TEvent, TCursor>, TriggerError>;
}

interface TriggerPollResult<TEvent, TCursor> {
  readonly events: readonly TEvent[];
  readonly nextCursor: TCursor; // always present, even if events is empty
}

// Cursor store (cursors are persisted as strings)
interface TriggerCursorStore {
  getCursor(sourceId: TriggerId): ResultAsync<string | null, TriggerCursorError>;
  setCursor(sourceId: TriggerId, cursor: string): ResultAsync<void, TriggerCursorError>;
}

// Implementations (bodies sketched)
class GitLabMRTrigger implements TriggerSourcePortV2<GitLabMREvent, GitLabCursor> {
  poll(cursor) { /* GET /api/v4/merge_requests?updated_after=cursor */ }
}
class CronTrigger implements TriggerSourcePortV2<CronTickEvent, ISOTimestamp> {
  poll(cursor) { /* compute missed ticks since cursor */ }
}

// Dispatcher. Hypothetical helpers: WorkflowStarter/StartWorkflowError (engine seam),
// CursorCodec (TCursor <-> stored string), isAlreadyExists (WORKFLOW_ALREADY_EXISTS or equivalent).
class TriggerDispatcher {
  constructor(
    private readonly cursorStore: TriggerCursorStore,
    private readonly engine: WorkflowStarter,
  ) {}

  async pollOnce<TEvent extends { id: string }, TCursor>(
    source: TriggerSourcePortV2<TEvent, TCursor>,
    codec: CursorCodec<TCursor>,
  ): Promise<Result<void, TriggerError | TriggerCursorError | StartWorkflowError>> {
    // Both calls return Results: unwrap them before use
    const stored = await this.cursorStore.getCursor(source.sourceId);
    if (stored.isErr()) return err(stored.error);
    const polled = await source.poll(stored.value === null ? null : codec.decode(stored.value));
    if (polled.isErr()) return err(polled.error);

    for (const event of polled.value.events) {
      // Use event.id as workflowId for idempotency (Dagster's run_key pattern)
      const started = await this.engine.startWorkflow({ workflowId: event.id /* , ... */ });
      // A duplicate means an earlier poll already started this event: not an error
      if (started.isErr() && !isAlreadyExists(started.error)) return err(started.error);
    }
    // Commit the cursor only after every session in this batch has started
    return this.cursorStore.setCursor(source.sourceId, codec.encode(polled.value.nextCursor));
  }
}
```

**Idempotency key:** The event ID is used as the workflowId. If the cursor commit fails and the same event is dispatched twice, `start_workflow` with the same workflowId returns `WORKFLOW_ALREADY_EXISTS` (or equivalent) -- no duplicate session. This is exactly Dagster's `run_key` pattern.

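The scenario in the paragraph above can be exercised with a hypothetical in-memory engine: the first poll starts both sessions but fails to commit the cursor, so the next poll re-dispatches the same events, and the duplicate `workflowId`s are treated as already-started rather than as errors. `InMemoryEngine`, this synchronous `pollOnce`, the numeric `seq` cursor, and the `commit` callback are illustrative only:

```typescript
type StartResult =
  | { ok: true }
  | { ok: false; code: 'WORKFLOW_ALREADY_EXISTS' | 'ENGINE_UNAVAILABLE' };

// Engine stand-in: workflowId acts as the idempotency key.
class InMemoryEngine {
  readonly sessions = new Set<string>();

  startWorkflow(workflowId: string): StartResult {
    if (this.sessions.has(workflowId)) return { ok: false, code: 'WORKFLOW_ALREADY_EXISTS' };
    this.sessions.add(workflowId);
    return { ok: true };
  }
}

interface TriggerEvent {
  readonly id: string;  // used as workflowId
  readonly seq: number; // position used as the cursor in this sketch
}

// One poll cycle: start sessions for events past the cursor, then try to commit.
// Returns the cursor that is in effect afterwards.
function pollOnce(
  events: readonly TriggerEvent[],
  cursor: number,
  engine: InMemoryEngine,
  commit: (next: number) => boolean,
): number {
  const fresh = events.filter((e) => e.seq > cursor);
  for (const e of fresh) {
    const r = engine.startWorkflow(e.id);
    // A duplicate is success: an earlier cycle already started this event.
    if (!r.ok && r.code !== 'WORKFLOW_ALREADY_EXISTS') throw new Error(`start failed: ${r.code}`);
  }
  const next = fresh.reduce((max, e) => Math.max(max, e.seq), cursor);
  return commit(next) ? next : cursor; // failed commit: the old cursor stays in effect
}
```
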
**Cron missed runs:** `CronTrigger.poll(cursor)` computes all missed ticks from `cursor` (last fired time) to `now` and starts one session per missed tick. Maximum catch-up is configurable (`maxMissedRuns` per source).

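A sketch of the catch-up computation, simplified to a fixed interval rather than a cron expression. `missedTicks` and its signature are hypothetical, as is the policy of keeping the *most recent* `maxMissedRuns` ticks when the cap is hit (the source does not say which ticks survive). The cursor still advances past skipped ticks so they are never replayed:

```typescript
interface CatchUp {
  readonly ticks: readonly number[]; // one session per tick (epoch ms)
  readonly nextCursor: number;       // last tick considered fired
}

function missedTicks(lastFired: number, now: number, intervalMs: number, maxMissedRuns: number): CatchUp {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  const due: number[] = [];
  for (let t = lastFired + intervalMs; t <= now; t += intervalMs) due.push(t);
  if (due.length === 0) return { ticks: [], nextCursor: lastFired };
  // Cap the catch-up; slice(-0) would return everything, so 0 is handled explicitly.
  const ticks = maxMissedRuns > 0 ? due.slice(-maxMissedRuns) : [];
  // Advance past every due tick, including skipped ones.
  return { ticks, nextCursor: due[due.length - 1] };
}
```
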
**Tensions resolved:** T2 (cursor files, no infra). Daemon restart safety (the cursor is persisted atomically after each batch of sessions is started).
**Tensions accepted:** T4 -- per-org trigger sources deferred to cloud tier (add `orgId` to the `TriggerCursorStore` key path).

**Boundary:** New `src/trigger/` module, isolated from the session engine. `TriggerDispatcher` calls `executeStartWorkflow` directly (in-process model).

**Failure mode:** The cursor commit fails after session start -- the same event is dispatched twice. Mitigated by the workflowId idempotency key.

**Repo-pattern relationship:** New ports follow the `src/v2/ports/` pattern. Cursor files follow the same atomic-write pattern as the session store. `TriggerId` follows the branded type pattern (`SessionId`, `WorkflowHash`).

**Gains:** Restart-safe trigger dispatch. No missed events. No double-fires (with idempotency key). Extensible: any new trigger source implements the port. Dagster's sensor cursor model is proven at scale.
**Loses:** Webhook triggers still need an HTTP receiver (a separate concern from the cursor model). The cursor model handles durability; the HTTP receiver handles ingestion.

**Scope judgment:** Best-fit. The `TriggerSourcePortV2` interface is the right abstraction boundary. Not too narrow (covers cron, webhook, and future event-based triggers). Not too broad (does not redesign the session start flow).

**Philosophy fit:** Explicit domain types (`TriggerId`, `TriggerPollResult`). Errors as data (`TriggerError` as a discriminated union). Validate at boundaries (webhook payloads validated before entering the trigger system). Functional/declarative (trigger sources are stateless functions; the cursor is explicit state).

---

## Comparison and Recommendation

### Matrix

| Criterion | A (Durability) | B (Approval Gate) | C (Versioning) | D (Trigger Cursor) |
|-----------|---------------|-------------------|----------------|-------------------|
| Resolves portability (T2) | Yes | Yes | Yes | Yes |
| Resolves approval-without-open-connection (T3) | N/A | Yes | N/A | N/A |
| Step-level vs tool-call-level (T1) | Accepts step-level | N/A | N/A | N/A |
| Multi-tenancy seam (T4) | Partial | Partial | Resolved | Deferred |
| Repo pattern fit | Excellent | Good | Excellent | Good |
| Philosophy compliance | Full | Full (with HMAC) | Full | Full |
| New code volume | ~80 LOC | ~200 LOC | 0-5 LOC | ~300 LOC |

### Recommendations

**Candidate A: ADOPT immediately.** Highest priority -- without crash recovery, the daemon cannot be trusted for any production use. Minimal code, follows existing patterns exactly. No design risk.

**Candidate B: ADOPT for v2 (after daemon MVP).** Approval gates are essential for WorkRail Auto's differentiation ('human in the loop'). Not MVP-blocking -- v1 daemon can run fully autonomous sessions first. The HMAC-signed approval token is required before this ships; unsigned approvals violate the security model.

**Candidate C: VERIFY first (this week).** Run the grep to confirm whether `executeContinueWorkflow` uses pinned snapshots. This is a 5-minute task. If yes, document it and move on. If no, the 5-line fix is the highest-ROI code change in the entire codebase.

**Candidate D: ADOPT for trigger system v1.** The cursor model is the correct foundation. Build `CronTrigger` + `GitLabMRTrigger` as the first two implementations. The HTTP webhook receiver is a separate concern (build it, but don't conflate it with the cursor model).

### Build order

1. Candidate A (DaemonStateStore) -- prerequisite for any daemon session
2. Candidate C verification -- possibly zero code, immediately valuable
3. Candidate D (TriggerCursorStore + CronTrigger) -- enables autonomous dispatch
4. Candidate B (ApprovalGate) -- enables human-in-the-loop after autonomous mode works

---

## Self-Critique
|
|
292
|
+
|
|
293
|
+
### Candidate A: strongest counter-argument
|
|
294
|
+
"Why not store the checkpoint token in the session event log itself, so there's only one durable store?" The session lock is the obstacle: to append to the session store, you need to acquire the lock. But you need the token to recover from a crashed lock. The out-of-band `daemon-state.json` correctly breaks this circular dependency. The counter-argument loses.
|
|
295
|
+
|
|
296
|
+
### Candidate B: what would tip the decision
|
|
297
|
+
If approval notifications are unreliable (Slack down, email spam-filtered), the approval gate becomes unusable even though it's technically correct. The console badge must always work even when external notifications fail. This is a UX dependency, not a design flaw.
|
|
298
|
+
|
|
299
|
+
### Candidate C: what assumption invalidates the design
|
|
300
|
+
If the engine does NOT use pinned snapshots (verification finds that `executeContinueWorkflow` re-resolves from registry), then the entire versioning story relies on workflow definitions never changing between steps. For the current MCP use case (human drives the session in one sitting), this is fine. For daemon sessions (which can run for hours), a redeploy mid-session would break the session. The 5-line fix then becomes a required safety property, not a nice-to-have.
|
|
301
|
+
|
|
302
|
+
### Candidate D: narrower option considered and rejected
|
|
303
|
+
"Just listen for webhooks and immediately start sessions -- no cursor." D1 loses because daemon restarts drop events. For Jira and GitLab webhooks (transient, not stored), a missed event during a 30-second daemon restart cannot be recovered. The cursor model only adds ~80 LOC to D1 and completely solves restart safety. D2 dominates D1.
|
|
304
|
+
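
The restart-safety argument for D2 can be modelled in memory. An event counts as consumed only once the cursor is committed. A crash after dispatch but before commit therefore replays events rather than losing them. `TriggerEvent`, `CursorStore` and `runTick` are hypothetical names. The real `TriggerCursorStore` would persist to disk rather than to a `Map`.

```typescript
// Hypothetical sketch of the poll -> dispatch -> commit cycle.
interface TriggerEvent {
  id: string;       // stable event ID (candidate idempotency key)
  position: number; // monotonically increasing position in the source
}

interface CursorStore {
  load(sourceId: string): number;
  commit(sourceId: string, position: number): void;
}

class InMemoryCursorStore implements CursorStore {
  private readonly cursors = new Map<string, number>();
  load(sourceId: string): number {
    return this.cursors.get(sourceId) ?? 0;
  }
  commit(sourceId: string, position: number): void {
    this.cursors.set(sourceId, position);
  }
}

// One tick: read everything after the committed cursor (poll is assumed to
// return events in ascending position order), dispatch it, and only then
// advance the cursor. If commit fails, the next tick sees the same events.
function runTick(
  sourceId: string,
  poll: (after: number) => TriggerEvent[],
  cursors: CursorStore,
  dispatch: (event: TriggerEvent) => void,
): void {
  const after = cursors.load(sourceId);
  const events = poll(after);
  for (const event of events) dispatch(event);
  if (events.length > 0) {
    cursors.commit(sourceId, events[events.length - 1].position);
  }
}
```

In the test, `e4` is dispatched twice, which is the cursor model's at-least-once behaviour. That duplicate is why each event also needs a stable ID that session creation can deduplicate on.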

---

## Open Questions

1. Does `executeContinueWorkflow` currently use `PinnedWorkflowStore` or re-resolve from registry? (Verify before building C.)
2. What is the maximum approval wait time WorkRail should support? (Affects whether 1s polling is acceptable or a file-watch is needed.)
3. Should `TriggerDispatcher` be a separate long-running process or part of the daemon's main event loop? (Related to `src/trigger/` vs `src/daemon/` module boundary.)
4. How should `step_approval_pending` sessions appear in the existing console DAG? (Requires console-service DTO extension.)

# Temporal Patterns Design Review Findings

**Status:** Review complete (Apr 14, 2026)
**Candidates doc:** `docs/design/temporal-patterns-design-candidates.md`
**Discovery doc:** `docs/ideas/temporal-discovery.md`

---

## Tradeoff Review

### T1: Step-level crash recovery (accepted)

A crashed step restarts from its beginning, not from the last completed tool call. Acceptable for v1.

**Condition that invalidates this tradeoff:** A step with non-idempotent destructive side effects (send email, create branch, post Slack message) executes those effects, then crashes before the next step is committed. On restart, the side effect fires again.

**Mitigation required before production:** The `requiredEvidence` field (planned in backlog) must be considered a prerequisite for daemon production use, not just a nice-to-have. It closes this gap for destructive-side-effect steps by requiring the agent to confirm evidence before advancing.
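
Here is a rough illustration of what `requiredEvidence` could enforce. The guard refuses to advance a step until every declared evidence key has been recorded. The `requiredEvidence: string[]` shape and the `canAdvance` helper are assumptions; this document does not give the backlog item's actual schema.

```typescript
// Hypothetical step shape; the real requiredEvidence schema is not yet specified.
interface StepDefinition {
  id: string;
  requiredEvidence?: string[]; // evidence keys the agent must record before advancing
}

type AdvanceDecision =
  | { kind: "advance" }
  | { kind: "blocked"; missing: string[] };

// `recorded` holds the evidence captured so far for the current step.
function canAdvance(step: StepDefinition, recorded: Record<string, string>): AdvanceDecision {
  const missing = (step.requiredEvidence ?? []).filter((key) => !(key in recorded));
  return missing.length === 0 ? { kind: "advance" } : { kind: "blocked", missing };
}
```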

### T2: 1s approval gate polling (accepted)

For human-scale approval waits (minutes to hours), 1s polling is negligible.

**Condition that invalidates:** 100+ sessions simultaneously in the approval-wait state. At 1s polling, 100 waiting sessions already mean 6,000 file reads per minute (100 × 60). Upgrade path: `fs.watch` on the session log directory. No interface change needed.

### T3: Trigger cursor commit failure = possible double-fire (accepted)

Idempotency key (trigger event ID as workflowId) prevents a double-fire from creating duplicate sessions.

**Condition that invalidates:** Trigger sources without stable event IDs. Resolution: document as a `TriggerSourcePortV2` contract requirement; provide a deterministic hash utility for sources without native IDs.
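
One possible 'deterministic hash utility' for sources without native IDs works from a canonical serialization of the payload. The same event then maps to the same idempotency key regardless of object key order. To stay dependency-free, this sketch uses 32-bit FNV-1a, an assumption made for brevity. A production utility would more plausibly use SHA-256 so that collisions are negligible. `eventIdFor` and `startSessionOnce` are hypothetical names.

```typescript
// Serialize with object keys sorted so equal payloads give identical strings.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((v) => canonicalJson(v)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units (sketch only; prefer SHA-256 in production).
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Derived ID: namespaced by source so two sources cannot collide on payload alone.
function eventIdFor(sourceId: string, payload: unknown): string {
  return sourceId + ":" + fnv1a32(canonicalJson(payload));
}

// Idempotent start: a second trigger with the same key is a no-op.
function startSessionOnce(started: Set<string>, key: string): boolean {
  if (started.has(key)) return false;
  started.add(key);
  return true;
}
```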

### T4: No horizontal daemon scale (accepted)

Single-process, single-machine for v1. Multi-instance deferred to cloud tier.

**Condition that invalidates:** User attempts to run two daemon instances on the same machine with the same `~/.workrail/` directory. The trigger cursor store must use atomic writes (temp→rename) to prevent corruption -- this is already in the design.

---

## Failure Mode Review

### FM1: Disk full at token persistence window (HIGHEST RISK)

A daemon step completes (session advances) but the disk-full condition prevents the `daemon-state.json` write. On restart, the daemon re-executes the completed step.

**Coverage:** Atomic temp→rename prevents partial writes. The re-execution failure mode remains for the window between step completion and token file write.
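
The temp→rename pattern works because a rename within one filesystem replaces the target atomically (POSIX `rename(2)` semantics). A reader therefore sees either the old file or the complete new one. In keeping with the port-based design, the sketch routes I/O through a small injected interface. `FileSystemPort`, `writeStateAtomically`, the `.tmp` suffix and the in-memory fake are all assumptions. A Node adapter would back the port with `fs.writeFileSync` and `fs.renameSync`, or their promise equivalents.

```typescript
// Hypothetical minimal file-system port; a Node adapter would wrap the fs module.
interface FileSystemPort {
  writeFile(path: string, data: string): void;
  rename(from: string, to: string): void;
  remove(path: string): void;
}

type WriteResult = { ok: true } | { ok: false; error: string };

// Write to a sibling temp file, then rename it over the target. If the write
// fails (e.g. ENOSPC), the previous state file is left untouched.
// A production adapter should also fsync the temp file before the rename.
function writeStateAtomically(port: FileSystemPort, path: string, state: object): WriteResult {
  const tmp = path + ".tmp";
  try {
    port.writeFile(tmp, JSON.stringify(state));
    port.rename(tmp, path);
    return { ok: true };
  } catch (e) {
    try {
      port.remove(tmp); // best-effort cleanup of a partial temp file
    } catch {
      // ignore cleanup failures; the target file is still intact
    }
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

// In-memory fake for tests: can simulate a disk-full partial write.
class MemoryFs implements FileSystemPort {
  readonly files = new Map<string, string>();
  failWrites = false;
  writeFile(path: string, data: string): void {
    if (this.failWrites) {
      this.files.set(path, data.slice(0, 3)); // partial bytes land in the temp file
      throw new Error("ENOSPC: no space left on device");
    }
    this.files.set(path, data);
  }
  rename(from: string, to: string): void {
    const data = this.files.get(from);
    if (data === undefined) throw new Error("ENOENT: " + from);
    this.files.set(to, data);
    this.files.delete(from);
  }
  remove(path: string): void {
    this.files.delete(path);
  }
}
```

As the coverage note says, this only guarantees the state file is never half-written. When the write fails, the daemon still restarts from the older token and re-executes the step.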

**Missing mitigation:** `requiredEvidence` field. Until it ships, workflow authoring guidelines should require authors to mark explicitly whether each step with destructive side effects is restart-safe.

### FM2: Approval notification silent failure

All notification channels fail (Slack rate limit, email caught by spam filters). The session is waiting for approval but no one knows.

**Coverage:** `step_approval_pending` event in session log. Console badge.

**Missing mitigation:** Console MUST display approval-waiting sessions prominently in the session list (not just in session detail). This is a UI requirement.
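
One way to meet this requirement is to sort approval-waiting sessions ahead of all others and expose a count for a list-level badge. The `SessionSummary` shape and its `status` values are hypothetical; they are not the console-service DTO.

```typescript
// Hypothetical list-row shape; not the real console-service DTO.
type SessionStatus = "running" | "approval_pending" | "completed" | "failed";

interface SessionSummary {
  sessionId: string;
  status: SessionStatus;
  updatedAt: number; // epoch ms of the last event
}

// Approval-waiting sessions first (longest wait on top), then everything
// else by most recent activity.
function orderForSessionList(sessions: readonly SessionSummary[]): SessionSummary[] {
  return sessions.slice().sort((a, b) => {
    const aWaiting = a.status === "approval_pending";
    const bWaiting = b.status === "approval_pending";
    if (aWaiting !== bWaiting) return aWaiting ? -1 : 1;
    return aWaiting ? a.updatedAt - b.updatedAt : b.updatedAt - a.updatedAt;
  });
}

// Count for a badge shown on the session list itself.
function approvalBadgeCount(sessions: readonly SessionSummary[]): number {
  return sessions.filter((s) => s.status === "approval_pending").length;
}
```

The sketch orders the waiting group oldest-first so the longest-blocked session stays at the top. That ordering is a choice made here, not something the design specifies.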

### FM3: Trigger cursor commit failure (LOW RISK)

Covered by idempotency key. Correctly handled.

### FM4: Two daemons on same machine (LOW RISK)

`LocalSessionLockV2` prevents session corruption. Trigger cursor atomic writes prevent cursor corruption. Idempotency key prevents duplicate sessions. Well-covered.

---

## Runner-Up / Simpler Alternative Review

### Candidate B (Approval Gate) -- runner-up

Not weaker -- correctly deferred (post-daemon-MVP). One element worth pulling forward:

**GREEN finding:** Add `step_approval_pending/received/timeout` to `DomainEventV1` discriminated union NOW (zero behavior change, prevents future schema-breaking change). The schema reservation costs zero LOC.
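
Reserving the kinds means extending the discriminated union now. Every exhaustive `switch` over events then has to account for them before any emitter exists. The union below is a small stand-in, not `DomainEventV1` itself. Apart from the three `step_approval_*` kind names, its members and fields (`stepId`, `timeoutAt`, `approvedBy`) are assumptions. The hypothetical `approvalStateFor` helper shows how a poller could derive a step's approval state from the session log.

```typescript
// Hypothetical stand-in for an existing member of the union.
type StepCompleted = { kind: "step_completed"; stepId: string };

// Reserved approval-gate members (schema only; nothing emits them yet).
type StepApprovalPending = { kind: "step_approval_pending"; stepId: string; timeoutAt: string };
type StepApprovalReceived = { kind: "step_approval_received"; stepId: string; approvedBy: string };
type StepApprovalTimeout = { kind: "step_approval_timeout"; stepId: string };

type DomainEvent = StepCompleted | StepApprovalPending | StepApprovalReceived | StepApprovalTimeout;

// Compile-time exhaustiveness: a new member left unhandled below is a type error.
function assertNever(x: never): never {
  throw new Error("unhandled event: " + JSON.stringify(x));
}

type ApprovalState = "none" | "pending" | "approved" | "timed_out";

// Replay the log for one step; the latest approval event wins.
function approvalStateFor(stepId: string, events: readonly DomainEvent[]): ApprovalState {
  let state: ApprovalState = "none";
  for (const e of events) {
    if (e.stepId !== stepId) continue;
    switch (e.kind) {
      case "step_completed":
        break;
      case "step_approval_pending":
        state = "pending";
        break;
      case "step_approval_received":
        state = "approved";
        break;
      case "step_approval_timeout":
        state = "timed_out";
        break;
      default:
        assertNever(e);
    }
  }
  return state;
}
```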

### Simplification of Candidate A

Plain `fs.writeFile` (20 LOC) instead of a `DaemonStateStore` port (80 LOC) was considered and rejected. The port abstraction is required for testability (all I/O through injected ports -- consistent with 20+ existing ports). Inconsistency cost exceeds simplicity gain.
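
In concrete form, the testability argument looks like this. The daemon depends only on a narrow port. Production wires in a file-backed adapter, while tests wire in an in-memory one that can simulate a restart in-process. The method names (`load`, `save`, `clear`) and the `startupAction` helper are assumptions, not the planned `DaemonStateStore` interface.

```typescript
interface DaemonState {
  sessionId: string;
  continueToken: string;
  stepIndex: number;
}

// Hypothetical port: all daemon-state I/O goes through this interface.
interface DaemonStateStore {
  load(): DaemonState | null;
  save(state: DaemonState): void;
  clear(): void;
}

// Test double: no disk access, so crash/restart can be simulated in-process.
class InMemoryDaemonStateStore implements DaemonStateStore {
  private state: DaemonState | null = null;
  load(): DaemonState | null {
    return this.state;
  }
  save(state: DaemonState): void {
    this.state = { ...state }; // copy so callers cannot mutate stored state
  }
  clear(): void {
    this.state = null;
  }
}

type StartupAction = { kind: "fresh" } | { kind: "resume"; token: string; stepIndex: number };

// Startup logic under test: resume from the persisted token if one exists.
function startupAction(store: DaemonStateStore): StartupAction {
  const saved = store.load();
  return saved === null
    ? { kind: "fresh" }
    : { kind: "resume", token: saved.continueToken, stepIndex: saved.stepIndex };
}
```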

Scope correction: daemon ports live in `src/daemon/ports/`, not `src/v2/ports/`. This is a correct separation -- not a simplification.

### Hybrid A+B

`PersistedDaemonState` should include an optional `approvalGate` field:

```typescript
// Out-of-band daemon state persisted to daemon-state.json (Candidate A).
interface PersistedDaemonState {
  sessionId: string;
  continueToken: string;
  stepIndex: number;
  // Present only while the session is paused at an approval gate (Candidate B).
  approvalGate?: { stepId: string; timeoutAt: string; notifyChannels: string[] };
}
```

Zero additional LOC in Candidate A. Makes Candidate B restart trivial when it ships.
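
With the optional field in place, restart handling can branch on it. A persisted `approvalGate` sends the daemon back to waiting, or to the timeout path, instead of re-running the gated step. `restartPlan` and its result kinds are hypothetical, and `timeoutAt` is assumed to hold an ISO-8601 timestamp.

```typescript
interface PersistedDaemonState {
  sessionId: string;
  continueToken: string;
  stepIndex: number;
  approvalGate?: { stepId: string; timeoutAt: string; notifyChannels: string[] };
}

type RestartPlan =
  | { kind: "continue"; token: string }
  | { kind: "await_approval"; stepId: string; notifyChannels: string[] }
  | { kind: "approval_timed_out"; stepId: string };

function restartPlan(state: PersistedDaemonState, now: Date): RestartPlan {
  const gate = state.approvalGate;
  if (gate === undefined) {
    // No gate recorded: resume normal execution from the persisted token.
    return { kind: "continue", token: state.continueToken };
  }
  // Date.parse yields NaN for malformed input; this sketch treats that as
  // expired rather than waiting forever (a policy choice, not a given).
  const deadline = Date.parse(gate.timeoutAt);
  if (Number.isNaN(deadline) || now.getTime() >= deadline) {
    return { kind: "approval_timed_out", stepId: gate.stepId };
  }
  // Still inside the approval window: go back to waiting.
  return { kind: "await_approval", stepId: gate.stepId, notifyChannels: gate.notifyChannels };
}
```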

### Simplification of Candidate D

Generic `TriggerDispatcher` can be simplified: v1 hardcodes a loop over configured trigger sources instead of a generic dispatcher. Saves ~100 LOC. `TriggerSourcePortV2` interface kept (justified by 3 required v1 trigger sources). `TriggerDispatcher` added when a third source is needed.

---

## Philosophy Alignment

| Principle | A | B | C | D |
|-----------|---|---|---|---|
| Errors as data | SATISFIED | SATISFIED | N/A | SATISFIED |
| Make illegal states unrepresentable | SATISFIED | SATISFIED (pending schema) | N/A | SATISFIED |
| Explicit domain types | SATISFIED | SATISFIED | N/A | SATISFIED |
| Validate at boundaries | SATISFIED | TENSION\* | N/A | SATISFIED |
| Immutability | SATISFIED | SATISFIED | N/A | SATISFIED |
| DI for boundaries | SATISFIED | SATISFIED | N/A | SATISFIED |
| YAGNI | SATISFIED | SATISFIED | N/A | TENSION\*\* |

\*B tension: HMAC approval token must be validated at the Express middleware layer (before the session lock is acquired), not inside the engine.

\*\*D tension: `TriggerSourcePortV2` interface added before a second source exists. Justified by v1 scope (3 sources needed).

---

## Findings

### RED findings (blocking)

**None.** No blocking issues found. The four candidates are sound.

### ORANGE findings (important, fix before shipping)

**O1: `requiredEvidence` is a production prerequisite for daemon use.**
Destructive-side-effect steps + daemon crash = side effect re-executed on restart. The daemon should not be used in production for workflows with destructive steps until `requiredEvidence` is implemented. This is not a design flaw -- it's a scope dependency.

**O2: HMAC approval token validation must be at Express middleware layer, not engine layer.**
Unsigned or malformed approval requests must be rejected before the session lock is acquired. This is a 'validate at boundaries' requirement.
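
The ordering requirement can be expressed independently of Express. Parse and verify the request first, and hand only a verified approval to the code that takes the session lock. In this sketch the signature check is injected as a function; in Node it would typically be built on `crypto.createHmac` plus `crypto.timingSafeEqual`. `ApprovalRequest`, `handleApproval`, the signed-message format and the lock callback are all hypothetical.

```typescript
interface ApprovalRequest {
  sessionId: string;
  stepId: string;
  signature: string; // HMAC over "sessionId:stepId" (assumed message format)
}

type ApprovalOutcome =
  | { status: 400; error: string }
  | { status: 401; error: string }
  | { status: 200 };

// Boundary parsing: anything not shaped like a request is rejected outright.
function parseApproval(body: unknown): ApprovalRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { sessionId, stepId, signature } = body as Record<string, unknown>;
  if (typeof sessionId !== "string" || typeof stepId !== "string" || typeof signature !== "string") {
    return null;
  }
  return { sessionId, stepId, signature };
}

function handleApproval(
  body: unknown,
  verifySignature: (message: string, signature: string) => boolean,
  withSessionLock: (sessionId: string, fn: () => void) => void,
  recordApproval: (req: ApprovalRequest) => void,
): ApprovalOutcome {
  const req = parseApproval(body);
  if (req === null) return { status: 400, error: "malformed approval request" };
  if (!verifySignature(`${req.sessionId}:${req.stepId}`, req.signature)) {
    return { status: 401, error: "invalid approval signature" };
  }
  // Only a parsed, verified approval ever reaches the session lock.
  withSessionLock(req.sessionId, () => recordApproval(req));
  return { status: 200 };
}
```

In an Express app the same function could back a middleware or route handler. What matters is that the 400 and 401 paths return before `withSessionLock` is ever called.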

### YELLOW findings (improve soon)

**Y1: Console must display approval-waiting sessions prominently in session list.**
Not just in session detail. If the console badge is buried, approval gates become unusable. This is a UI requirement for Candidate B.

**Y2: Add `step_approval_pending/received/timeout` to `DomainEventV1` discriminated union before ANY approval gate ships.**
Schema reservation costs zero LOC. Prevents future breaking schema changes.

**Y3: `TriggerSourcePortV2` contract must document stable event ID requirement.**
Each trigger source must either provide a stable event ID or derive one with a deterministic hash utility. An undocumented contract leads to subtle bugs when a new trigger source is implemented.

**Y4: Daemon port scope correction.**
New daemon ports should live in `src/daemon/ports/`, not `src/v2/ports/`. Daemon concerns are not engine concerns.

---

## Recommended Revisions

1. **Add `approvalGate?` field to `PersistedDaemonState`** (Candidate A hybrid with B). Zero extra LOC, makes B trivial to implement later.

2. **Add `step_approval_pending/received/timeout` event kinds to `DomainEventV1` union** (Y2). Schema reservation only.

3. **Document `requiredEvidence` as a production prerequisite** (O1). Add a warning to the daemon `README` or `backlog.md` build order notes.

4. **Simplify `TriggerDispatcher` for v1** (Candidate D simplification). Hardcoded loop over configured sources. Generic dispatcher deferred.

5. **Document `TriggerSourcePortV2` stable-ID contract** (Y3). One-line JSDoc comment on the `poll()` method.
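
A minimal version of that one-line contract might look like the sketch below. Only the `TriggerSourcePortV2` name and the `poll()` method come from this document. The `TriggerEventV2` shape and the cursor parameter are assumptions. `poll()` is shown synchronously for brevity, whereas a real source would likely return a `Promise`.

```typescript
// Hypothetical event shape; only the interface name and poll() come from the design doc.
interface TriggerEventV2 {
  id: string;
  payload: unknown;
}

interface TriggerSourcePortV2 {
  /** Event `id`s MUST be stable across re-polls (native ID or deterministic hash): they are the session idempotency key. */
  poll(cursor: string | null): TriggerEventV2[];
}
```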

---

## Residual Concerns

1. **Candidate C false gap.** The landscape research incorrectly identified workflow versioning as an open gap. `PinnedWorkflowStorePortV2` already solves it completely. The discovery process caught this, but it's worth noting: the landscape research should have checked the existing codebase before concluding there was a gap. Future discovery workflows should include a codebase search step before declaring a design gap.

2. **Sub-step durability is an open question for long-running AI operations.** A WorkRail step that calls an LLM for 45 minutes (multi-hop reasoning, large context operations) would lose all progress on crash. For v1 (typical step = 1-5 minutes), this is acceptable. For WorkRail Auto cloud (long-running autonomous coding tasks), this deserves a design session when the evidence of need materializes.

3. **Multi-tenancy seams were identified but not fully designed.** `orgId` as a prefix in `TriggerCursorStore` and `DaemonStateStore` paths is the right seam, but the full multi-tenancy design (credential vault, per-org rate limits, namespace isolation) is deferred. This is correctly deferred for the local daemon -- it must be designed before cloud deployment.