agent-threader 2.0.5
- package/LICENSE +21 -0
- package/README.md +126 -0
- package/compiled/claude/agent-threader/SKILL.md +361 -0
- package/compiled/codex/agent-threader/SKILL.md +361 -0
- package/compiled/cursor/rules/agent-threader.mdc +367 -0
- package/compiled/cursor/skills/agent-threader/SKILL.md +361 -0
- package/compiled/opencode/agent-threader.md +361 -0
- package/compiled/windsurf/rules/agent-threader.md +361 -0
- package/compiled/windsurf/skills/agent-threader/SKILL.md +361 -0
- package/dist/cli/commands/doctor.d.ts +6 -0
- package/dist/cli/commands/doctor.d.ts.map +1 -0
- package/dist/cli/commands/doctor.js +7 -0
- package/dist/cli/commands/doctor.js.map +1 -0
- package/dist/cli/commands/explain-error.d.ts +12 -0
- package/dist/cli/commands/explain-error.d.ts.map +1 -0
- package/dist/cli/commands/explain-error.js +23 -0
- package/dist/cli/commands/explain-error.js.map +1 -0
- package/dist/cli/commands/init-state.d.ts +6 -0
- package/dist/cli/commands/init-state.d.ts.map +1 -0
- package/dist/cli/commands/init-state.js +10 -0
- package/dist/cli/commands/init-state.js.map +1 -0
- package/dist/cli/commands/logs.d.ts +6 -0
- package/dist/cli/commands/logs.d.ts.map +1 -0
- package/dist/cli/commands/logs.js +9 -0
- package/dist/cli/commands/logs.js.map +1 -0
- package/dist/cli/commands/parse-heal.d.ts +6 -0
- package/dist/cli/commands/parse-heal.d.ts.map +1 -0
- package/dist/cli/commands/parse-heal.js +5 -0
- package/dist/cli/commands/parse-heal.js.map +1 -0
- package/dist/cli/commands/parse-result.d.ts +6 -0
- package/dist/cli/commands/parse-result.d.ts.map +1 -0
- package/dist/cli/commands/parse-result.js +5 -0
- package/dist/cli/commands/parse-result.js.map +1 -0
- package/dist/cli/commands/status.d.ts +6 -0
- package/dist/cli/commands/status.d.ts.map +1 -0
- package/dist/cli/commands/status.js +5 -0
- package/dist/cli/commands/status.js.map +1 -0
- package/dist/cli/commands/validate-manifest.d.ts +6 -0
- package/dist/cli/commands/validate-manifest.d.ts.map +1 -0
- package/dist/cli/commands/validate-manifest.js +5 -0
- package/dist/cli/commands/validate-manifest.js.map +1 -0
- package/dist/cli/index.d.ts +6 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +360 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/cli/output-formatter.d.ts +6 -0
- package/dist/cli/output-formatter.d.ts.map +1 -0
- package/dist/cli/output-formatter.js +19 -0
- package/dist/cli/output-formatter.js.map +1 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +5 -0
- package/dist/index.js.map +1 -0
- package/dist/lib/adapters/types.d.ts +40 -0
- package/dist/lib/adapters/types.d.ts.map +1 -0
- package/dist/lib/adapters/types.js +3 -0
- package/dist/lib/adapters/types.js.map +1 -0
- package/dist/lib/contracts/schema-validator.d.ts +15 -0
- package/dist/lib/contracts/schema-validator.d.ts.map +1 -0
- package/dist/lib/contracts/schema-validator.js +63 -0
- package/dist/lib/contracts/schema-validator.js.map +1 -0
- package/dist/lib/contracts/types.d.ts +91 -0
- package/dist/lib/contracts/types.d.ts.map +1 -0
- package/dist/lib/contracts/types.js +15 -0
- package/dist/lib/contracts/types.js.map +1 -0
- package/dist/lib/contracts/validate-manifest.d.ts +16 -0
- package/dist/lib/contracts/validate-manifest.d.ts.map +1 -0
- package/dist/lib/contracts/validate-manifest.js +123 -0
- package/dist/lib/contracts/validate-manifest.js.map +1 -0
- package/dist/lib/diagnostics/doctor.d.ts +17 -0
- package/dist/lib/diagnostics/doctor.d.ts.map +1 -0
- package/dist/lib/diagnostics/doctor.js +131 -0
- package/dist/lib/diagnostics/doctor.js.map +1 -0
- package/dist/lib/errors/explain-error.d.ts +10 -0
- package/dist/lib/errors/explain-error.d.ts.map +1 -0
- package/dist/lib/errors/explain-error.js +73 -0
- package/dist/lib/errors/explain-error.js.map +1 -0
- package/dist/lib/errors/types.d.ts +16 -0
- package/dist/lib/errors/types.d.ts.map +1 -0
- package/dist/lib/errors/types.js +50 -0
- package/dist/lib/errors/types.js.map +1 -0
- package/dist/lib/index.d.ts +29 -0
- package/dist/lib/index.d.ts.map +1 -0
- package/dist/lib/index.js +25 -0
- package/dist/lib/index.js.map +1 -0
- package/dist/lib/orchestrator/batch-strategy.d.ts +9 -0
- package/dist/lib/orchestrator/batch-strategy.d.ts.map +1 -0
- package/dist/lib/orchestrator/batch-strategy.js +34 -0
- package/dist/lib/orchestrator/batch-strategy.js.map +1 -0
- package/dist/lib/orchestrator/healing-policy.d.ts +47 -0
- package/dist/lib/orchestrator/healing-policy.d.ts.map +1 -0
- package/dist/lib/orchestrator/healing-policy.js +104 -0
- package/dist/lib/orchestrator/healing-policy.js.map +1 -0
- package/dist/lib/orchestrator/index.d.ts +11 -0
- package/dist/lib/orchestrator/index.d.ts.map +1 -0
- package/dist/lib/orchestrator/index.js +11 -0
- package/dist/lib/orchestrator/index.js.map +1 -0
- package/dist/lib/orchestrator/patch-validation.d.ts +9 -0
- package/dist/lib/orchestrator/patch-validation.d.ts.map +1 -0
- package/dist/lib/orchestrator/patch-validation.js +58 -0
- package/dist/lib/orchestrator/patch-validation.js.map +1 -0
- package/dist/lib/orchestrator/scheduling.d.ts +12 -0
- package/dist/lib/orchestrator/scheduling.d.ts.map +1 -0
- package/dist/lib/orchestrator/scheduling.js +74 -0
- package/dist/lib/orchestrator/scheduling.js.map +1 -0
- package/dist/lib/orchestrator/write-safety.d.ts +14 -0
- package/dist/lib/orchestrator/write-safety.d.ts.map +1 -0
- package/dist/lib/orchestrator/write-safety.js +44 -0
- package/dist/lib/orchestrator/write-safety.js.map +1 -0
- package/dist/lib/parser/parse-heal.d.ts +12 -0
- package/dist/lib/parser/parse-heal.d.ts.map +1 -0
- package/dist/lib/parser/parse-heal.js +9 -0
- package/dist/lib/parser/parse-heal.js.map +1 -0
- package/dist/lib/parser/parse-result.d.ts +12 -0
- package/dist/lib/parser/parse-result.d.ts.map +1 -0
- package/dist/lib/parser/parse-result.js +9 -0
- package/dist/lib/parser/parse-result.js.map +1 -0
- package/dist/lib/parser/parser.d.ts +8 -0
- package/dist/lib/parser/parser.d.ts.map +1 -0
- package/dist/lib/parser/parser.js +167 -0
- package/dist/lib/parser/parser.js.map +1 -0
- package/dist/lib/state/init-state.d.ts +15 -0
- package/dist/lib/state/init-state.d.ts.map +1 -0
- package/dist/lib/state/init-state.js +50 -0
- package/dist/lib/state/init-state.js.map +1 -0
- package/dist/lib/state/logs.d.ts +19 -0
- package/dist/lib/state/logs.d.ts.map +1 -0
- package/dist/lib/state/logs.js +25 -0
- package/dist/lib/state/logs.js.map +1 -0
- package/dist/lib/state/state.d.ts +7 -0
- package/dist/lib/state/state.d.ts.map +1 -0
- package/dist/lib/state/state.js +72 -0
- package/dist/lib/state/state.js.map +1 -0
- package/dist/lib/state/status.d.ts +22 -0
- package/dist/lib/state/status.d.ts.map +1 -0
- package/dist/lib/state/status.js +34 -0
- package/dist/lib/state/status.js.map +1 -0
- package/dist/lib/state/types.d.ts +55 -0
- package/dist/lib/state/types.d.ts.map +1 -0
- package/dist/lib/state/types.js +14 -0
- package/dist/lib/state/types.js.map +1 -0
- package/install-local.sh +239 -0
- package/install.sh +36 -0
- package/package.json +55 -0
- package/site/CNAME +1 -0
- package/site/index.html +141 -0
- package/site/install.sh +36 -0
- package/site/style.css +319 -0
- package/skill/SKILL.md +127 -0
- package/skill/SPEC.md +1189 -0
- package/skill/build/compile.mjs +237 -0
- package/skill/build/manifest.json +21 -0
- package/skill/fragments/common/model-selection.md +11 -0
- package/skill/fragments/common/portability-rules.md +16 -0
- package/skill/fragments/common/workflow.md +12 -0
- package/skill/fragments/domain/adapter-model.md +42 -0
- package/skill/fragments/domain/architecture-overview.md +36 -0
- package/skill/fragments/domain/contracts.md +31 -0
- package/skill/fragments/domain/pbh-healing.md +47 -0
- package/skill/fragments/domain/state-resume.md +34 -0
- package/skill/fragments/domain/verification-safety.md +33 -0
- package/skill/fragments/meta/schemas-reference.md +13 -0
- package/skill/fragments/meta/templates-reference.md +11 -0
- package/skill/schemas/heal_decision.v2.json +100 -0
- package/skill/schemas/manifest.v2.json +91 -0
- package/skill/schemas/state.v2.json +183 -0
- package/skill/schemas/task_result.v2.json +104 -0
- package/skill/schemas/verify_profile.v2.json +61 -0
- package/skill/skills/agent-threader/agent-threader.md +85 -0
- package/skill/templates/orchestrator.ts +38 -0
- package/skill/templates/parser.ts +384 -0
- package/skill/templates/types.ts +282 -0
package/skill/SPEC.md
ADDED
|
@@ -0,0 +1,1189 @@
|
|
|
1
|
+
# AgentThreader v2: Stand-Alone Architecture Specification
|
|
2
|
+
|
|
3
|
+
**Status:** Normative v2 design proposal
|
|
4
|
+
|
|
5
|
+
**Audience:** Engineers implementing or reviewing reusable runners that invoke agentic CLIs across many tasks.
|
|
6
|
+
|
|
7
|
+
**Normative keywords:** `MUST`, `MUST NOT`, `SHOULD`, and `MAY` are used in the RFC 2119 sense. `MUST` is required behavior, and `MUST NOT` is prohibited behavior. `SHOULD` is the default or strongly recommended behavior. `MAY` is optional behavior.
|
|
8
|
+
|
|
9
|
+
## 1. Problem Statement
|
|
10
|
+
|
|
11
|
+
This specification defines a standard architecture for building runners that repeatedly invoke agentic CLIs such as `agent`, `opencode`, and `claude` across many tasks. The purpose of the system is to make large prompt-driven workflows durable, inspectable, resumable, and safe enough to run unattended.
|
|
12
|
+
|
|
13
|
+
Prompt formatting is not the hard part of this domain. The hard problems are:
|
|
14
|
+
|
|
15
|
+
- durable state and resume behavior after interruptions
|
|
16
|
+
- deterministic parsing of machine-readable results from model output
|
|
17
|
+
- external verification owned by the runner instead of the model
|
|
18
|
+
- bounded recovery after fixable failures
|
|
19
|
+
- portability across multiple CLIs without rewriting the orchestrator
|
|
20
|
+
|
|
21
|
+
Prior v1 implementations diverged in three main areas:
|
|
22
|
+
|
|
23
|
+
- healing schedule: per-task, fixed batch, or epoch-based
|
|
24
|
+
- parser strategy: regex/text extraction versus structured contracts
|
|
25
|
+
- platform packaging: each IDE or tool surface redefining architecture in its own wrapper
|
|
26
|
+
|
|
27
|
+
This v2 specification replaces that divergence with one canonical design. The system defined here is intended for:
|
|
28
|
+
|
|
29
|
+
- batch code edits
|
|
30
|
+
- audit and evaluation runs
|
|
31
|
+
- stage-based workflows
|
|
32
|
+
- resumable overnight runs
|
|
33
|
+
- bounded self-healing after fixable failures
|
|
34
|
+
|
|
35
|
+
This v2 specification does not attempt to solve:
|
|
36
|
+
|
|
37
|
+
- general-purpose multi-agent planning frameworks
|
|
38
|
+
- autonomous source-code editing by the healer
|
|
39
|
+
- unbounded retry loops
|
|
40
|
+
- platform-specific UX beyond thin wrappers
|
|
41
|
+
|
|
42
|
+
## 2. Goals and Non-Goals
|
|
43
|
+
|
|
44
|
+
### Goals
|
|
45
|
+
|
|
46
|
+
- Define one vocabulary, one runtime model, and one contract stack for all implementations.
|
|
47
|
+
- Make the orchestrator the single source of truth for task status, verification, checkpointing, and healing policy.
|
|
48
|
+
- Standardize worker and healer output as schema-validated JSON contracts.
|
|
49
|
+
- Preserve CLI portability through adapters rather than forking orchestrator logic.
|
|
50
|
+
- Define a default healing model that starts conservative, expands when stable, and stops when automation is no longer justified.
|
|
51
|
+
- Define enough detail that a peer can implement the system without prior knowledge of v1 variants or this repository.
|
|
52
|
+
|
|
53
|
+
### Non-Goals
|
|
54
|
+
|
|
55
|
+
- This document does not prescribe project-specific build, test, or browser commands.
|
|
56
|
+
- This document does not require a specific product repository structure beyond the files needed by the runner.
|
|
57
|
+
- This document does not require a specific agent model or provider.
|
|
58
|
+
- This document does not define a UI or dashboard for monitoring runs.
|
|
59
|
+
|
|
60
|
+
### Assumptions
|
|
61
|
+
|
|
62
|
+
- Readers are technical peers evaluating architecture, not end users.
|
|
63
|
+
- The document is a normative v2 design document, not a brainstorm or rough proposal.
|
|
64
|
+
- Platform wrappers are packaging concerns and are not allowed to redefine architecture.
|
|
65
|
+
- The reference implementation is expected to run TypeScript via global `tsx`.
|
|
66
|
+
|
|
67
|
+
## 3. Glossary
|
|
68
|
+
|
|
69
|
+
| Term | Definition |
|
|
70
|
+
| --- | --- |
|
|
71
|
+
| `Task` | The smallest unit of work the runner schedules, executes, verifies, and tracks. |
|
|
72
|
+
| `Manifest` | The source of truth for the set of tasks and their metadata. |
|
|
73
|
+
| `Shared Context` | Reusable prompt material applied to multiple tasks, such as operating constraints, style rules, or output contract reminders. |
|
|
74
|
+
| `Worker` | The model or CLI invocation that performs the actual task work. |
|
|
75
|
+
| `Healer` | The model or CLI invocation that analyzes fixable failures and emits allowed patches to prompts, shared context, or bounded runtime knobs. |
|
|
76
|
+
| `Orchestrator` | The deterministic runtime that owns scheduling, parsing, verification, checkpointing, healing, and retry policy. |
|
|
77
|
+
| `Adapter` | The CLI-specific execution layer used by the orchestrator to invoke a tool without embedding tool-specific behavior in the core runtime. |
|
|
78
|
+
| `Verification Gate` | Any external check run by the orchestrator after worker output is parsed, such as build, test, lint, smoke, or browser validation. |
|
|
79
|
+
| `Failure Class` | The normalized reason category assigned to a failed task. |
|
|
80
|
+
| `Failure Signature` | The stable, comparable fingerprint used to detect repeated failures across tasks or retries. |
|
|
81
|
+
| `Batch` | The current window of ready tasks processed before the orchestrator evaluates whether healing is needed. |
|
|
82
|
+
| `Epoch` | One full sweep over all currently pending tasks. |
|
|
83
|
+
| `PBH` | Progressive Batch Healing, the default healing strategy that adjusts batch size based on observed stability. |
|
|
84
|
+
| `Convergence` | Evidence that healing is reducing failures rather than repeating them. |
|
|
85
|
+
| `Escalation` | A terminal outcome where the system stops retrying a task or run because further automated healing is not justified. |
|
|
86
|
+
|
|
87
|
+
## 4. System Overview
|
|
88
|
+
|
|
89
|
+
The canonical v2 system has five moving parts:
|
|
90
|
+
|
|
91
|
+
- a manifest that declares work
|
|
92
|
+
- an orchestrator that owns truth
|
|
93
|
+
- adapters that invoke specific CLIs
|
|
94
|
+
- a worker that proposes task results
|
|
95
|
+
- a healer that proposes bounded recovery patches
|
|
96
|
+
|
|
97
|
+
At a high level, the system operates like this:
|
|
98
|
+
|
|
99
|
+
```text
|
|
100
|
+
Manifest
|
|
101
|
+
-> Orchestrator
|
|
102
|
+
-> Adapter
|
|
103
|
+
-> Worker
|
|
104
|
+
-> Parser and Schema Validator
|
|
105
|
+
-> Verification Gates
|
|
106
|
+
-> State Checkpoint
|
|
107
|
+
-> Healer Checkpoint
|
|
108
|
+
-> Adapter
|
|
109
|
+
-> Healer
|
|
110
|
+
-> Patch Validation and Application
|
|
111
|
+
-> Resume or Escalate
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
The orchestrator owns truth at every stage. Worker and healer outputs are only candidate data until the orchestrator validates them and commits them to state.
|
|
115
|
+
|
|
116
|
+
### Canonical Source Tree
|
|
117
|
+
|
|
118
|
+
The canonical source of truth SHOULD be organized like this:
|
|
119
|
+
|
|
120
|
+
```text
|
|
121
|
+
agent-threader/
|
|
122
|
+
SKILL.md
|
|
123
|
+
SPEC.md
|
|
124
|
+
schemas/
|
|
125
|
+
templates/
|
|
126
|
+
platforms/
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
The purpose of each top-level artifact is:
|
|
130
|
+
|
|
131
|
+
- `SKILL.md`: short entrypoint describing when to use the skill and where the normative specification lives
|
|
132
|
+
- `SPEC.md`: the normative architecture document
|
|
133
|
+
- `schemas/`: JSON schemas for manifest, worker result, healer decision, state, and the verification profile registry
|
|
134
|
+
- `templates/`: reference runtime, parser, and adapter skeletons
|
|
135
|
+
- `platforms/`: thin wrappers for `cursor`, `codex`, `claude`, and `windsurf`
|
|
136
|
+
|
|
137
|
+
Platform wrappers MUST NOT define new architectural behavior. They MAY describe invocation syntax, UX wording, or tool-specific setup.
|
|
138
|
+
|
|
139
|
+
### Default Configuration
|
|
140
|
+
|
|
141
|
+
| Setting | Default |
|
|
142
|
+
| --- | --- |
|
|
143
|
+
| Reference runtime | TypeScript via global `tsx` |
|
|
144
|
+
| Contract format | Fenced JSON only |
|
|
145
|
+
| Default healing schedule | `auto` |
|
|
146
|
+
| Default healing strategy | `PBH` |
|
|
147
|
+
| Batch growth strategy | `fibonacci` |
|
|
148
|
+
| Manual batch size default | `5` |
|
|
149
|
+
| Failure threshold | `0.2` |
|
|
150
|
+
| Max worker attempts per task | `2` |
|
|
151
|
+
| Max heal rounds per window | `2` |
|
|
152
|
+
| Max total heal rounds | `8` |
|
|
153
|
+
| Signature repeat limit | `2` |
|
|
154
|
+
| Verification owner | Orchestrator |
|
|
155
|
+
| Parser authority | Schema-validated parser modules only |
|
|
156
|
+
|
|
157
|
+
## 5. Canonical Runtime Model
|
|
158
|
+
|
|
159
|
+
### Runtime Choice
|
|
160
|
+
|
|
161
|
+
The reference implementation SHOULD be a typed TypeScript orchestrator executed via global `tsx`.
|
|
162
|
+
|
|
163
|
+
This choice means:
|
|
164
|
+
|
|
165
|
+
- `tsx` is assumed to be globally available in environments using the reference implementation
|
|
166
|
+
- shell and expect remain valid as adapter implementations, not as the canonical orchestration core
|
|
167
|
+
- Python is no longer the normative parser runtime
|
|
168
|
+
- conforming orchestrators MAY be implemented in other languages if they preserve the same contracts, state transitions, parser guarantees, and healing behavior
|
|
169
|
+
|
|
170
|
+
### Stable Ordering Rules
|
|
171
|
+
|
|
172
|
+
The orchestrator MUST build a stable execution order:
|
|
173
|
+
|
|
174
|
+
- Tasks MUST be topologically sorted by `depends_on`.
|
|
175
|
+
- Lower `priority` values SHOULD run before higher `priority` values.
|
|
176
|
+
- If dependency depth and priority are equal, manifest order MUST be preserved.
|
|
177
|
+
- A task MUST NOT start until all of its dependencies are `DONE`.
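The ordering rules above can be sketched as a single sort. This is a minimal illustration, not the reference implementation: `OrderableTask`, `stableOrder`, and `DEFAULT_PRIORITY` are hypothetical names, and treating an omitted `priority` as `0` is an assumption the specification does not state.

```typescript
// Hypothetical minimal task shape: only the fields that affect ordering.
interface OrderableTask {
  id: string;
  depends_on: string[];
  priority?: number;
}

// Assumption: tasks without a priority sort as if priority were 0.
const DEFAULT_PRIORITY = 0;

// Orders tasks by dependency depth, then priority, then manifest position.
// Sorting by depth first keeps every task after all of its dependencies.
function stableOrder(tasks: OrderableTask[]): OrderableTask[] {
  const byId = new Map<string, OrderableTask>();
  for (const task of tasks) byId.set(task.id, task);

  const depth = new Map<string, number>();
  const visiting = new Set<string>();

  const depthOf = (id: string): number => {
    const cached = depth.get(id);
    if (cached !== undefined) return cached;
    const task = byId.get(id);
    if (!task) throw new Error(`unknown dependency: ${id}`);
    if (visiting.has(id)) throw new Error(`dependency cycle at: ${id}`);
    visiting.add(id);
    let d = 0;
    for (const dep of task.depends_on) d = Math.max(d, depthOf(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  };

  return tasks
    .map((task, index) => ({ task, index, depth: depthOf(task.id) }))
    .sort(
      (a, b) =>
        a.depth - b.depth ||
        (a.task.priority ?? DEFAULT_PRIORITY) - (b.task.priority ?? DEFAULT_PRIORITY) ||
        a.index - b.index,
    )
    .map((entry) => entry.task);
}
```

The sketch only produces the order. The rule that a task waits until its dependencies are `DONE` still has to be enforced at dispatch time, because a dependency may fail or be blocked.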
|
|
178
|
+
|
|
179
|
+
### Concurrency and Window Semantics
|
|
180
|
+
|
|
181
|
+
In this specification, `concurrency` and `parallelism` mean the same thing: two or more worker processes executing at the same time and consuming real runtime resources such as CPU threads, process slots, or provider capacity.
|
|
182
|
+
|
|
183
|
+
The orchestrator MUST support these rules:
|
|
184
|
+
|
|
185
|
+
- Default `concurrency` is `1`.
|
|
186
|
+
- A window is the scheduling set currently being attempted before a healing checkpoint.
|
|
187
|
+
- The effective attempted window size is `min(current_batch_size, count of ready tasks in the current scheduling slice)`.
|
|
188
|
+
- Tasks within a window MAY run sequentially or in parallel.
|
|
189
|
+
- If `concurrency > 1`, tasks in the same window MAY execute simultaneously, but each task MUST still respect `depends_on`.
|
|
190
|
+
- A window is complete only when every runnable task assigned to that window has settled into one of: `DONE`, `BLOCKED`, `FAILED`, or `ESCALATED` for that attempt.
|
|
191
|
+
- Healing MUST NOT trigger mid-window.
|
|
192
|
+
- Every concurrently executed task MUST write to its own worker log path and its own verify log path.
|
|
193
|
+
- State updates from concurrent completions MUST still preserve atomic checkpoint semantics.
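Two of the window rules above reduce to small pure functions. The sketch below is illustrative only: the function names are hypothetical, and the `PENDING` and `RUNNING` states are assumptions added to represent tasks that have not settled yet.

```typescript
// Settled states come from the window-completion rule above.
type SettledStatus = "DONE" | "BLOCKED" | "FAILED" | "ESCALATED";
// "PENDING" and "RUNNING" are hypothetical in-flight states for illustration.
type AttemptStatus = SettledStatus | "PENDING" | "RUNNING";

const SETTLED: ReadonlySet<AttemptStatus> = new Set<AttemptStatus>([
  "DONE",
  "BLOCKED",
  "FAILED",
  "ESCALATED",
]);

// min(current_batch_size, count of ready tasks in the current scheduling slice)
function effectiveWindowSize(currentBatchSize: number, readyTaskCount: number): number {
  return Math.min(currentBatchSize, readyTaskCount);
}

// A window is complete only when every task assigned to it has settled.
function isWindowComplete(statuses: AttemptStatus[]): boolean {
  return statuses.every((status) => SETTLED.has(status));
}
```

Because healing must not trigger mid-window, an orchestrator built along these lines would consult `isWindowComplete` before evaluating whether to invoke the healer.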
|
|
194
|
+
|
|
195
|
+
### End-to-End Control Flow
|
|
196
|
+
|
|
197
|
+
The orchestrator MUST implement this control flow:
|
|
198
|
+
|
|
199
|
+
1. Load the manifest.
|
|
200
|
+
2. Load or initialize state.
|
|
201
|
+
3. Validate the manifest against `manifest.v2`.
|
|
202
|
+
4. Build the dependency-resolved pending queue.
|
|
203
|
+
5. Run worker tasks through adapters.
|
|
204
|
+
6. Parse fenced result JSON from worker output.
|
|
205
|
+
7. Run verification gates.
|
|
206
|
+
8. Classify task outcome and generate a failure signature if needed.
|
|
207
|
+
9. Checkpoint state atomically.
|
|
208
|
+
10. Invoke the healer at batch checkpoints when policy says healing is needed.
|
|
209
|
+
11. Apply validated allowed patches.
|
|
210
|
+
12. Reset retryable tasks and continue.
|
|
211
|
+
13. Stop on completion, bounded non-convergence, or unrecoverable escalation.
|
|
212
|
+
|
|
213
|
+
### Core Runtime Rules
|
|
214
|
+
|
|
215
|
+
- The orchestrator MUST capture combined stdout and stderr for every worker and healer invocation.
|
|
216
|
+
- Exit code alone MUST NOT be treated as task success.
|
|
217
|
+
- The orchestrator MUST parse and validate worker and healer contracts before any state mutation.
|
|
218
|
+
- The orchestrator MUST run verification after parse succeeds and before a task can become `DONE`.
|
|
219
|
+
- The orchestrator MUST checkpoint state after every task attempt and every healing round.
|
|
220
|
+
- The orchestrator MUST treat direct model prose outside the fenced contracts as non-authoritative.
|
|
221
|
+
- The orchestrator MUST use non-destructive logging: write full logs first, then inspect or parse them.
|
|
222
|
+
- The orchestrator MUST trap shutdown signals, terminate child processes it started, and leave state in a resumable condition.
|
|
223
|
+
|
|
224
|
+
### Shared Skill Utilities
|
|
225
|
+
|
|
226
|
+
Parser, validation, hashing, rollback, and state utility functions SHOULD exist as shared skill utilities rather than being reimplemented inside each adapter.
|
|
227
|
+
|
|
228
|
+
Adapters MAY expose convenience helpers, but contract extraction and schema validation MUST resolve to shared parser and validator utilities so all adapters produce identical acceptance and failure behavior.
|
|
229
|
+
|
|
230
|
+
### Parser Error Handling
|
|
231
|
+
|
|
232
|
+
The parser layer MUST return deterministic error classes. At minimum, it MUST support:
|
|
233
|
+
|
|
234
|
+
- `NO_SENTINEL`
|
|
235
|
+
- `INVALID_JSON`
|
|
236
|
+
- `SCHEMA_VIOLATION`
|
|
237
|
+
- `MISSING_REQUIRED_FIELD`
|
|
238
|
+
- `UNSUPPORTED_VERSION`
|
|
239
|
+
|
|
240
|
+
Parser errors SHOULD be converted into normalized failure classes and signatures by the orchestrator.
|
|
241
|
+
|
|
242
|
+
Before returning `INVALID_JSON`, the shared parser utility SHOULD attempt a conservative repair pass limited to:
|
|
243
|
+
|
|
244
|
+
- stripping outer markdown fences
|
|
245
|
+
- removing trailing commas
|
|
246
|
+
- removing JavaScript-style comments
|
|
247
|
+
|
|
248
|
+
If repair still fails, the task MUST be treated as a contract error.
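One possible shape for the conservative repair pass is sketched below. It is a hedged illustration rather than the package's parser: all function names and the `ParseOutcome` type are hypothetical, and the string-aware scanning is a design choice made here so that comment markers and commas inside JSON string values are left alone.

```typescript
type ParseOutcome =
  | { ok: true; value: unknown }
  | { ok: false; error: "INVALID_JSON" };

// Strips one outer markdown fence (three backticks, optional language tag).
function stripOuterFence(src: string): string {
  const fence = "`".repeat(3);
  const text = src.trim();
  if (!text.startsWith(fence) || !text.endsWith(fence)) return src;
  const firstNewline = text.indexOf("\n");
  if (firstNewline === -1) return src;
  return text.slice(firstNewline + 1, text.length - fence.length);
}

// Removes // and /* */ comments that sit outside string literals.
function stripComments(src: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < src.length; i++) {
    const ch = src.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        i++;
        out += src.charAt(i); // keep the escaped character verbatim
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
      out += ch;
    } else if (ch === "/" && src.charAt(i + 1) === "/") {
      while (i < src.length && src.charAt(i) !== "\n") i++;
      out += "\n";
    } else if (ch === "/" && src.charAt(i + 1) === "*") {
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 1;
    } else {
      out += ch;
    }
  }
  return out;
}

// Drops commas that are followed only by whitespace and a closing } or ].
function stripTrailingCommas(src: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < src.length; i++) {
    const ch = src.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        i++;
        out += src.charAt(i);
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') inString = true;
    if (ch === "," && /^\s*[}\]]/.test(src.slice(i + 1))) continue;
    out += ch;
  }
  return out;
}

// Tries a strict parse first, then a single repair pass, then gives up.
function parseWithRepair(raw: string): ParseOutcome {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    // fall through to the repair pass
  }
  const repaired = stripTrailingCommas(stripComments(stripOuterFence(raw)));
  try {
    return { ok: true, value: JSON.parse(repaired) };
  } catch {
    return { ok: false, error: "INVALID_JSON" };
  }
}
```

Keeping the repair to these three transformations means a payload that is wrong in any other way still surfaces as `INVALID_JSON` instead of being silently reinterpreted.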
|
|
249
|
+
|
|
250
|
+
## 6. Healing Model (`PBH`)
|
|
251
|
+
|
|
252
|
+
### Scheduling Modes
|
|
253
|
+
|
|
254
|
+
The canonical schedule enum MUST be:
|
|
255
|
+
|
|
256
|
+
- `auto`
|
|
257
|
+
- `off`
|
|
258
|
+
- `task`
|
|
259
|
+
- `batch`
|
|
260
|
+
- `epoch`
|
|
261
|
+
|
|
262
|
+
The meaning of each mode is:
|
|
263
|
+
|
|
264
|
+
| Mode | Meaning |
|
|
265
|
+
| --- | --- |
|
|
266
|
+
| `auto` | Use Progressive Batch Healing with adaptive growth and shrink behavior. |
|
|
267
|
+
| `off` | Disable healing entirely. |
|
|
268
|
+
| `task` | Heal only the single failed task being retried. Effective window size is always `1`. |
|
|
269
|
+
| `batch` | Heal at fixed batch checkpoints using a fixed `batch_size`. Default fixed batch size is `5`. |
|
|
270
|
+
| `epoch` | Attempt all currently pending tasks before healing. Effective window size is all pending tasks in the epoch. |
|
|
271
|
+
|
|
272
|
+
`auto` MUST be the default and MUST be the only mode that uses progressive growth and shrink behavior.
|
|
273
|
+
|
|
274
|
+
### PBH Definition
|
|
275
|
+
|
|
276
|
+
Progressive Batch Healing (`PBH`) is the default healing strategy. Under PBH:
|
|
277
|
+
|
|
278
|
+
- the orchestrator starts with a small healing window
|
|
279
|
+
- it increases batch size only after successful or stable windows
|
|
280
|
+
- it reduces batch size when failures imply systemic instability
|
|
281
|
+
- it retries only after a validated healer patch set is applied
|
|
282
|
+
|
|
283
|
+
### PBH Defaults
|
|
284
|
+
|
|
285
|
+
The default PBH policy MUST be:
|
|
286
|
+
|
|
287
|
+
- `heal.schedule = auto`
|
|
288
|
+
- `batch.strategy = fibonacci`
|
|
289
|
+
- fibonacci sequence = `1, 2, 3, 5, 8, 13, ...`
|
|
290
|
+
- `failure_threshold = 0.2`
|
|
291
|
+
- `max_worker_attempts_per_task = 2`
|
|
292
|
+
- `max_heal_rounds_per_window = 2`
|
|
293
|
+
- `max_total_heal_rounds = 8`
|
|
294
|
+
- `signature_repeat_limit = 2`
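The `fibonacci` batch strategy can be illustrated with a few helpers that step up and down the sequence `1, 2, 3, 5, 8, 13, ...`. This is a sketch under assumptions: the function names are hypothetical, and `maxSize` stands in for an operator-defined cap that the specification does not name.

```typescript
// Batch sizes in the sequence 1, 2, 3, 5, 8, 13, ... up to maxSize inclusive.
function fibonacciBatchSizes(maxSize: number): number[] {
  const sizes: number[] = [];
  let a = 1;
  let b = 2;
  while (a <= maxSize) {
    sizes.push(a);
    const next = a + b;
    a = b;
    b = next;
  }
  return sizes;
}

// Next larger batch size, or the current size when already at the cap.
function growBatch(current: number, maxSize: number): number {
  const next = fibonacciBatchSizes(maxSize).find((size) => size > current);
  return next ?? current;
}

// One level down the sequence, never below 1.
function shrinkBatch(current: number): number {
  const smaller = fibonacciBatchSizes(current).filter((size) => size < current);
  return smaller.pop() ?? 1;
}
```

For example, a window of size `5` would grow to `8` after a clean window and shrink to `3` when the failure rate exceeds the threshold.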
|
|
295
|
+
|
|
296
|
+
### Failure Rate
|
|
297
|
+
|
|
298
|
+
For PBH, `failure_rate` MUST be defined as:
|
|
299
|
+
|
|
300
|
+
```text
|
|
301
|
+
failure_rate = healable_failed_tasks_in_window / attempted_healable_tasks_in_window
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
`healable_failed_tasks_in_window` includes only tasks in the current window that:
|
|
305
|
+
|
|
306
|
+
- were attempted in the current window, and
|
|
307
|
+
- ended the attempt in a non-`DONE` state, and
|
|
308
|
+
- are currently classified by the orchestrator as healable
|
|
309
|
+
|
|
310
|
+
`BLOCKED` tasks and non-healable failures MUST NOT count toward the PBH failure-rate numerator. They still count for reporting and MAY cause escalation, but they do not consume heal budget by themselves.
|
|
311
|
+
|
|
312
|
+
If `attempted_healable_tasks_in_window == 0`, then:
|
|
313
|
+
|
|
314
|
+
- `failure_rate` is treated as `0`
|
|
315
|
+
- the orchestrator MUST skip healer invocation for that window
|
|
316
|
+
- the window MAY still produce escalations for blocked or non-healable failures
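The failure-rate definition above can be sketched as follows. The specification does not spell out the denominator, so this example assumes that `attempted_healable_tasks_in_window` counts tasks that ended `DONE` plus tasks that failed with a healable classification. `WindowAttempt` and `windowFailureRate` are hypothetical names.

```typescript
type AttemptStatus = "DONE" | "BLOCKED" | "FAILED" | "ESCALATED";

// Hypothetical per-task record for one window attempt.
interface WindowAttempt {
  taskId: string;
  status: AttemptStatus;
  // Orchestrator's classification of the failure; ignored for DONE tasks.
  healable: boolean;
}

interface FailureRateResult {
  healableFailed: number;
  attemptedHealable: number;
  rate: number;
}

function windowFailureRate(attempts: WindowAttempt[]): FailureRateResult {
  // Numerator: non-DONE, non-BLOCKED attempts classified as healable.
  const healableFailed = attempts.filter(
    (a) => a.status !== "DONE" && a.status !== "BLOCKED" && a.healable,
  ).length;
  // Assumed denominator: successful attempts plus healable failures.
  const done = attempts.filter((a) => a.status === "DONE").length;
  const attemptedHealable = done + healableFailed;
  // A zero denominator yields a rate of 0, and the healer is skipped.
  const rate = attemptedHealable === 0 ? 0 : healableFailed / attemptedHealable;
  return { healableFailed, attemptedHealable, rate };
}
```

Under this reading, a window with four `DONE` tasks, one healable failure, and one `BLOCKED` task has a failure rate of `1 / 5`, which sits exactly at the default threshold of `0.2`.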
|
|
317
|
+
|
|
318
|
+
### PBH Behavior
|
|
319
|
+
|
|
320
|
+
The orchestrator MUST implement the following behavior in `auto` mode:
|
|
321
|
+
|
|
322
|
+
- If a window finishes with zero failures, move to the next larger batch size in the configured sequence.
|
|
323
|
+
- If the failure rate is greater than `0` but less than or equal to `failure_threshold`, run the healer once and retry the same window.
|
|
324
|
+
- If the failure rate is above `failure_threshold`, shrink one batch level and isolate repeated signatures.
|
|
325
|
+
- If the same task repeats the same failure signature after allowed healing, escalate that task.
|
|
326
|
+
- If healing rounds stop reducing total failing tasks or signature diversity, abort the run and record the non-convergence reason in the run summary and state.
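The `auto` mode rules above can be condensed into a small decision function. This is a simplified sketch: `PbhAction`, `nextPbhAction`, `RoundSnapshot`, and `isConverging` are hypothetical names, the decision uses only the healable failure rate, and the convergence check adopts one possible reading in which progress on either failing-task count or distinct signatures counts as convergence.

```typescript
type PbhAction = "GROW" | "HEAL_AND_RETRY" | "SHRINK_AND_ISOLATE";

// Maps a window's failure rate to the next PBH step.
function nextPbhAction(failureRate: number, failureThreshold = 0.2): PbhAction {
  if (failureRate === 0) return "GROW";
  if (failureRate <= failureThreshold) return "HEAL_AND_RETRY";
  return "SHRINK_AND_ISOLATE";
}

// Hypothetical summary recorded after each healing round.
interface RoundSnapshot {
  failingTasks: number;
  distinctSignatures: number;
}

// Assumed reading: a round converges if it reduced either measure.
function isConverging(previous: RoundSnapshot, current: RoundSnapshot): boolean {
  return (
    current.failingTasks < previous.failingTasks ||
    current.distinctSignatures < previous.distinctSignatures
  );
}
```

An orchestrator following this sketch would abort the run and record a non-convergence reason once `isConverging` returns `false`, in addition to enforcing the per-window and total heal-round budgets.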
|
|
327
|
+
|
|
328
|
+
If PBH fails to heal a run, the run MUST be aborted rather than looping indefinitely. The abort record MUST include a human-readable reason, such as:
|
|
329
|
+
|
|
330
|
+
- the same failure signatures repeating after allowed retries
|
|
331
|
+
- no reduction in failing task count across heal rounds
|
|
332
|
+
- total healing budget exhausted
|
|
333
|
+
- current window contains only non-healable outcomes
|
|
334
|
+
|
|
335
|
+
### Healable Versus Non-Healable Failures
|
|
336
|
+
|
|
337
|
+
The orchestrator MUST classify each failure as `healable` or `non_healable`.
|
|
338
|
+
|
|
339
|
+
The default healable set SHOULD include:
|
|
340
|
+
|
|
341
|
+
- `prompt_gap`
|
|
342
|
+
- `missing_paths`
|
|
343
|
+
- `weak_contract`
|
|
344
|
+
- `contract_error`
|
|
345
|
+
- `output_format`
|
|
346
|
+
- `timeout`
|
|
347
|
+
- `transient_infra`
|
|
348
|
+
|
|
349
|
+
The default non-healable set SHOULD include:
|
|
350
|
+
|
|
351
|
+
- `blocked_external`
|
|
352
|
+
- `real_bug`
|
|
353
|
+
|
|
354
|
+
`build_error`, `test_error`, and `smoke_error` MAY be treated as healable when evidence points to prompt, context, or runtime configuration rather than a genuine product defect.
|
|
355
|
+
|
|
356
|
+
Tasks that fail with parser-layer contract errors SHOULD receive one automatic contract-format retry before consuming normal task retry or heal budget. This retry SHOULD append a strict formatting reminder to the next worker prompt and MUST NOT invoke the healer.
|
|
357
|
+
|
|
358
|
+
### Healer Authority Under Guardrails
|
|
359
|
+
|
|
360
|
+
The healer MAY emit bounded runtime patches, but only under guardrails enforced by the orchestrator.
|
|
361
|
+
|
|
362
|
+
Allowed runtime keys are:
|
|
363
|
+
|
|
364
|
+
- `timeout_sec`
|
|
365
|
+
- `concurrency`
|
|
366
|
+
- `current_batch_size`
|
|
367
|
+
|
|
368
|
+
The healer MUST NOT modify:
|
|
369
|
+
|
|
370
|
+
- `heal.schedule`
|
|
371
|
+
- `batch.strategy`
|
|
372
|
+
- verification commands
|
|
373
|
+
- protected-file rules
|
|
374
|
+
- parser behavior
|
|
375
|
+
- model provider or model identity
|
|
376
|
+
|
|
377
|
+
The orchestrator MUST validate runtime patches against operator-defined limits before applying them.
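A guardrail check for runtime patches might look like the sketch below. The allowed keys come from the list above. Everything else is an assumption for illustration: the `OperatorLimits` shape, the `min`/`max` representation of operator-defined limits, and the function names.

```typescript
// Runtime keys the healer is allowed to patch.
const ALLOWED_RUNTIME_KEYS = ["timeout_sec", "concurrency", "current_batch_size"] as const;
type RuntimeKey = (typeof ALLOWED_RUNTIME_KEYS)[number];

// Hypothetical representation of operator-defined limits.
interface Limit {
  min: number;
  max: number;
}
type OperatorLimits = Record<RuntimeKey, Limit>;

function isRuntimeKey(key: string): key is RuntimeKey {
  return (ALLOWED_RUNTIME_KEYS as readonly string[]).includes(key);
}

// Returns a list of violations; an empty list means the patch may be applied.
function validateRuntimePatch(
  patch: Record<string, unknown>,
  limits: OperatorLimits,
): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(patch)) {
    const value = patch[key];
    if (!isRuntimeKey(key)) {
      errors.push(`key not patchable by healer: ${key}`);
      continue;
    }
    const { min, max } = limits[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < min || value > max) {
      errors.push(`${key} must be a number in [${min}, ${max}]`);
    }
  }
  return errors;
}
```

Rejecting the whole patch set when any violation is reported keeps the healer from partially applying changes, although the specification leaves that policy to the implementation.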
|
|
378
|
+
|
|
379
|
+
## 7. Public Interfaces and Schemas
|
|
380
|
+
|
|
381
|
+
All public contracts MUST be JSON, versioned, and schema-validated.
|
|
382
|
+
|
|
383
|
+
### `manifest.v2`
|
|
384
|
+
|
|
385
|
+
#### Required Top-Level Fields
|
|
386
|
+
|
|
387
|
+
| Field | Type | Meaning |
|
|
388
|
+
| --- | --- | --- |
|
|
389
|
+
| `manifest_version` | string | Contract version. MUST be `"2.0"`. |
|
|
390
|
+
| `run_id` | string | Logical run identifier. |
|
|
391
|
+
| `tasks` | array | Ordered list of task definitions. |
|
|
392
|
+
|
|
393
|
+
#### Required Task Fields
|
|
394
|
+
|
|
395
|
+
| Field | Type | Meaning |
|
|
396
|
+
| --- | --- | --- |
|
|
397
|
+
| `id` | string | Stable, unique task identifier. |
|
|
398
|
+
| `prompt_ref` | string | Relative path or logical reference to the task prompt. |
|
|
399
|
+
| `depends_on` | array of strings | Upstream task IDs that must be `DONE` before execution. |
|
|
400
|
+
| `timeout_sec` | number | Task timeout in seconds. |
|
|
401
|
+
| `verify_profile` | string | Name of the project-defined verification profile. |
|
|
402
|
+
|
|
403
|
+
#### Optional Task Fields
|
|
404
|
+
|
|
405
|
+
| Field | Type | Meaning |
|
|
406
|
+
| --- | --- | --- |
|
|
407
|
+
| `context_refs` | array of strings | Shared context references applied to the task. |
|
|
408
|
+
| `priority` | number | Lower number means earlier scheduling within the same dependency depth. |
|
|
409
|
+
| `retry_policy` | object | Task-specific retry constraints. |
|
|
410
|
+
| `metadata` | object | Arbitrary task metadata for reporting or filtering. |
|
|
411
|
+
|
|
412
|
+
`metadata` keys SHOULD remain flat unless nested structure is required for interoperability with an external system.
|
|
413
|
+
|
|
414
|
+
#### `retry_policy` Shape
|
|
415
|
+
|
|
416
|
+
| Field | Type | Meaning |
|
|
417
|
+
| --- | --- | --- |
|
|
418
|
+
| `max_attempts` | number | Maximum worker attempts for this task. Defaults to global policy if omitted. |
|
|
419
|
+
| `retry_on` | array of strings | Failure classes eligible for retry. |
|
|
420
|
+
|
|
421
|
+
### `verify_profile` Registry
|
|
422
|
+
|
|
423
|
+
`verify_profile` is a manifest reference to an operator-defined verification profile. The profile registry is outside the worker contract and MUST be resolved by the orchestrator from project configuration.
|
|
424
|
+
|
|
425
|
+
The canonical schema for this operator-owned registry is `schemas/verify_profile.v2.json`.
|
|
426
|
+
|
|
427
|
+
The minimum logical shape of a profile registry is:

```json
{
  "profiles": {
    "build_and_test": {
      "steps": [
        {
          "name": "build",
          "cmd": "pnpm build",
          "cwd": ".",
          "timeout_sec": 300
        },
        {
          "name": "test",
          "cmd": "pnpm test",
          "cwd": ".",
          "timeout_sec": 600
        }
      ],
      "rollback_on_failure": true
    }
  }
}
```

The orchestrator MAY load this registry from any project-defined path, but the registry format SHOULD be documented wherever the runner is packaged and SHOULD validate against `verify_profile.v2`.

#### Example

```json
{
  "manifest_version": "2.0",
  "run_id": "run-20260320-001",
  "tasks": [
    {
      "id": "WP-017",
      "prompt_ref": "prompts/WP-017.md",
      "context_refs": ["_shared-context.md"],
      "depends_on": [],
      "priority": 1,
      "timeout_sec": 900,
      "verify_profile": "build_and_test",
      "retry_policy": {
        "max_attempts": 2,
        "retry_on": ["prompt_gap", "timeout", "transient_infra"]
      },
      "metadata": {
        "component": "button"
      }
    }
  ]
}
```
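
To illustrate how `depends_on` and `priority` interact, the following sketch orders tasks by dependency depth and then by priority. The helper name `scheduleOrder`, the tie-break on `id`, and the choice to sort tasks without a `priority` last are assumptions made for illustration; the spec only says that a lower number schedules earlier within the same dependency depth.

```typescript
// Only the manifest fields needed for ordering.
interface ManifestTask {
  id: string;
  depends_on: string[];
  priority?: number;
}

// Hypothetical helper: returns task IDs ordered by dependency depth, then priority.
function scheduleOrder(tasks: ManifestTask[]): string[] {
  const byId = new Map<string, ManifestTask>();
  for (const t of tasks) byId.set(t.id, t);
  const depth = new Map<string, number>();
  const visiting = new Set<string>();

  // Depth is the length of the longest dependency chain below a task.
  const depthOf = (id: string): number => {
    const cached = depth.get(id);
    if (cached !== undefined) return cached;
    const task = byId.get(id);
    if (!task) throw new Error(`unknown task id: ${id}`);
    if (visiting.has(id)) throw new Error(`dependency cycle at: ${id}`);
    visiting.add(id);
    let d = 0;
    for (const dep of task.depends_on) d = Math.max(d, depthOf(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  };

  return tasks
    // Assumption: tasks without a priority sort after prioritized ones.
    .map((t) => ({ id: t.id, d: depthOf(t.id), p: t.priority ?? Number.POSITIVE_INFINITY }))
    .sort((a, b) => a.d - b.d || a.p - b.p || a.id.localeCompare(b.id))
    .map((t) => t.id);
}
```

A real scheduler would also wait for upstream tasks to reach `DONE` before starting a task; this sketch only shows the ordering and rejects unknown IDs and cycles.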

### `task_result.v2`

The worker MUST emit exactly one fenced JSON block:

```text
<<<TASK_RESULT_V2>>>
{ ...json... }
<<<END_TASK_RESULT_V2>>>
```

#### Required Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `contract_version` | string | MUST be `"2.0"`. |
| `task_id` | string | Task ID matching the current manifest task. |
| `status` | string | One of `DONE`, `BLOCKED`, `FAILED`, `CONTRACT_ERROR`. |
| `summary` | string | Short human-readable summary of what happened. |

#### Optional Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `changed_files` | array of strings | Relative file paths changed by the proposed work. |
| `writes` | array | Proposed file write operations applied by the orchestrator. |
| `evidence` | object | Commands, log references, or notes supplied by the worker. |
| `failure_class` | string | Optional worker-supplied hint. The orchestrator still owns final classification. |

#### `writes[]` Shape

| Field | Type | Meaning |
| --- | --- | --- |
| `path` | string | Relative normalized path. MUST NOT escape the workspace root. |
| `op` | string | One of `create`, `replace`, `append`. |
| `encoding` | string | MUST be `"utf8"` for the reference implementation. |
| `content` | string | Inline file content to be applied by the orchestrator. |
| `content_ref` | string | Optional path to staged content written by the worker tooling instead of inline content. |
| `sha256_before` | string | Optional precondition hash for conflict detection. |

At least one of `content` or `content_ref` MUST be present for each write entry.

The orchestrator SHOULD prefer inline `content` for small and medium files. `content_ref` MAY be used when the worker environment can stage large content more reliably than JSON escaping.
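
The `writes[]` constraints can be checked before any file is touched. The sketch below is an illustration under stated assumptions: `normalizeRelativePath` and `validateWrite` are hypothetical helpers, the check is purely lexical, and it does not resolve symlinks, which a production implementation would also need to consider.

```typescript
// Shape taken from the `writes[]` table above.
interface WriteEntry {
  path: string;
  op: "create" | "replace" | "append";
  encoding: string;
  content?: string;
  content_ref?: string;
  sha256_before?: string;
}

// Hypothetical helper: returns a normalized relative path, or null when the
// path is absolute, empty, or would climb above the workspace root.
function normalizeRelativePath(p: string): string | null {
  if (p === "" || p.startsWith("/") || p.startsWith("\\") || /^[A-Za-z]:/.test(p)) return null;
  const out: string[] = [];
  for (const seg of p.split(/[\\\/]+/)) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null; // would escape the root
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.length > 0 ? out.join("/") : null;
}

// Hypothetical helper: collects contract violations for one write entry.
function validateWrite(w: WriteEntry): string[] {
  const errors: string[] = [];
  if (normalizeRelativePath(w.path) === null) errors.push("path is not a safe relative path");
  if (w.encoding !== "utf8") errors.push("encoding must be utf8");
  if (w.content === undefined && w.content_ref === undefined) {
    errors.push("content or content_ref is required");
  }
  return errors;
}
```

An empty array from `validateWrite` means the entry passes these three checks only; the denylist, shrinkage, and hash preconditions are separate safeguards.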

#### `evidence` Shape

| Field | Type | Meaning |
| --- | --- | --- |
| `commands` | array of strings | Commands the worker claims to have run. |
| `log_refs` | array of strings | Relative log references produced by the worker. |
| `notes` | array of strings | Additional structured evidence notes. |

#### Example

```json
{
  "contract_version": "2.0",
  "task_id": "WP-017",
  "status": "DONE",
  "summary": "Implemented focus-visible fix and updated tests.",
  "changed_files": [
    "packages/ui/button.tsx",
    "packages/ui/button.test.ts"
  ],
  "writes": [
    {
      "path": "packages/ui/button.tsx",
      "op": "replace",
      "encoding": "utf8",
      "content": "export function Button() {}",
      "sha256_before": "sha256:example"
    }
  ],
  "evidence": {
    "commands": [
      "pnpm --filter sample-site test:filter button"
    ],
    "log_refs": [
      "logs/WP-017.verify.log"
    ]
  }
}
```

### `heal_decision.v2`

The healer MUST emit exactly one fenced JSON block:

```text
<<<HEAL_DECISION_V2>>>
{ ...json... }
<<<END_HEAL_DECISION_V2>>>
```

#### Required Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `contract_version` | string | MUST be `"2.0"`. |
| `scope` | string | Advisory healer view of the current healing level. One of `task`, `batch`, `epoch`. |
| `decision` | string | One of `RETRY`, `ESCALATE`, `NOT_FIXABLE`. |
| `failure_class` | string | Normalized failure class the healer is addressing. |
| `root_cause` | string | One-sentence diagnosis of the repeated issue. |
| `patches` | array | Allowed patch operations. |

#### Optional Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `learned_rule` | string | Reusable rule recorded by the orchestrator for future runs. |
| `escalations` | array | Explicit per-task escalation records. |
| `retry_policy` | object | Optional retry/reset directives for the orchestrator. |

#### `patches[]` Shape

| Field | Type | Meaning |
| --- | --- | --- |
| `target` | string | One of `shared_context`, `task_prompt`, `runtime_patch`, `contract_hint`. |
| `operation` | string | One of `replace`, `append`, `merge`. |
| `path` | string | Required for `shared_context` and task prompt file replacements. |
| `task_id` | string | Required when target is `task_prompt`. |
| `content` | string or object | Patch payload. String for text replacements, object for runtime merge content. |
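
The conditional requirements in the table above can be expressed as a per-patch check. This is a hedged sketch: `checkPatch` is a hypothetical helper, and the rule that `merge` payloads are objects while every other payload is a string is an interpretation of the `content` row, not a statement from the schema.

```typescript
type PatchTarget = "shared_context" | "task_prompt" | "runtime_patch" | "contract_hint";
type PatchOperation = "replace" | "append" | "merge";

// Shape taken from the `patches[]` table above.
interface HealPatch {
  target: PatchTarget;
  operation: PatchOperation;
  path?: string;
  task_id?: string;
  content: string | Record<string, unknown>;
}

// Hypothetical helper: collects violations of the conditional field rules.
function checkPatch(p: HealPatch): string[] {
  const errors: string[] = [];
  if (p.target === "shared_context" && !p.path) errors.push("shared_context requires path");
  if (p.target === "task_prompt") {
    if (!p.task_id) errors.push("task_prompt requires task_id");
    if (p.operation === "replace" && !p.path) errors.push("task_prompt replace requires path");
  }
  // Assumption: merge payloads are objects, all other payloads are text.
  if (p.operation === "merge" && typeof p.content !== "object") {
    errors.push("merge requires object content");
  }
  if (p.operation !== "merge" && typeof p.content !== "string") {
    errors.push("text operations require string content");
  }
  return errors;
}
```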

`scope` is informational and MAY be recorded for diagnostics, but the orchestrator MUST derive actual patch applicability from:

- the active healing schedule
- the current window membership
- the patch targets present in `patches[]`

The orchestrator MUST NOT grant additional authority solely because the healer labeled a decision as `epoch` or `batch`.

`contract_hint` means non-authoritative text merged into future prompt assembly. It is not a file write by itself. The orchestrator MUST apply `contract_hint` like this:

- current healing scope means the set of task IDs included in the healer input bundle for the current invocation
- if `task_id` is present, append the hint to the next assembled worker prompt for that task only
- if `task_id` is absent, append the hint to the next assembled prompt for every task in the current healing scope
- `contract_hint` MUST NOT be written to disk unless another patch explicitly writes a file
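
The `contract_hint` rules above amount to a fan-out from hints to task IDs. A minimal in-memory sketch, assuming a hypothetical `resolveHints` helper and a `healingScope` array that holds the task IDs from the healer input bundle:

```typescript
// The fields of a `contract_hint` patch that matter for routing.
interface ContractHint {
  task_id?: string;
  content: string;
}

// Hypothetical helper: maps each task ID to the hint texts that should be
// appended to its next assembled prompt. Nothing is written to disk.
function resolveHints(hints: ContractHint[], healingScope: string[]): Map<string, string[]> {
  const out = new Map<string, string[]>();
  const add = (id: string, text: string): void => {
    const list = out.get(id) ?? [];
    list.push(text);
    out.set(id, list);
  };
  for (const h of hints) {
    if (h.task_id !== undefined) {
      add(h.task_id, h.content); // targeted hint: that task only
    } else {
      for (const id of healingScope) add(id, h.content); // untargeted: whole scope
    }
  }
  return out;
}
```

Whether a targeted hint for a task outside the current healing scope should be rejected is not specified; this sketch accepts it.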

#### `retry_policy` Shape

| Field | Type | Meaning |
| --- | --- | --- |
| `reset_tasks` | array of strings | Tasks to reset to pending for retry. |
| `retry_window` | string | One of `same_window`, `shrink_window`, `next_epoch`. |

#### Example

```json
{
  "contract_version": "2.0",
  "scope": "batch",
  "decision": "RETRY",
  "failure_class": "prompt_gap",
  "root_cause": "Shared context omitted the import convention needed by multiple tasks.",
  "patches": [
    {
      "target": "shared_context",
      "operation": "append",
      "path": "_shared-context.md",
      "content": "Always include the cn() import rule."
    },
    {
      "target": "task_prompt",
      "operation": "replace",
      "task_id": "WP-017",
      "path": "prompts/WP-017.md",
      "content": "Use the shared import convention and emit TASK_RESULT_V2."
    },
    {
      "target": "runtime_patch",
      "operation": "merge",
      "content": {
        "timeout_sec": 1200,
        "current_batch_size": 2
      }
    },
    {
      "target": "contract_hint",
      "operation": "append",
      "task_id": "WP-017",
      "content": "Return exactly one TASK_RESULT_V2 block at end of output."
    }
  ],
  "learned_rule": "When GTS tasks fail in a group, patch shared context before retrying isolated prompts.",
  "retry_policy": {
    "reset_tasks": ["WP-017", "WP-018"],
    "retry_window": "same_window"
  },
  "escalations": []
}
```

### `state.v2`

#### Required Top-Level Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `state_version` | string | MUST be `"2.0"`. |
| `run_id` | string | Current run identifier. |
| `run_status` | string | One of `RUNNING`, `COMPLETED`, `ABORTED`. |
| `abort_reason` | string or null | Human-readable abort reason when `run_status` is `ABORTED`. |
| `manifest_digest` | string | Hash of the normalized manifest used for resume validation. |
| `policy` | object | Effective runtime policy for this run. |
| `tasks` | object | Per-task state keyed by task ID. |
| `healing_rounds` | array | Ordered record of healing checkpoints. |

#### Required `policy` Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `heal_schedule` | string | Effective schedule mode. |
| `batch_strategy` | string | Usually `fibonacci` or `fixed`. |
| `current_batch_size` | number | Current effective window size. |
| `failure_threshold` | number | PBH threshold for the current run. |
| `max_worker_attempts_per_task` | number | Effective retry cap. |
| `max_heal_rounds_per_window` | number | Effective heal cap per window. |
| `max_total_heal_rounds` | number | Effective total heal budget. |
| `signature_repeat_limit` | number | Repeated signature escalation cap. |

#### Required Per-Task Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `status` | string | One of `PENDING`, `RUNNING`, `DONE`, `BLOCKED`, `FAILED`, `ESCALATED`. |
| `worker_attempts` | number | Current worker attempt count. |
| `healer_attempts` | number | Current healer attempt count affecting the task. |
| `last_failure_class` | string or null | Most recent normalized failure class. |
| `last_failure_signature` | string or null | Most recent normalized failure signature. |
| `applied_patch_ids` | array of strings | Patch identifiers applied to this task or its shared context. |
| `history` | array | Attempt history records. |

#### Required History Fields

| Field | Type | Meaning |
| --- | --- | --- |
| `task_id` | string | Task ID for the record. |
| `phase` | string | One of `worker`, `verify`, `healer`, `rollback`. |
| `attempt_number` | number | Monotonic attempt number within the phase. |
| `log_path` | string | Relative path to the primary log. |
| `verify_log_path` | string or null | Relative path to verification log when applicable. |
| `exit_code` | number or null | Process exit code when applicable. |
| `failure_class` | string or null | Failure class for that attempt. |
| `failure_signature` | string or null | Failure signature for that attempt. |
| `applied_patch_ids` | array of strings | Patches active for that attempt. |
| `duration_sec` | number or null | Attempt duration in seconds when measurable. |
| `timestamp` | string | ISO-8601 timestamp. |

#### Example

```json
{
  "state_version": "2.0",
  "run_id": "run-20260320-001",
  "run_status": "RUNNING",
  "abort_reason": null,
  "manifest_digest": "sha256:example",
  "policy": {
    "heal_schedule": "auto",
    "batch_strategy": "fibonacci",
    "current_batch_size": 2,
    "failure_threshold": 0.2,
    "max_worker_attempts_per_task": 2,
    "max_heal_rounds_per_window": 2,
    "max_total_heal_rounds": 8,
    "signature_repeat_limit": 2
  },
  "tasks": {
    "WP-017": {
      "status": "FAILED",
      "worker_attempts": 1,
      "healer_attempts": 1,
      "last_failure_class": "build_error",
      "last_failure_signature": "build_error:missing_cn_import",
      "applied_patch_ids": ["patch-001"],
      "history": [
        {
          "task_id": "WP-017",
          "phase": "worker",
          "attempt_number": 1,
          "log_path": "logs/WP-017.worker.1.log",
          "verify_log_path": "logs/WP-017.verify.1.log",
          "exit_code": 0,
          "failure_class": "build_error",
          "failure_signature": "build_error:missing_cn_import",
          "applied_patch_ids": [],
          "duration_sec": 42,
          "timestamp": "2026-03-20T15:21:00Z"
        }
      ]
    }
  },
  "healing_rounds": [
    {
      "round_number": 1,
      "scope": "batch",
      "window_task_ids": ["WP-017", "WP-018"],
      "failed_task_ids": ["WP-017", "WP-018"],
      "decision": "RETRY",
      "applied_patch_ids": ["patch-001"],
      "timestamp": "2026-03-20T15:25:00Z"
    }
  ]
}
```

### `adapter.v2`

The reference adapter contract SHOULD be expressed in TypeScript like this:

```ts
export type ParserErrorCode =
  | "NO_SENTINEL"
  | "INVALID_JSON"
  | "SCHEMA_VIOLATION"
  | "MISSING_REQUIRED_FIELD"
  | "UNSUPPORTED_VERSION";

export interface PreparedInvocation {
  cwd: string;
  argv: string[];
  env?: Record<string, string>;
  stdin?: string | null;
  timeoutSec: number;
}

export interface ExecutionArtifact {
  logPath: string;
  exitCode: number | null;
  startedAt: string;
  finishedAt: string;
}

export interface ParserFailure {
  ok: false;
  code: ParserErrorCode;
  message: string;
}

export interface AdapterHealth {
  ready: boolean;
  details: string[];
}

export interface CliAdapter {
  id: string;
  capabilities: {
    stdinPrompt: boolean;
    argPrompt: boolean;
    pty: boolean;
    interactive: boolean;
  };
  prepare(task: ManifestTaskV2, ctx: RunContext): PreparedInvocation;
  execute(invocation: PreparedInvocation, ctx: RunContext): Promise<ExecutionArtifact>;
  extractResult(artifact: ExecutionArtifact, ctx: RunContext): Promise<TaskResultV2 | ParserFailure>;
  healthcheck(ctx: RunContext): Promise<AdapterHealth>;
}
```

`extractResult` MUST validate `task_result.v2` and MUST return deterministic parser failures on invalid output.
`extractResult` MUST use the shared parser and validator utilities provided by the skill rather than adapter-specific parsing logic.
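
Because `extractResult` returns a union, callers need to tell a parsed result from a parser failure. The sketch below shows one way to narrow it. `TaskResultV2` here is a trimmed stand-in containing only the required fields, and `isParserFailure` and `summarize` are hypothetical helpers that are not part of the adapter contract.

```typescript
type ParserErrorCode =
  | "NO_SENTINEL"
  | "INVALID_JSON"
  | "SCHEMA_VIOLATION"
  | "MISSING_REQUIRED_FIELD"
  | "UNSUPPORTED_VERSION";

interface ParserFailure {
  ok: false;
  code: ParserErrorCode;
  message: string;
}

// Trimmed stand-in for the real type: required `task_result.v2` fields only.
interface TaskResultV2 {
  contract_version: "2.0";
  task_id: string;
  status: "DONE" | "BLOCKED" | "FAILED" | "CONTRACT_ERROR";
  summary: string;
}

// Type guard: only ParserFailure carries the `ok: false` discriminator.
function isParserFailure(r: TaskResultV2 | ParserFailure): r is ParserFailure {
  return "ok" in r && r.ok === false;
}

// Hypothetical helper: a one-line description of either outcome.
function summarize(r: TaskResultV2 | ParserFailure): string {
  if (isParserFailure(r)) return `contract_error:${r.code.toLowerCase()}`;
  return `${r.task_id}:${r.status}`;
}
```

Deriving the failure label from the error code keeps parser failures deterministic: the same invalid output always produces the same string.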

## 8. Verification and Safety Model

### Verification Ownership

Verification always belongs to the orchestrator. Verification MUST run after a successful parse and before a task is recorded as `DONE`.

The worker MAY report evidence, but the worker does not own final pass or fail classification.

If verification fails, the orchestrator MUST NOT record `DONE` regardless of the worker-declared `status`.
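
The ownership rule can be reduced to a single function. This is a sketch under an explicit assumption: the spec only forbids recording `DONE` after a failed verification, so mapping that case to `FAILED` is an illustrative choice, and `recordedStatus` is a hypothetical name.

```typescript
type WorkerStatus = "DONE" | "BLOCKED" | "FAILED" | "CONTRACT_ERROR";

// Hypothetical helper: the status the orchestrator records for one attempt.
// Assumption: a failed verification downgrades a worker-declared DONE to FAILED.
function recordedStatus(workerStatus: WorkerStatus, verificationPassed: boolean): WorkerStatus {
  if (workerStatus === "DONE" && !verificationPassed) return "FAILED";
  return workerStatus;
}
```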

### Verification Layers

The orchestrator MUST support three verification layers:

| Layer | Timing | Purpose |
| --- | --- | --- |
| Post-parse validation | Immediately after parsing worker output | Validate contract integrity, candidate writes, path safety, and parser consistency. |
| Post-write build or test validation | After writes are applied | Detect build, test, lint, or type failures caused by the change. |
| Final smoke or browser validation | After build and test pass | Confirm runtime behavior, UI behavior, or custom project checks when needed. |

### Allowed Write Path

The only canonical write path is:

1. worker emits `writes[]` in `task_result.v2`
2. orchestrator validates those writes
3. orchestrator applies those writes
4. orchestrator verifies the result

Worker output MUST NOT be treated as direct authority to mutate protected files outside this path.

### Required Write Safeguards

The orchestrator MUST enforce these safeguards:

- path normalization so writes cannot escape the workspace root
- protected-file denylist
- shrinkage detection
- optional `sha256_before` precondition validation
- backup before write
- rollback on verification failure

The default shrinkage rule SHOULD reject a replacement when:

- the original file is larger than `100` bytes, and
- the replacement is less than `50%` of the original size, and
- the task or operator has not explicitly allowed the shrinkage
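
The default shrinkage rule is a conjunction of three conditions, which the following sketch makes explicit. The function name and the `ShrinkageOptions` shape are hypothetical; the `100` byte and `50%` defaults come from the rule above.

```typescript
// Hypothetical options object; defaults mirror the rule in the text.
interface ShrinkageOptions {
  minOriginalBytes?: number; // originals at or below this size are never rejected
  minRatio?: number; // replacement must be at least this fraction of the original
  allowShrinkage?: boolean; // explicit task or operator override
}

// Returns true when a replacement should be rejected as suspicious shrinkage.
function violatesShrinkageRule(
  originalBytes: number,
  replacementBytes: number,
  opts: ShrinkageOptions = {},
): boolean {
  const { minOriginalBytes = 100, minRatio = 0.5, allowShrinkage = false } = opts;
  return (
    originalBytes > minOriginalBytes &&
    replacementBytes < originalBytes * minRatio &&
    !allowShrinkage
  );
}
```

For example, replacing a 200-byte file with 99 bytes is rejected, while 100 bytes (exactly half) passes because the rule uses a strict "less than".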

### Healer Patch Safety

Healer patches that modify prompt or shared context files are subject to the same validation model as worker writes.

The healer is forbidden from:

- editing product source files directly
- disabling verification
- bypassing protected-file rules
- changing healing schedule mid-run

`runtime_patch` targets MAY adjust bounded runtime settings only when those settings are exposed by the operator configuration.

### Parser Authority

The parser and validator modules are the only authority allowed to interpret worker and healer contracts.

The orchestrator MUST NOT:

- parse unconstrained model prose with regex as the normative path
- trust exit code alone as success
- treat an unvalidated JSON body as a valid contract

If multiple fenced blocks exist in a log, the parser MUST use the last matching fenced block for that contract type. This rule exists to defeat prompt echo contamination and duplicate draft outputs.
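
The last-block rule can be implemented with plain string searches rather than a regex over model prose. A minimal sketch, assuming a hypothetical `extractLastBlock` helper; it covers only sentinel location and JSON parsing, so schema validation would still have to follow.

```typescript
type ExtractResult =
  | { ok: true; value: unknown }
  | { ok: false; code: "NO_SENTINEL" | "INVALID_JSON"; message: string };

// Hypothetical helper: finds the last complete `<<<NAME>>> ... <<<END_NAME>>>`
// block in a log and parses its body as JSON.
function extractLastBlock(log: string, name: string): ExtractResult {
  const open = `<<<${name}>>>`;
  const close = `<<<END_${name}>>>`;
  // Search backwards: last closing sentinel, then the nearest opener before it.
  const end = log.lastIndexOf(close);
  const start = end === -1 ? -1 : log.lastIndexOf(open, end);
  if (start === -1) {
    return { ok: false, code: "NO_SENTINEL", message: `no complete ${name} block found` };
  }
  const body = log.slice(start + open.length, end);
  try {
    return { ok: true, value: JSON.parse(body) };
  } catch (_err) {
    return { ok: false, code: "INVALID_JSON", message: "block body is not valid JSON" };
  }
}
```

If a prompt echo and a final answer both contain a block, only the final one is returned, which is the contamination case the rule targets.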

## 9. Adapter Model

### Adapter Responsibilities

Adapters are the only place where CLI-specific behavior lives. An adapter MUST:

- construct the concrete CLI invocation
- decide whether prompt delivery uses stdin, argv, or PTY interaction
- manage PTY or expect requirements for interactive CLIs
- capture combined stdout and stderr to the execution log
- return execution artifacts to the orchestrator
- delegate contract parsing and schema validation to the shared parser and validator utilities

### Orchestrator Responsibilities

The orchestrator core MUST:

- never call CLIs directly except through `CliAdapter.execute`
- never parse raw logs without going through parser and validator modules
- never assume exit code alone means success
- remain CLI-agnostic outside the adapter boundary

### Initial Reference Adapters

The initial reference adapters SHOULD be:

- `agent`
- `opencode`
- `claude`

These adapters MUST share one orchestrator contract model even if their invocation mechanics differ.

### Interactive CLIs

For interactive CLIs, prompt rescue logic and TTY heuristics are adapter-local behavior. The core runtime MUST NOT embed tool-specific rescue logic.

Examples of adapter-local behavior include:

- trust prompt handling
- permission prompt handling
- idle detection
- PTY completion heuristics

Interactive adapters SHOULD also implement:

- ANSI stripping before parser handoff
- bounded idle detection
- finite rescue attempts for blocked prompts
- explicit completion detection before declaring success

### Design Proof From Recent Tempest Runners

Recent Tempest runners provide empirical design proof for several behaviors that this specification adopts:

- the newest gap-remediation runner demonstrated dependency-aware scheduling, bounded concurrency, backups, and rollback on verification failure
- storybook audit and fix runners demonstrated long-running batch logging and semaphore or file-lock style coordination
- component-check plus expect wrappers demonstrated server lifecycle ownership and adapter-local handling for interactive CLIs

These examples are evidence for the design. They are not normative inputs to the specification and are not required to understand or implement v2.

## 10. State, Resume, and Convergence Rules

### Atomic State Writes

Atomic state writes are mandatory. The orchestrator MUST write state via a temporary file followed by atomic rename on the same filesystem.
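
The temp-file-then-rename pattern can be sketched with Node's built-in `fs` module. This is an illustration, not the reference runtime's code: `writeStateAtomically` and the temp-file naming are assumptions, and the temporary file is placed next to the target so that the rename stays on one filesystem.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes `state` as JSON so that readers see either the
// old file or the new file, never a partially written one.
function writeStateAtomically(statePath: string, state: unknown): void {
  // Same directory as the target, so rename does not cross filesystems.
  const tmpPath = `${statePath}.${process.pid}.tmp`;
  const fd = openSync(tmpPath, "w");
  try {
    writeSync(fd, JSON.stringify(state, null, 2));
    fsyncSync(fd); // flush file contents before the rename makes them visible
  } finally {
    closeSync(fd);
  }
  renameSync(tmpPath, statePath);
}

// Usage: write a checkpoint into a scratch directory and read it back.
const scratchDir = mkdtempSync(join(tmpdir(), "state-"));
const statePath = join(scratchDir, "state.json");
writeStateAtomically(statePath, { state_version: "2.0", run_status: "RUNNING" });
console.log(readFileSync(statePath, "utf8"));
```

A crash before `renameSync` leaves the previous state file intact, which is what makes checkpoints safe to resume from.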

### Resume Semantics

The orchestrator MUST implement resume like this:

- `DONE` tasks are skipped on resume if `manifest_digest` still matches
- if `manifest_digest` changes, the orchestrator MUST warn or force reconciliation before reuse
- `ESCALATED` tasks are not retried automatically
- `FAILED` and `BLOCKED` tasks are eligible only if retry policy allows it
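
The resume rules above can be read as a small decision table. In this sketch, `resumeAction` and the `ResumeAction` labels are hypothetical, and two behaviors are assumptions: a digest mismatch sends every task to reconciliation, and tasks left in `PENDING` or `RUNNING` by an interruption are run again.

```typescript
type TaskStatus = "PENDING" | "RUNNING" | "DONE" | "BLOCKED" | "FAILED" | "ESCALATED";
type ResumeAction = "skip" | "run" | "reconcile";

// Hypothetical helper: what to do with one task when a run is resumed.
function resumeAction(
  status: TaskStatus,
  digestMatches: boolean,
  retryAllowed: boolean,
): ResumeAction {
  // Assumption: any digest mismatch is routed to reconciliation before reuse.
  if (!digestMatches) return "reconcile";
  switch (status) {
    case "DONE":
      return "skip";
    case "ESCALATED":
      return "skip"; // never retried automatically
    case "FAILED":
    case "BLOCKED":
      return retryAllowed ? "run" : "skip";
    case "PENDING":
    case "RUNNING":
      return "run"; // assumption: interrupted or unstarted work runs again
  }
}
```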

The orchestrator SHOULD provide a reconciliation mode that can mark affected tasks back to `PENDING` when the manifest changes in a way that invalidates prior attempts.

At minimum, reconciliation SHOULD handle:

- tasks removed from the manifest since the last run
- tasks added since the last run
- tasks whose `prompt_ref`, `depends_on`, or `verify_profile` changed

### Failure Signature Generation

The orchestrator MUST generate stable failure signatures. The failure signature algorithm MUST:

1. start with the normalized failure class
2. extract the primary stable signal from parser output, verification logs, or known error codes
3. remove timestamps, absolute paths, task IDs, and obviously unstable numeric fragments where possible
4. lowercase and collapse whitespace
5. truncate to a stable maximum length

The resulting format SHOULD be:

```text
<failure_class>:<normalized_primary_signal>
```

Examples:

- `contract_error:no_sentinel`
- `build_error:missing_cn_import`
- `timeout:worker_idle`
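
The five steps above can be sketched as a single normalization function. Several details here are assumptions chosen to reproduce the example signatures: the `failureSignature` name, the `120` character cap, the `WP-017` style task ID pattern, and the decision to join the remaining words with underscores.

```typescript
// Hypothetical helper: builds `<failure_class>:<normalized_primary_signal>`.
function failureSignature(failureClass: string, primarySignal: string, maxLength = 120): string {
  const normalized = primarySignal
    // ISO-like timestamps, for example 2026-03-20T15:21:00Z
    .replace(/\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?/g, " ")
    // absolute POSIX or Windows paths that begin a whitespace-separated token
    .replace(/(^|\s)(?:[A-Za-z]:)?[\\\/][^\s:]+/g, "$1 ")
    // task IDs shaped like WP-017 (assumed ID format)
    .replace(/\b[A-Z]+-\d+\b/g, " ")
    // remaining numeric fragments such as line numbers
    .replace(/\d+/g, " ")
    .toLowerCase()
    // assumption: collapse whitespace and punctuation into single underscores
    .replace(/[^a-z]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return `${failureClass.toLowerCase()}:${normalized}`.slice(0, maxLength);
}
```

Two log lines that differ only in timestamp, path, task ID, or line number then map to the same signature, which is what repeated-signature detection relies on.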

### Convergence Rules

Healing is converging only if at least one of these conditions is true after a healing round:

- total failing task count drops
- repeated signature count drops
- a broader failure class narrows to a more local and isolated issue

Healing is non-convergent if any of these conditions is true:

- the same task repeats the same signature after allowed retries
- the same failing set persists across rounds
- total healing budget is exhausted without measurable improvement
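
The countable conditions above can be compared between two snapshots of the failing set. This sketch covers only the first two convergence conditions and the "same failing set" check; judging whether a failure class has narrowed is left out. Reading "repeated signature count" as the number of distinct signatures shared by more than one task is an assumption, as are all helper names.

```typescript
// Task ID -> failure signature, for tasks still failing after a round.
type Failing = Record<string, string>;

// Assumption: a signature is "repeated" when more than one task reports it.
function repeatedSignatureCount(f: Failing): number {
  const counts = new Map<string, number>();
  for (const id of Object.keys(f)) {
    counts.set(f[id], (counts.get(f[id]) ?? 0) + 1);
  }
  let repeated = 0;
  counts.forEach((n) => {
    if (n > 1) repeated += 1;
  });
  return repeated;
}

// True when the failing count or the repeated signature count dropped.
function isConverging(before: Failing, after: Failing): boolean {
  const failingDropped = Object.keys(after).length < Object.keys(before).length;
  const repeatsDropped = repeatedSignatureCount(after) < repeatedSignatureCount(before);
  return failingDropped || repeatsDropped;
}

// True when exactly the same tasks are failing in both snapshots.
function sameFailingSet(before: Failing, after: Failing): boolean {
  const a = Object.keys(before).sort();
  const b = Object.keys(after).sort();
  return a.length === b.length && a.every((id, i) => id === b[i]);
}
```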

### Escalation Rules

Per-task escalation MUST happen when:

- a task repeats the same failure signature `signature_repeat_limit` times after healing
- a failure is classified as non-healable and retry policy does not permit further attempts

Per-run escalation MUST happen when:

- `max_total_heal_rounds` is exhausted without convergence
- the orchestrator determines that continuing would only repeat the same failure set
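
The escalation triggers above can be checked from attempt history and run counters. This is a sketch under assumptions: repeats are counted as the run of identical signatures at the end of a task's history, and every name below is hypothetical.

```typescript
// Counts how many of the most recent attempts share the latest signature.
function trailingRepeats(signatures: (string | null)[]): number {
  const last = signatures[signatures.length - 1];
  if (last === undefined || last === null) return 0;
  let n = 0;
  for (let i = signatures.length - 1; i >= 0 && signatures[i] === last; i--) n++;
  return n;
}

// Per-task escalation: repeated signature, or non-healable with no retries left.
function mustEscalateTask(
  signatures: (string | null)[],
  signatureRepeatLimit: number,
  healable: boolean,
  retryAllowed: boolean,
): boolean {
  return trailingRepeats(signatures) >= signatureRepeatLimit || (!healable && !retryAllowed);
}

// Per-run escalation: heal budget exhausted without convergence, or the
// orchestrator expects the same failure set to repeat.
function mustEscalateRun(
  healRoundsUsed: number,
  maxTotalHealRounds: number,
  converging: boolean,
  wouldRepeatSameFailures: boolean,
): boolean {
  return (healRoundsUsed >= maxTotalHealRounds && !converging) || wouldRepeatSameFailures;
}
```

With `signature_repeat_limit` set to `2`, as in the state example, two consecutive attempts ending in the same signature would escalate the task.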

Escalated tasks MUST remain in state for reporting and MUST NOT be silently dropped.

When a run is aborted for non-convergence, the orchestrator MUST:

- set `run_status` to `ABORTED`
- write a non-empty `abort_reason`
- persist the final failing task set and last observed failure signatures
- include the same reason in the human-readable run summary

### Learned Rule Lifecycle

`learned_rule` entries are durable run artifacts, not automatic policy changes.

The orchestrator MUST:

- record each accepted `learned_rule` in state or a linked healing journal
- record which healing round produced the rule

The orchestrator MUST NOT automatically promote a learned rule into canonical shared context unless an operator or higher-level workflow explicitly chooses to do so.

The orchestrator MAY expose learned rules to future runs as optional advisory input, but this behavior MUST be opt-in and clearly labeled as non-canonical.

## 11. Rollout and Migration

### Replace-Now Migration Strategy

This specification assumes a replace-now migration. The migration steps are:

1. Freeze schemas and enums.
2. Publish canonical `SKILL.md` and `SPEC.md`.
3. Add the TSX reference orchestrator and validator modules.
4. Wrap current shell and expect flows behind adapters.
5. Replace platform-specific authoritative docs with thin wrappers.
6. Mark legacy parsing and legacy contract docs as deprecated.
7. Run a reference validation manifest before declaring cutover complete.

### Legacy Variant Behavior

After cutover:

- old docs remain historical references only
- old docs MUST NOT define new behavior
- legacy text or XML parsing MAY exist only as an explicit compatibility plugin
- compatibility plugins are non-normative and MUST NOT be the default path

### Wrapper Rules

Platform-specific wrapper files MUST:

- point to the canonical `SKILL.md` and `SPEC.md`
- stay thin
- avoid duplicating architecture

Platform-specific wrapper files MUST NOT:

- redefine healing policy
- redefine contract formats
- introduce parser behavior not described by the canonical spec

### Recommended Packaging Outcome

The migration SHOULD leave one canonical source tree plus thin platform wrappers. Monolithic platform-specific architecture documents SHOULD be retired as authoritative artifacts.

## 12. Test Plan and Acceptance Criteria

### Architecture-Level Test Scenarios

The minimum test matrix MUST cover:

- valid worker JSON result
- missing worker fence
- invalid worker JSON
- invalid healer JSON
- schema violation on either contract
- prompt echo contamination with multiple fenced blocks
- automatic contract-error retry without heal-budget consumption
- successful verification after write
- failed verification with rollback
- protected-file rejection
- shrinkage rejection
- concurrent window completion with atomic checkpoints
- signal-triggered shutdown with resumable state
- resume after interruption
- manifest digest mismatch
- PBH growth on stable windows
- PBH retry on moderate failures
- PBH shrink on instability
- PBH abort with recorded non-convergence reason
- repeated-signature escalation
- adapter parity across `agent`, `opencode`, and `claude`
|
|
1137
|
+
|
|
1138
|
+
### Implementation Acceptance Criteria
|
|
1139
|
+
|
|
1140
|
+
The implementation is complete only if:
|
|
1141
|
+
|
|
1142
|
+
- all platform wrappers consume the same canonical schemas
|
|
1143
|
+
- the reference runtime uses the same control flow and healing policy regardless of adapter
|
|
1144
|
+
- at least one integration test exists per adapter
|
|
1145
|
+
- legacy regex-only parser paths are disabled by default
|
|
1146
|
+
- migration behavior is documented without ambiguity

### Document Acceptance Criteria

The document is acceptable only if:

- a peer unfamiliar with repository history can explain the system after reading it once
- a second engineer can implement schemas and runtime behavior without asking what key terms mean
- all defaults, stop conditions, and safety rules are explicitly named
- the specification no longer depends on Tempest-specific local paths to make sense

## 13. Appendix: Mapping from Legacy Variants

| Legacy concept | v2 mapping |
| --- | --- |
| Per-task healing | `heal.schedule = task` |
| Fixed batch healing | `heal.schedule = batch` with fixed `batch_size` |
| Epoch healing | `heal.schedule = epoch` |
| New default healing | `heal.schedule = auto` with `PBH` |
| Shell-first regex parser stack | Fenced JSON contracts plus schema validation |
| Monolithic platform docs | Thin wrappers pointing to canonical `SKILL.md` and `SPEC.md` |
| Shell or expect orchestration cores | Adapter implementations beneath the reference orchestrator or conforming alternate runtimes |
| Tempest runner behavior | Design proof only, not normative spec text |

The practical effect of this mapping is:

- old per-task healing still exists, but it is no longer the default
- old fixed batch healing still exists, but it is now an explicit override
- old epoch healing still exists, but it is now an explicit override
- the new default is adaptive `PBH`
- the new parser path is structured JSON plus schema validation, not text scraping
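The healing rows of the mapping can be expressed as a small migration helper. This is a sketch under stated assumptions: the `LegacyHealing` labels, the `HealConfig` shape, and the function name are hypothetical, and placing `batch_size` next to `schedule` is a guess at the configuration layout rather than something the table specifies.

```typescript
// Hypothetical labels for the legacy variants named in the mapping table.
type LegacyHealing = "per-task" | "fixed-batch" | "epoch";

// Assumed v2 shape: `schedule` mirrors `heal.schedule`; the location of
// `batch_size` is a guess made for illustration.
type HealConfig =
  | { schedule: "task" }
  | { schedule: "batch"; batch_size: number }
  | { schedule: "epoch" }
  | { schedule: "auto" }; // adaptive PBH, the new default

function migrateHealing(legacy?: LegacyHealing, batchSize?: number): HealConfig {
  switch (legacy) {
    case "per-task":
      return { schedule: "task" };
    case "fixed-batch":
      // Fixed batch healing is an explicit override and needs a concrete size.
      if (batchSize === undefined || batchSize < 1 || Math.floor(batchSize) !== batchSize) {
        throw new Error("fixed batch healing needs a positive integer batch_size");
      }
      return { schedule: "batch", batch_size: batchSize };
    case "epoch":
      return { schedule: "epoch" };
    default:
      // No legacy override requested: fall through to the adaptive default.
      return { schedule: "auto" };
  }
}
```

Calling `migrateHealing()` with no arguments yields the `auto` schedule, which matches the table's statement that the legacy modes survive only as explicit overrides.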

## Final Position

v2 is a spec-first, adapter-based, typed orchestration system with:

- one orchestrator-owned execution model
- one strict JSON contract stack
- one adapter boundary for multiple CLIs
- one default healing policy: `PBH`
- one reference runtime: TypeScript via global `tsx`
- conforming alternate runtimes allowed if they preserve the same contracts, state transitions, parser guarantees, and healing behavior

This document is written so that a peer can understand the system without any other repo context and implement it without relying on unwritten design assumptions.