workflow-supervisor 0.1.3 → 0.2.0
- package/CHANGELOG.md +139 -0
- package/README.md +125 -28
- package/bin/workflow-skills.mjs +201 -1
- package/docs/artifacts.md +9 -0
- package/docs/cli.md +3 -1
- package/docs/portable-delegation.md +19 -1
- package/docs/skill-reference.md +12 -2
- package/docs/troubleshooting.md +34 -0
- package/package.json +8 -2
- package/schemas/dossier-v1.schema.json +38 -0
- package/schemas/worker-report-v1.schema.json +120 -12
- package/skills/acceptance-matrix/SKILL.md +114 -2
- package/skills/acceptance-matrix/agents/openai.yaml +1 -1
- package/skills/dossier-builder/SKILL.md +28 -0
- package/skills/loop-policy/SKILL.md +29 -6
- package/skills/work-unit/SKILL.md +46 -6
- package/skills/workflow-docs/SKILL.md +2 -1
- package/skills/workflow-docs/references/workflow-control.md +93 -6
- package/skills/workflow-supervisor/SKILL.md +195 -46
- package/skills/workflow-supervisor/agents/openai.yaml +2 -2
package/CHANGELOG.md
ADDED
@@ -0,0 +1,139 @@
+# Changelog
+
+This changelog was reconstructed from npm publish metadata and git history after the first four package versions were published without GitHub releases or tags.
+
+## 0.2.0 - 2026-06-23
+
+Prepared outcome-evaluation verification for npm publication.
+
+### Added
+
+- Added capability-aware outcome evaluation to `WorkerReportV1` through `verification_environment` and `outcome_evaluations`.
+- Added row-level outcome verdicts so verifiers can record `CONDITIONAL_PASS` for behavior that is strongly inferred but not fully observable.
+- Added verification capability metadata for checks such as browser snapshots, jsdom renders, API probes, state-machine tests, file snapshots, and static diff inspection.
+- Added acceptance-matrix, dossier-builder, workflow-docs, and README guidance for expected outcomes, evidence strength, invalid PASS conditions, and capability limitations.
+
+### Changed
+
+- Treat implementer `PASS` as a claim that must be mapped to source requirements, acceptance rows, outcome evidence, verifier verdicts, and supervisor audit.
+- Treat tests, typecheck, lint, and build as evidence types instead of automatic material-outcome proof.
+- Require material outcome rows to be directly observed as `PASS`, blocked, or explicitly waived before final green status.
+
+### Fixed
+
+- Reject top-level `CONDITIONAL_PASS` as an invalid `WorkerReportV1.status`.
+- Reject top-level `PASS` reports when any outcome row is failed, blocked, or only conditionally observed.
+- Reject `PASS` outcome rows without row-mapped evidence and reject unknown verification capabilities.
+- Prevent unavailable browser, visual, live-service, credential, network, or human-review proof from being hidden inside final PASS reports.
+
+### Verified
+
+- Expanded delegate CLI tests for conditional outcome rows, missing row evidence, and unknown capabilities.
+- Expanded lifecycle tests to assert outcome verification rules across supervisor, acceptance matrix, dossier, workflow docs, README, troubleshooting, and schema artifacts.
+- Validated the package with `npm run validate` before release prep.
+
+## 0.1.4 - 2026-06-19
+
+Prepared for npm publication.
+
+### Added
+
+- Added profile-based supervisor execution with `lean_work_unit_runner`, `strict_full_workflow`, and `planning_only`.
+- Added compact lean-runner ledger guidance for large bounded backlogs that need lower memory and less ceremony.
+- Added native worker resource lifecycle rules for thread and subagent transports.
+
+### Changed
+
+- Changed strict worker lifecycle from logical closeout only to `planned -> handed_off -> acknowledged -> reported -> verified -> resource_closed -> closed`.
+- Required native worker transports to record resource ids, close actions, and close results before final workflow outcome.
+- Made one-shot portable delegation the preferred worker path when it satisfies the work, because it avoids resident native workers.
+
+### Fixed
+
+- Prevented completed Codex subagents from remaining open after workflow-supervisor runs by requiring `close_agent` for every recorded native `agent_id`.
+- Blocked final PASS when any native worker has no recorded close result.
+- Reduced large-backlog memory pressure by defaulting lean execution to same-session phased work unless workers are explicitly authorized or risk escalation requires them.
+
+### Verified
+
+- Expanded lifecycle tests to cover profile selection, lean ledgers, native worker resource ids, `close_agent`, and close-result gates.
+
+## 0.1.3 - 2026-06-17
+
+Published to npm: 2026-06-17 22:09:08 UTC
+
+Commit: `154bbd7`
+
+### Added
+
+- Added resumable SPEC gate behavior so broad source-controlled workflows can pause for human review before final work units, dossiers, and implementation.
+- Added resume guidance for autonomous workflows that block on a human decision, including updates to workflow state, goal state, and decision artifacts.
+- Expanded troubleshooting guidance for broad roadmap scope, residual risks that hide required work, and SPEC review before work units.
+
+### Changed
+
+- Hardened workflow-supervisor scope coverage so material source requirements, roadmap phases, exit criteria, named systems, and numeric targets must be mapped to work units, explicitly deferred, blocked, or marked non-material.
+- Updated acceptance, loop-policy, work-unit, and workflow-docs instructions to preserve source requirement strength and avoid quiet downgrades.
+
+### Verified
+
+- Expanded workflow-supervisor lifecycle tests for source coverage, SPEC review, and resume behavior.
+
+## 0.1.2 - 2026-06-17
+
+Published to npm: 2026-06-17 16:00:10 UTC
+
+Commit: `b449656`
+
+### Changed
+
+- Reworked the workflow-supervisor skill around a stricter worker-agent supervisor architecture.
+- Made explicit supervisor invocation require full intake, work units, dossiers, worker-agent contracts, scoped handoffs, report schema, and verification even for small tasks.
+- Clarified that implementation, verification, repair-authoring, and documentation are separate worker-agent responsibilities when an automated worker path is available.
+- Rewrote the README around the strict worker supervisor model and the current package workflow.
+
+### Verified
+
+- Added lifecycle coverage for strict supervisor invocation behavior.
+
+## 0.1.1 - 2026-06-15
+
+Published to npm: 2026-06-15 10:59:19 UTC
+
+Commit: `ee4c02b`
+
+### Added
+
+- Added portable worker delegation for Codex and Claude Code through `workflow-supervisor delegate`.
+- Added `WorkerReportV1` and `DossierV1` schema artifacts plus dossier validation before delegation.
+- Added `delegate-doctor` for adapter inspection and optional probe runs.
+- Added project-scope `.workflow/` ignore handling for local workflow state.
+- Added portable delegation documentation and tests for install, delegation, and lifecycle behavior.
+
+### Changed
+
+- Renamed the primary package executable path around `workflow-supervisor` while keeping `workflow-skills` as an executable alias.
+- Narrowed certified install/delegation targets to Codex, Claude Code, and generic Markdown contexts.
+- Strengthened validation to include adapter metadata and schema artifacts.
+
+### Verified
+
+- Added Node test coverage for delegate CLI behavior, installation behavior, portable delegation, and supervisor lifecycle handling.
+
+## 0.1.0 - 2026-06-14
+
+Published to npm: 2026-06-14 23:35:57 UTC
+
+Source: npm tarball contents. The GitHub release tag for this version is a reconstructed source snapshot from the npm tarball because no exact matching commit exists in the branch history for this first publish.
+
+### Added
+
+- Initial npm package for the workflow-supervisor skill pack.
+- Added the bundled skills: `workflow-supervisor`, `worker-roles`, `acceptance-matrix`, `dossier-builder`, `source-corpus`, `loop-policy`, `work-unit`, and `workflow-docs`.
+- Added the `workflow-supervisor` and `workflow-skills` executables for listing, validating, installing, uninstalling, and emitting portable context.
+- Added Codex, Claude Code, OpenCode, HermesAgent, and generic adapter metadata, plus package documentation, troubleshooting notes, compatibility notes, and a README overview.
+- Added packaging metadata, test coverage, and prepublish validation through `npm run validate`.
+
+### Verified
+
+- Initial package validation covered skill folder structure, `SKILL.md` metadata, and publishable package layout.
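Reviewer sketch: the 0.2.0 "Fixed" entries above describe four rejection rules for `WorkerReportV1`. The Python below is one possible reading of those rules, assuming a report is a plain dict shaped like the README example. The function name `validate_report`, the `KNOWN_CAPABILITIES` subset, and the error strings are hypothetical and are not taken from the package; user waivers are not modeled.

```python
# Illustrative subset only: these names appear in the README example report.
# The authoritative capability list is whatever the package schema defines.
KNOWN_CAPABILITIES = {
    "shell_command",
    "api_probe",
    "static_diff_inspection",
    "integration_test",
}
TOP_LEVEL_STATUSES = {"PASS", "FAIL", "BLOCKED"}


def validate_report(report: dict) -> list:
    """Return rule violations; an empty list means no rule was broken."""
    errors = []
    status = report.get("status")
    rows = report.get("outcome_evaluations", [])
    env = report.get("verification_environment", {})

    # Rule 1: CONDITIONAL_PASS (or anything else) is not a top-level status.
    if status not in TOP_LEVEL_STATUSES:
        errors.append(f"invalid top-level status: {status}")

    # Rule 4: unknown verification capabilities are rejected.
    for cap in env.get("capabilities", []):
        if cap not in KNOWN_CAPABILITIES:
            errors.append(f"unknown verification capability: {cap}")

    for row in rows:
        verdict = row.get("verdict")
        # Rule 3: a PASS row must carry row-mapped evidence.
        if verdict == "PASS" and not row.get("evidence"):
            errors.append(f"{row.get('id')}: PASS row without row-mapped evidence")
        # Rule 2: a top-level PASS cannot hide a non-PASS row.
        if status == "PASS" and verdict != "PASS":
            errors.append(f"{row.get('id')}: top-level PASS with row verdict {verdict}")
    return errors
```

Returning a list of violations rather than raising on the first one lets a supervisor report every problem in a single `BLOCKED` result; that is a design choice of this sketch, not a documented behavior of the CLI.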
package/README.md
CHANGED
@@ -1,8 +1,8 @@
 # Workflow Supervisor
 
-Workflow Supervisor is a
+Workflow Supervisor is a profile-based supervision skill pack for agent work that needs to stay organized, resumable, evidence-backed, and proportional to the work.
 
-It is for moments when you do not want an agent to
+It is for moments when you do not want an agent to lose the thread halfway through, quietly skip scope, or turn a large backlog into an unreviewable blur. You ask for the supervisor, the supervisor selects the right execution profile, keeps the work units explicit, verifies results with evidence, and leaves a clear outcome trail. Heavy multi-agent ceremony is available when risk justifies it; large pure backlogs can use a lean runner that keeps the agent focused on delivery.
 
 Example prompt:
 
@@ -10,19 +10,37 @@ Example prompt:
 Use $workflow-supervisor to build a FastAPI Naive RAG demo for a healthcare specialist agent.
 ```
 
-
+When you explicitly ask for Workflow Supervisor, the correct first response is not code. The correct first response is an intake packet. That is intentional.
 
 
 
+## Route First
+
+Use the smallest route that fits the work before choosing a profile.
+
+| Situation | Route |
+|---|---|
+| Small, clear edit with obvious files and acceptance | Do not use Workflow Supervisor. Execute directly. |
+| Large bounded backlog with clear unit done signals | `lean_work_unit_runner`. |
+| Broad, ambiguous, source-of-truth, delegated, security-sensitive, dirty-state, release, resume, or externally published work | `strict_full_workflow`. |
+| Sequencing, risk review, or backlog shaping only | `planning_only`. |
+| Runnable uncertainty before implementation | Create a discovery or prototype unit first. |
+
+This route check matters most when Workflow Supervisor was not explicitly invoked. If the task is a tiny direct edit, normal repo work is better than starting a supervisor loop.
+
+If the user explicitly invokes `workflow-supervisor`, `$workflow-supervisor`, or says to use the skill, do not silently skip it. Select the proportional profile, then keep the overhead as small as that profile allows.
+
 ## What You Get
 
 Workflow Supervisor gives you a repeatable workflow for serious agent tasks:
 
 - a complete intake before work starts
+- a profile choice between `lean_work_unit_runner`, `strict_full_workflow`, and `planning_only`
 - a source map, even when the only source is the user prompt
 - a source-requirement coverage ledger so roadmap items and exit criteria cannot disappear
 - a `SPEC.md` review gate where humans can ask questions, request revisions, block, defer, or approve before work units are finalized
 - bounded work units, including `WU-001` for tiny tasks
+- a compact ledger for high-throughput work-unit execution
 - dossiers that tell each worker exactly what to do and what not to touch
 - separate implementer, verifier, repair, and documenter responsibilities
 - structured worker reports instead of loose prose
@@ -31,7 +49,7 @@ Workflow Supervisor gives you a repeatable workflow for serious agent tasks:
 - durable `.workflow/` state when the work needs to survive context loss
 - a final report with checks, risks, workers, and next actions
 
-The main design choice is simple:
+The main design choice is simple: supervision is mandatory when requested, but overhead is profile-dependent. Work units preserve clarity. Workers, dossiers, and independent verifier loops are tools for strict or escalated work, not a default tax on every unit.
 
 ## The Mental Model
 
@@ -88,9 +106,35 @@ flowchart TB
 
 ## What Happens When You Invoke It
 
-When you explicitly invoke `workflow-supervisor`, `$workflow-supervisor`, or say to use the skill, the
+When you explicitly invoke `workflow-supervisor`, `$workflow-supervisor`, or say to use the skill, the first supervisor decision is the execution profile:
 
-
+- `lean_work_unit_runner`: for large, already-bounded work-unit backlogs where throughput and low memory matter.
+- `strict_full_workflow`: for ambiguous, high-risk, delegated, security-sensitive, source-of-truth, publication, or cross-system work.
+- `planning_only`: for intake, sequencing, risk review, and recommendations without implementation.
+
+Lean mode keeps work units but removes per-unit ceremony. It uses one upfront scope contract, one compact ledger, targeted checks, and strict escalation gates:
+
+```text
+select next ready unit
+-> inspect only needed sources
+-> patch or update the allowed surface
+-> run the targeted check
+-> update one ledger row
+-> continue until batch checkpoint, blocker, or final disposition
+```
+
+A lean unit is not ready unless it has:
+
+```yaml
+id:
+source_ref:
+scope:
+done:
+check:
+status:
+```
+
+Strict mode is still available when risk justifies it. In strict mode, task size does not matter. The full workflow is:
 
 1. Ask the complete intake packet.
 2. Build or record the source corpus.
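Reviewer sketch: the lean-unit readiness rule in the hunk above (a unit needs `id`, `source_ref`, `scope`, `done`, `check`, and `status`) could be checked mechanically along these lines. This assumes a ledger is a list of dicts; the helper names and the `"ready"` status value are hypothetical, since the diff does not show the allowed status values.

```python
# The six field names come from the README's YAML block for a lean unit.
REQUIRED_FIELDS = ("id", "source_ref", "scope", "done", "check", "status")


def missing_fields(unit: dict) -> list:
    """Return the required ledger fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(unit.get(name) or "").strip()
    ]


def next_ready_unit(ledger: list):
    """Pick the first unit that is fully specified and marked ready.

    "ready" is an assumed status value used only for this illustration.
    """
    for unit in ledger:
        if not missing_fields(unit) and unit["status"] == "ready":
            return unit
    return None
```

A unit with a blank `check` would be skipped by `next_ready_unit`, which mirrors the "select next ready unit" step of the lean loop without adding any per-unit ceremony.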
@@ -108,20 +152,21 @@ Strict mode means task size does not matter. Even if the request is "make a func
 14. Refresh docs or outcome state.
 15. Report final status and next action.
 
-
+Profile selection exists to prevent both failure modes: skipping supervision when work is risky, and drowning simple or already-bounded work in process.
 
 ## Intake
 
-The supervisor must get explicit answers to these
+The supervisor must get explicit answers to these eight items before planning deeply, creating a goal, delegating workers, implementing, publishing, or taking irreversible action:
 
 ```text
 1. Objective and source: what artifact, spec, repo path, document, ticket, or source set controls the work?
-2.
-3.
-4.
-5.
-6.
-7.
+2. Profile: lean_work_unit_runner, strict_full_workflow, or planning_only?
+3. Execution path: autonomous_goal or human_in_loop?
+4. Mode: sequential, parallel where safe, or staged parallel?
+5. Delegation: same-session phased, automated worker delegation, or native threads/subagents if available?
+6. Final disposition: keep local, open PR, push main, deploy/publish, or ask at the end?
+7. Boundaries: may I install dependencies, call external services, use credentials, or only edit local files?
+8. State artifacts: compact ledger, .workflow docs, another artifact directory, or inline state?
 ```
 
 If any answer is missing or vague, the supervisor asks only for the missing pieces and stops. Phrases like "work autonomously", "just do it", or "use your judgment" do not fill in the missing intake fields.
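Reviewer sketch: the intake gate in the hunk above can be read as "eight fields, all explicit, and stock phrases do not count". A minimal Python illustration follows. The field keys and the function name `missing_intake` are hypothetical labels for the eight numbered items; the three non-answer phrases are quoted from the README text.

```python
# Hypothetical keys standing in for the eight numbered intake items.
INTAKE_FIELDS = [
    "objective_and_source",
    "profile",
    "execution_path",
    "mode",
    "delegation",
    "final_disposition",
    "boundaries",
    "state_artifacts",
]

# Phrases the README says do not fill in a missing intake field.
NON_ANSWERS = {"work autonomously", "just do it", "use your judgment"}


def missing_intake(answers: dict) -> list:
    """Return intake fields that are absent, blank, or answered with a stock phrase."""
    missing = []
    for field in INTAKE_FIELDS:
        value = str(answers.get(field) or "").strip().lower()
        if not value or value in NON_ANSWERS:
            missing.append(field)
    return missing
```

If `missing_intake` returns a non-empty list, the described behavior is to ask only for those items and stop, rather than re-asking the whole packet.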
@@ -156,10 +201,10 @@ complete intake
 The worker lifecycle is tracked separately:
 
 ```text
-planned -> handed_off -> acknowledged -> reported -> verified -> closed
+planned -> handed_off -> acknowledged -> reported -> verified -> resource_closed -> closed
 ```
 
-This makes it possible to see where the workflow is, which worker owns which piece, what evidence exists, and
+This makes it possible to see where the workflow is, which worker owns which piece, what evidence exists, what native resource was opened, and whether that resource was closed. A native worker is not closed just because it returned a report.
 
 For source-of-truth builds, the coverage ledger is the guardrail against "green but incomplete" outcomes. Every material source requirement must be mapped to a work unit and acceptance row, explicitly deferred by the user, blocked as a scope decision, or marked non-material with a reason. Residual risks and future-work notes cannot contain unimplemented material source requirements in a PASS workflow.
 
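Reviewer sketch: the new lifecycle in the hunk above adds `resource_closed` between `verified` and `closed`. A minimal state-machine illustration for a native worker, assuming each worker is tracked as a dict: the function names and the `native` and `close_result` keys are hypothetical, and how the package treats non-native workers at this step is not shown in the diff.

```python
from typing import Optional

# State names and order are taken from the README lifecycle line.
LIFECYCLE = [
    "planned",
    "handed_off",
    "acknowledged",
    "reported",
    "verified",
    "resource_closed",
    "closed",
]


def advance(state: str, close_result: Optional[str] = None) -> str:
    """Return the next lifecycle state for a native worker.

    Moving into resource_closed requires a recorded close result, so a
    worker cannot reach closed just because it returned a report.
    """
    index = LIFECYCLE.index(state)  # raises ValueError for unknown states
    if state == "closed":
        raise ValueError("worker is already closed")
    next_state = LIFECYCLE[index + 1]
    if next_state == "resource_closed" and not close_result:
        raise ValueError("close result must be recorded before resource_closed")
    return next_state


def final_outcome_blocked(workers: list) -> bool:
    """True while any native worker lacks a recorded close result."""
    return any(w.get("native") and not w.get("close_result") for w in workers)
```

Keeping the close gate inside `advance` means there is a single place where a worker can move past `verified`, which is the property the README change is protecting.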
@@ -199,9 +244,9 @@ Common workflow files:
 | `.workflow/SPEC.md` | `workflow-supervisor`, `source-corpus`, `workflow-docs` | Human-reviewable interpretation, requirement coverage, Q&A, and approval decision before work units. |
 | `.workflow/WORK-UNITS.md` | `work-unit`, `workflow-docs` | Unit list, dependencies, sequencing, blocked units. |
 | `.workflow/DOSSIER.md` or `.workflow/dossiers/*.yaml` | `dossier-builder`, `workflow-docs` | Worker contracts for implementation, verification, repair, or documentation. |
-| `.workflow/WORKER-MAP.md` | `workflow-supervisor`, `worker-roles`, `workflow-docs` | Worker names, roles, transports, lifecycle, reports, blockers. |
-| `.workflow/ACCEPTANCE-MATRIX.md` | `acceptance-matrix`, `workflow-docs` | Evidence rows and material PASS, FAIL, BLOCKED states. |
-| `.workflow/VERIFICATION-REPORT.md` | verifier worker, `acceptance-matrix`, `workflow-docs` | Verification evidence, findings, skipped checks, residual risks. |
+| `.workflow/WORKER-MAP.md` | `workflow-supervisor`, `worker-roles`, `workflow-docs` | Worker names, roles, transports, native resource ids, lifecycle, reports, close results, blockers. |
+| `.workflow/ACCEPTANCE-MATRIX.md` | `acceptance-matrix`, `workflow-docs` | Evidence rows, outcome evaluation rows, capability limits, and material PASS, FAIL, BLOCKED states. |
+| `.workflow/VERIFICATION-REPORT.md` | verifier worker, `acceptance-matrix`, `workflow-docs` | Verification environment, outcome evidence, findings, skipped checks, residual risks. |
 | `.workflow/REPAIR-TICKETS.md` | repair worker, `workflow-docs` | Repair tasks tied to failed rows or verifier findings. |
 | `.workflow/DECISIONS.md` | supervisor, `workflow-docs` | User decisions, assumptions, reversals, unresolved questions. |
 | `.workflow/HANDOFF.md` | supervisor, `workflow-docs` | Resume pack for another agent or later session. |
@@ -305,6 +350,33 @@ Every delegated worker returns this machine-shaped report:
 "findings": [],
 "blocking_question": null,
 "next_action": "supervisor_review",
+"verification_environment": {
+"shell": true,
+"filesystem": true,
+"git_diff": true,
+"browser": false,
+"playwright_mcp": false,
+"network": false,
+"capabilities": ["shell_command", "api_probe", "static_diff_inspection"],
+"limitations": ["Browser layout was not verified because browser capability was unavailable"]
+},
+"outcome_evaluations": [
+{
+"id": "A-001",
+"source_requirement": "User can create a workflow and read it back.",
+"expected_outcome": "The API persists the workflow and returns it with the expected schema.",
+"preferred_verification": ["integration_test", "api_probe", "static_diff_inspection"],
+"available_verification": ["integration_test", "api_probe", "static_diff_inspection"],
+"evidence_strength": {
+"strongest_possible": ["integration_test"],
+"strongest_available": ["integration_test"]
+},
+"evidence": ["pytest tests/test_api.py passed"],
+"invalid_pass_conditions": ["tests only without API behavior assertion", "hardcoded fixture"],
+"verdict": "PASS",
+"finding": ""
+}
+],
 "adapter": null,
 "guard": null,
 "reason": null
@@ -313,9 +385,30 @@ Every delegated worker returns this machine-shaped report:
 
 The supervisor trusts the report shape, not loose prose. A PASS without evidence is invalid. A verifier that edits implementation is invalid. A worker that asks the human directly is converted into a blocker for the supervisor to route.
 
+Outcome verification treats the implementer report as a claim. A material behavior row needs expected outcome, evidence strength, preferred and available verification capabilities, invalid PASS conditions, and row-mapped evidence. `CONDITIONAL_PASS` is allowed only as a row-level outcome verdict for behavior that is strongly inferred but not fully observable; it is not a top-level `WorkerReportV1.status` and must not be treated as final green status without explicit waiver evidence.
+
+### Outcome Evaluation
+
+The implementer report is not the proof of the outcome. It is a structured claim that gives the supervisor something concrete to verify.
+
+For outcome-bearing work, the verifier maps each material source requirement to:
+
+- the expected user or system-visible outcome
+- the strongest preferred verification capability
+- the strongest verification capability actually available
+- row-mapped evidence
+- invalid PASS conditions
+- a row verdict
+
+Valid row verdicts are `PASS`, `FAIL`, `BLOCKED`, and `CONDITIONAL_PASS`. A row can be `CONDITIONAL_PASS` only when the available evidence strongly supports the outcome but a stronger material capability is unavailable. The missing capability and required external check must be recorded.
+
+The final worker status remains narrower: `PASS`, `FAIL`, or `BLOCKED`. A final `PASS` requires every material outcome row to be `PASS` unless the user explicitly waives or narrows the missing proof.
+
+For native threads or subagents, the report is only the work result. The supervisor must also close the native resource. For Codex subagents, record the returned `agent_id` and call `close_agent` after the report, timeout, failure, blocker, cancellation, or invalid-output result is captured. Final outcome is blocked while any native worker lacks a close result.
+
 ## How The Supervisor Talks To Workers
 
-The portable worker path is one CLI command:
+The portable worker path is one CLI command and is preferred when it satisfies the work because it is one-shot:
 
 ```bash
 workflow-supervisor delegate \
@@ -351,9 +444,11 @@ workflow-supervisor delegate-doctor --agent all --probe --require-pass
 
 If a worker adapter is missing, unauthenticated, times out, returns invalid output, edits forbidden surfaces, or returns PASS without evidence, the delegate command returns a structured `BLOCKED` report.
 
+Native thread or subagent transports may be used only when the environment exposes the full lifecycle: create, wait or receive a terminal report, and close. If a native transport can start workers but cannot close them, the supervisor records `worker_resource_close_unavailable` and uses portable delegation or same-session phased work only when intake allows it.
+
 ## No Silent Fallbacks
 
-
+In `strict_full_workflow` with worker delegation selected, if the environment can create, message, or delegate to worker agents, the supervisor must use real workers for implementation, verification, repair, and documentation responsibilities.
 
 If it cannot, it must record:
 
@@ -363,7 +458,7 @@ worker_agent_unavailable
 
 Then it must stop for a human decision unless complete intake explicitly selected `same_session_phased`.
 
-
+In `lean_work_unit_runner`, same-session phased execution is the default unless the user explicitly authorizes workers for a batch or escalation. Verification in same-session mode is a `focused-check` or `self-check`, not an `independent-verifier`.
 
 ## Install
 
@@ -538,13 +633,15 @@ If you are an agent using this package:
 
 1. Do not start work before complete intake.
 2. Do not infer missing permissions from words like "autonomous", "generate", or "work until done".
-3. If `$workflow-supervisor` is explicit,
-4.
-5.
-6.
-7. Treat same-session verification as `self-check`, not `independent-verifier`.
+3. If `$workflow-supervisor` is explicit, select `lean_work_unit_runner`, `strict_full_workflow`, or `planning_only` before heavy planning.
+4. Always keep work units explicit; lean mode uses a compact ledger instead of full per-unit ceremony.
+5. Do not delegate without a valid `DossierV1`.
+6. Use separate worker agents in strict or explicitly delegated work, not by default for lean execution.
+7. Treat same-session verification as `focused-check` or `self-check`, not `independent-verifier`.
 8. Trust only structured `WorkerReportV1` results from delegated workers.
 9. Treat verifier edits as invalid.
-10.
+10. Treat tests/typecheck/build as evidence types, not automatic outcome proof.
+11. Record capability limits instead of pretending browser, visual, live-service, credential, network, or human-review proof exists.
+12. Keep `.workflow/` ignored and local unless the user explicitly asks to publish it.
 
 The promise is not magic autonomy. The promise is disciplined supervision: clear setup, bounded work, scoped workers, structured reports, evidence, repair, and a clean final handoff.
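Reviewer sketch: rule 3 above, read together with the "Route First" table added earlier in this README diff, amounts to a small decision table. The Python below is one possible encoding. The trait labels, the function name `select_route`, the precedence order (strict triggers first), and the `ask_profile_in_intake` fallback are all assumptions made for illustration; the README does not specify how overlapping situations are resolved.

```python
# Hypothetical trait labels derived from the "Situation" column of the table.
STRICT_TRAITS = frozenset({
    "broad",
    "ambiguous",
    "source_of_truth",
    "delegated",
    "security_sensitive",
    "dirty_state",
    "release",
    "resume",
    "externally_published",
})


def select_route(traits, explicitly_invoked: bool = False) -> str:
    """Map task traits to a route, checking the riskiest situations first."""
    traits = set(traits)
    if traits & STRICT_TRAITS:
        return "strict_full_workflow"
    if "runnable_uncertainty" in traits:
        return "discovery_or_prototype_unit"
    if "planning_request" in traits:
        return "planning_only"
    if "large_bounded_backlog" in traits:
        return "lean_work_unit_runner"
    # A tiny edit bypasses the supervisor only when it was not explicitly invoked.
    if "small_clear_edit" in traits and not explicitly_invoked:
        return "execute_directly"
    # Assumed fallback: let intake item 2 (Profile) settle the choice.
    return "ask_profile_in_intake"
```

Checking strict triggers before the lean case reflects the stated goal of not skipping supervision when work is risky, even when the backlog is otherwise well bounded.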