workflow-supervisor 0.1.4 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,12 +1,28 @@
1
1
  ---
2
2
  name: workflow-supervisor
3
- description: Coordinate supervised workflows with profile-based overhead. Trigger whenever the user explicitly invokes workflow-supervisor, $workflow-supervisor, supervised workflow, lean work-unit runner, dossiers, work units, worker agents, handoffs, approval gates, durable resume, or workflow-state documentation. When explicitly invoked, first select the correct profile: lean_work_unit_runner for large already-bounded backlogs or low-footprint direct execution, strict_full_workflow for ambiguous/high-risk/source-of-truth/delegated work, or planning_only for intake and sequencing. Do not run strict ceremony just because the skill was named. When not explicitly invoked, use only for workflows with hard supervisor triggers such as multi-agent handoff, durable resume, high-risk verification, contradictory or missing sources, multi-unit scope, repair loops, approval gates, or workflow-state documentation.
3
+ description: Coordinate supervised workflows with profile-based overhead. Trigger whenever the user explicitly invokes workflow-supervisor, $workflow-supervisor, supervised workflow, lean work-unit runner, dossiers, work units, worker agents, handoffs, approval gates, durable resume, or workflow-state documentation. Route first before profile selection. If not explicitly invoked and the work is a small clear edit with obvious files and acceptance, do not invoke Workflow Supervisor. Execute directly. When explicitly invoked, first select the correct profile: lean_work_unit_runner for large already-bounded backlogs or low-footprint direct execution, strict_full_workflow for ambiguous/high-risk/source-of-truth/delegated work, or planning_only for intake and sequencing. Do not run strict ceremony just because the skill was named. When not explicitly invoked, use only for workflows with hard supervisor triggers such as multi-agent handoff, durable resume, high-risk verification, contradictory or missing sources, multi-unit scope, repair loops, approval gates, or workflow-state documentation.
4
4
  ---
5
5
 
6
6
  # Workflow Supervisor
7
7
 
8
8
  Use this skill as the coordinating spine for supervised work. The supervisor owns decomposition, execution profile selection, loop discipline, stop gates, optional worker-agent handoff quality, and outcome reporting. It may do source discovery, implementation, focused verification, and reporting itself in lean mode. In strict mode, implementation, verification, repair-ticket writing, and documentation must be treated as separate worker-agent responsibilities when an automated worker path is available. Native threads, subagents, or the portable delegate command are transports for those worker agents.
9
9
 
10
+ ## Route First
11
+
12
+ Before profile selection, decide whether this is supervisor work at all.
13
+
14
+ If Workflow Supervisor was not explicitly invoked and the task is a small, clear edit with obvious files and acceptance, do not invoke this skill. Execute directly with normal repository inspection and the relevant check.
15
+
16
+ When Workflow Supervisor is explicitly invoked, do not silently skip it. Select the proportional profile and keep the overhead as small as that profile allows.
17
+
18
+ | Situation | Route |
19
+ |---|---|
20
+ | Small, clear edit with obvious files and acceptance | Do not use Workflow Supervisor. Execute directly. |
21
+ | Large bounded backlog with clear unit done signals | `lean_work_unit_runner`. |
22
+ | Broad, ambiguous, source-of-truth, delegated, security-sensitive, dirty-state, release, resume, or externally published work | `strict_full_workflow`. |
23
+ | Sequencing, risk review, or backlog shaping only | `planning_only`. |
24
+ | Runnable uncertainty before implementation | Create a discovery or prototype unit first. |
25
+
10
26
  ## Execution Profiles
11
27
 
12
28
  When the user explicitly invokes `workflow-supervisor`, `$workflow-supervisor`, or says to use this skill, first classify the workflow profile before creating heavy artifacts, goals, worker plans, dossiers, or subagents.
@@ -39,7 +55,9 @@ Lean mode requires a backlog where each executable unit has:
39
55
  ```yaml
40
56
  id:
41
57
  source_ref:
58
+ slice_type:
42
59
  scope:
60
+ observable_behavior:
43
61
  done:
44
62
  check:
45
63
  status: pending | active | pass | fail | blocked | escalated
@@ -48,6 +66,8 @@ notes:
48
66
 
49
67
  Do not start a lean unit unless its boundary and done signal are clear. If a unit lacks `scope`, `done`, or `check`, mark it `blocked` and ask for the smallest missing decision, or split it into smaller units. Do not hide ambiguity in notes.
50
68
 
69
+ For product or integration behavior, prefer tracer-bullet units that expose one observable behavior across the smallest useful set of layers. A product implementation unit must name `observable_behavior`, `expected_outcome`, and `demo_or_verification`, or explicitly use a non-product `slice_type` such as `prefactor`, `migration`, `research`, `document`, or `risk_boundary` with a `horizontal_slice_justification`.
70
+
51
71
  Lean per-unit loop:
52
72
 
53
73
  1. Select the next ready unit from the ledger.
@@ -71,6 +91,8 @@ Escalate a lean unit to `strict_full_workflow` or pause for human review when:
71
91
 
72
92
  Lean verification is proportional. Use `focused-check` for the unit or batch. Use `independent-verifier` only when a risk trigger or user instruction justifies the extra cost. A lean PASS requires the ledger to show every completed unit's source reference, done signal, check or substitute evidence, and touched surfaces.
73
93
 
94
+ For bug fixes and risky behavior changes, the focused check must be red-capable or explicitly waived. A red-capable loop catches the exact symptom or behavior, not merely a related build, lint, or broad test. If no correct test surface exists, record an architecture or verification finding instead of a quiet skipped check.
95
+
74
96
  ### Strict Full Workflow
75
97
 
76
98
  Strict mode always requires:
@@ -83,6 +105,7 @@ Strict mode always requires:
83
105
  6. At least one bounded work unit, even for a tiny change. Use `WU-001` when there is only one unit.
84
106
  7. A dossier for each implementation work unit before implementation begins.
85
107
  8. An acceptance matrix or acceptance draft with evidence expectations before implementation begins.
108
+ - For outcome-bearing work, the matrix must include expected user/system-visible outcomes, preferred and available verification capabilities, evidence strength, invalid PASS conditions, and row-level outcome verdicts.
86
109
  9. A worker-agent plan with implementer, verifier, repair-author, and documenter agents.
87
110
  10. A worker lifecycle record using `planned -> handed_off -> acknowledged -> reported -> verified -> resource_closed -> closed`.
88
111
  11. Verification labeled as `self-check`, `focused-check`, or `independent-verifier`.
@@ -124,8 +147,12 @@ Treat roadmap phases, source "Build" lists, and exit criteria as material when t
124
147
 
125
148
  Create exactly one implementation work unit only when all current-scope material requirements can be implemented and verified inside that one unit without hiding source requirements in residual risks, skipped checks, future work, or next recommended actions. For multi-phase, dependency-heavy, or roadmap-driven work, create one work unit per independently verifiable phase, integration, data slice, or risk boundary.
126
149
 
150
+ For user-facing behavior or integration behavior, make work units tracer-bullet shaped by default. Horizontal units are valid only for prefactoring, migration safety, infrastructure, documentation, research, or a dependency that cannot yet be verified as behavior, and they must include a horizontal-slice justification.
151
+
127
152
  Before final closeout, audit the coverage ledger. The workflow may be PASS only when every material requirement is mapped to a PASS acceptance row, explicitly waived by the user, or blocked and reported as not complete.
128
153
 
154
+ For outcome evaluation, treat the implementer report as a claim, not truth. The required chain is source requirement -> acceptance row -> outcome evidence -> verifier verdict -> supervisor audit. Tests, typecheck, lint, and build are only evidence types; they are not enough for material behavior unless the row is explicitly technical or the command observes the expected outcome.
155
+
129
156
  ## SPEC Review And Q&A Gate
130
157
 
131
158
  Before final work units, create a concise reviewable spec as `.workflow/SPEC.md` when workflow docs are enabled, or as an inline SPEC review packet when state is inline. The SPEC is the human-readable contract for interpretation, not the execution plan.
@@ -208,6 +235,10 @@ When the human answers:
208
235
  - Always produce a plan after complete intake. In `human_in_loop`, make it an approval packet and stop for approval. In `autonomous_goal`, make it an execution plan and continue only when the completed intake authorizes that path.
209
236
  - Do not begin strict implementation until complete intake and the path gate are satisfied, at least one work unit exists, at least one concrete dossier exists, worker-agent contracts exist, and no stop gate applies.
210
237
  - Do not begin lean implementation until the scope contract is recorded, the backlog contains at least one ready unit, the compact ledger exists or can be kept inline, the current unit has source reference, scope, done signal, and check, and no escalation gate applies.
238
+ - Do not begin product or integration implementation from a vague horizontal phase. Prefer a tracer-bullet unit with observable behavior and demo or verification; allow horizontal units only for prefactoring, migration, infrastructure, documentation, research, or risk-boundary work with a justification.
239
+ - Do not mark a bug fix or risky behavior change PASS unless acceptance rows name a red-capable feedback loop, or the user explicitly accepts substitute evidence.
240
+ - Do not mark outcome-bearing work PASS unless every material outcome row has fully observed evidence, or the user explicitly waives/narrows the missing proof. Row-level `CONDITIONAL_PASS` means strongly inferred but not fully observable; it is not a green final workflow status.
241
+ - When a verification capability is unavailable, record the capability limitation and required external check instead of pretending the outcome was observed. Browser snapshots, visual diffs, live services, credentials, and human reviews are verifier adapters, not universal requirements.
211
242
  - Delegate workers only through an automated supported delegation transport after complete intake and the path gate authorize delegation. If no supported transport exists, use same-session phased mode only when intake allowed it; otherwise stop as `worker_agent_unavailable`.
212
243
  - Do not start implementer, verifier, repair-author, or documenter workers before complete intake and the path gate are satisfied; role-specific start conditions are additional gates after that.
213
244
  - Do not use native thread or native subagent workers unless the environment exposes a close operation for that transport. For Codex subagents, the supervisor must call `close_agent` for every `spawn_agent` id after the worker reaches a terminal report, times out, blocks, fails validation, is cancelled, or is no longer needed.
@@ -267,6 +298,8 @@ skipped_checks:
267
298
  blockers:
268
299
  residual_risks:
269
300
  next_recommended_action:
301
+ verification_environment:
302
+ outcome_evaluations:
270
303
  ```
271
304
 
272
305
  Implementers may edit only allowed surfaces from the dossier. Verifiers must not edit. Repair authors write repair tickets from failed acceptance rows and must not expand scope. Documenters update only approved workflow or documentation surfaces after source, implementation, verification, or repair evidence exists.
@@ -318,7 +351,7 @@ Negative example: "Using Workflow Supervisor, generate an API and create the pro
318
351
  4. If the profile is `lean_work_unit_runner`, run the lean loop:
319
352
  - Confirm the source contains bounded work units or create a short upfront backlog contract. If not possible, pause for a decision or switch to `planning_only` or `strict_full_workflow`.
320
353
  - Create or select one compact ledger instead of full workflow docs.
321
- - Verify each ready unit has `id`, `source_ref`, `scope`, `done`, `check`, and `status`.
354
+ - Verify each ready unit has `id`, `source_ref`, `slice_type`, `scope`, `done`, `check`, and `status`; product or integration units also need `observable_behavior`, `expected_outcome`, and `demo_or_verification`.
322
355
  - Present a concise batch plan in `human_in_loop`, or continue in `autonomous_goal` when intake permits it.
323
356
  - Execute one unit at a time with targeted inspection, smallest patch, focused check, ledger update, and checkpoint cadence.
324
357
  - Escalate only the affected unit or batch when a strict-mode trigger appears; do not convert the whole backlog to strict mode unless the source contract is invalid.
@@ -329,7 +362,7 @@ Negative example: "Using Workflow Supervisor, generate an API and create the pro
329
362
  8. Create the source-requirement coverage ledger. If any material source requirement cannot be classified, mapped to work, or explicitly deferred, stop and ask for the missing scope decision.
330
363
  9. Create the SPEC review packet or `.workflow/SPEC.md` from the source corpus and coverage ledger.
331
364
  10. Run the SPEC Q&A gate. In `human_in_loop`, stop until the human asks questions, receives answers or revisions, and explicitly approves the SPEC. In `autonomous_goal`, continue only when no blocking questions remain and approval is not required by intake.
332
- 11. Split the objective into bounded work units from the approved or non-blocked SPEC and coverage ledger. Use `$work-unit` for ambiguous or multi-phase goals. If the task is tiny and the ledger has no deferred material requirements, create exactly one work unit named `WU-001`.
365
+ 11. Split the objective into bounded work units from the approved or non-blocked SPEC and coverage ledger. Use `$work-unit` for ambiguous, multi-phase, product, or integration goals. Prefer tracer-bullet units for user-facing or integration behavior. If the task is tiny and the ledger has no deferred material requirements, create exactly one work unit named `WU-001`.
333
366
  12. Choose a loop policy before starting work: sequential or parallel, retry limits, approval gates, budgets, goal update cadence, and blocker rules. Use `$loop-policy` when the policy is not obvious.
334
367
  13. Build dossiers for the first implementation units and any planned verification, repair, or documentation workers. Use `$dossier-builder` when delegating work to another agent or when the task has boundaries.
335
368
  14. Assign worker roles with explicit allowed and forbidden behavior. Use `$worker-roles` for multi-agent, native-thread, or portable-worker work.
@@ -343,12 +376,20 @@ Negative example: "Using Workflow Supervisor, generate an API and create the pro
343
376
  18. After the path gate is satisfied, delegate named workers from the worker delegation plan through the selected automated transport. Send each worker only its role, dossier, sources, acceptance rows, stop gates, and report schema. For native threads or subagents, record the native resource id immediately and confirm a close operation exists before starting more workers.
344
377
  19. Collect one terminal report from each worker. If a worker asks a human-facing question, convert it to `BLOCKED` and have the supervisor ask the user only when the path policy permits. For native threads or subagents, close the native resource after the report or blocker is captured.
345
378
  20. Verify independently where possible. Use `$acceptance-matrix` to map every requirement to evidence. Start verifier workers only after the relevant implementer report is available.
379
+ - For outcome-bearing rows, record the verification environment and capability manifest before judging evidence.
380
+ - Compare what the work unit required, what the implementer changed, touched surfaces, forbidden surfaces, and whether the result is a no-op, placeholder, hardcoded fixture, test-only fake, or scope creep.
381
+ - Evaluate behavior, not just build health: run the feature or use case end to end when possible, call the API/CLI/UI route directly, inspect generated files/output/state, verify data/schema/contracts, check before/after behavior, and test negative or adversarial cases.
382
+ - If browser or visual proof is unavailable, use the strongest available lower-level observable contract such as jsdom render, server-rendered output, state-machine/view-model test, API probe, file snapshot, route manifest, or static semantic diff inspection. Mark any missing stronger proof as `CONDITIONAL_PASS` or BLOCKED at the row level.
383
+ - For bug fixes and risky behavior changes, require a feedback loop with `command_or_evidence`, `red_capable`, `exact_symptom_or_behavior`, `deterministic`, `expected_runtime`, and `agent_runnable`.
384
+ - Classify evidence as `behavior_was_tested`, `related_check_ran`, or `substitute_evidence_accepted`.
385
+ - Treat PASS without a behavior-catching loop as BLOCKED unless waiver evidence accepts substitute evidence.
346
386
  21. If verification FAILs, convert findings into repair tickets and route them to a repair-author or implementer repair worker. Do not expand scope during repair.
347
387
  22. Re-run verification after repairs. Continue only until PASS, BLOCKED, repair limit, or path stop.
348
388
  23. Start documenter workers only after source, implementation, verification, or repair evidence exists, unless the documenter is explicitly creating planning state.
349
389
  24. If verification BLOCKs, record the resume checkpoint, report the blocker, and stop or ask for the missing decision. When the human answers, use Resume After Human Decision.
350
390
  25. Use `$workflow-docs` to create or refresh reusable Markdown artifacts under `<workspace>/.workflow/` when the workflow must persist across context loss, agents, or sessions.
351
391
  26. Audit skipped checks, residual risks, future work, and next recommended actions against the source-requirement coverage ledger. If any material source requirement appears there without an explicit user deferral or waiver, mark the workflow FAIL/BLOCKED and create more work units or ask for a scope decision.
392
+ - Audit outcome rows separately. A row-level `CONDITIONAL_PASS` cannot be hidden inside a final PASS unless the final report names the limitation and an explicit user waiver accepts that limitation.
352
393
  27. When all material acceptance rows are PASS or waived, apply the final disposition policy:
353
394
  - `human_in_loop`: use the completed intake final disposition; if it is `ask_at_end`, ask the human to choose PR, push main, or keep local.
354
395
  - `autonomous_goal`: use the completed intake final disposition. If it is `ask_at_end`, stop and ask before taking any final disposition action.
@@ -480,6 +521,7 @@ Stop when:
480
521
  - sources contradict each other on a material requirement
481
522
  - the requested scope cannot fit into a bounded work unit
482
523
  - `lean_work_unit_runner` is selected but the backlog lacks clear unit ids, source references, boundaries, done signals, or targeted checks
524
+ - a product or integration unit is a vague horizontal phase without observable behavior, demo or verification, valid non-product slice type, or horizontal-slice justification
483
525
  - `lean_work_unit_runner` finds a strict-mode risk trigger and the user has not authorized escalation, deferral, or a narrower unit
484
526
  - the coverage ledger is missing, incomplete, or contains material requirements classified as future work without explicit user deferral
485
527
  - human-in-loop SPEC approval is missing, marked Needs Revision, marked Blocked, or has unanswered Q&A
@@ -487,6 +529,10 @@ Stop when:
487
529
  - mandatory approval packet, work unit, dossier, worker-agent contract, or acceptance matrix is missing
488
530
  - allowed and forbidden surfaces cannot be named
489
531
  - acceptance cannot be verified with evidence
532
+ - material outcome evidence is only "tests passed", typecheck, build, or implementation prose without expected-outcome observation
533
+ - a material outcome row is `CONDITIONAL_PASS` but the final workflow is being marked PASS without explicit waiver evidence
534
+ - a required browser, visual, live-service, credential, network, or human-review capability is unavailable and no waiver or blocked status is recorded
535
+ - a bug fix or risky behavior change has only related checks and no red-capable feedback loop or explicit substitute-evidence waiver
490
536
  - a verifier is asked to edit or an implementer is asked to self-approve
491
537
  - repair loops repeat without new evidence
492
538
  - the user requires approval before continuing
@@ -517,6 +563,8 @@ Report:
517
563
  - Workers delegated, blocked, unavailable, or skipped
518
564
  - Worker lifecycle status for each role, including native resource ids and close results when native threads or subagents were used
519
565
  - Verification evidence
566
+ - Verification environment and capability limitations when they affected proof strength
567
+ - Outcome evaluation rows, including any `CONDITIONAL_PASS` rows and required external checks
520
568
  - Repairs performed or recommended
521
569
  - Checks run and skipped
522
570
  - Residual risks
@@ -1,7 +1,7 @@
1
1
  interface:
2
2
  display_name: "Workflow Supervisor"
3
3
  short_description: "Run lean, strict, or planning-only supervised workflows"
4
- default_prompt: "Use $workflow-supervisor to select the execution profile first: lean_work_unit_runner for large bounded backlogs and low-footprint direct execution, strict_full_workflow for ambiguous/high-risk/delegated work, or planning_only for sequencing without implementation. Ask required intake questions and stop until the user explicitly answers all missing items. Do not infer path, mode, delegation, final disposition, or boundaries from vague keywords. In lean mode, keep work units and a compact ledger, avoid subagents unless explicitly authorized, run targeted checks, and escalate unclear or risky units. In strict mode, create a source-requirement coverage ledger and SPEC review gate before work units, preserve source requirements in acceptance rows, and do not hide unimplemented material requirements in residual risks or future work. Prefer one-shot portable delegation when it satisfies the work. If native threads or subagents are used, record each native resource id, call the native close action such as close_agent after terminal report or blocker capture, and block final outcome if any native worker lacks a close result."
4
+ default_prompt: "Route first: if Workflow Supervisor was not explicitly invoked and the task is a small clear edit with obvious files and acceptance, do not invoke it; execute directly. Use $workflow-supervisor only after this route check. If $workflow-supervisor was explicitly invoked, select the execution profile first: lean_work_unit_runner for large bounded backlogs and low-footprint direct execution, strict_full_workflow for ambiguous/high-risk/delegated work, or planning_only for sequencing without implementation. Ask required intake questions and stop until the user explicitly answers all missing items. Do not infer path, mode, delegation, final disposition, or boundaries from vague keywords. In lean mode, keep work units and a compact ledger, avoid subagents unless explicitly authorized, run targeted checks, and escalate unclear or risky units. In strict mode, create a source-requirement coverage ledger and SPEC review gate before work units, preserve source requirements in acceptance rows, and do not hide unimplemented material requirements in residual risks or future work. For outcome-bearing work, treat implementer output as a claim: require expected outcomes, row-mapped outcome evidence, capability limitations, and evidence strength; row-level CONDITIONAL_PASS is not final green status. Prefer one-shot portable delegation when it satisfies the work. If native threads or subagents are used, record each native resource id, call the native close action such as close_agent after terminal report or blocker capture, and block final outcome if any native worker lacks a close result."
5
5
 
6
6
  policy:
7
7
  allow_implicit_invocation: false