npm - workflow-supervisor - Versions diffs - 0.1.3 → 0.2.0 - Mend

workflow-supervisor 0.1.3 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20) hide show

package/CHANGELOG.md +139 -0
package/README.md +125 -28
package/bin/workflow-skills.mjs +201 -1
package/docs/artifacts.md +9 -0
package/docs/cli.md +3 -1
package/docs/portable-delegation.md +19 -1
package/docs/skill-reference.md +12 -2
package/docs/troubleshooting.md +34 -0
package/package.json +8 -2
package/schemas/dossier-v1.schema.json +38 -0
package/schemas/worker-report-v1.schema.json +120 -12
package/skills/acceptance-matrix/SKILL.md +114 -2
package/skills/acceptance-matrix/agents/openai.yaml +1 -1
package/skills/dossier-builder/SKILL.md +28 -0
package/skills/loop-policy/SKILL.md +29 -6
package/skills/work-unit/SKILL.md +46 -6
package/skills/workflow-docs/SKILL.md +2 -1
package/skills/workflow-docs/references/workflow-control.md +93 -6
package/skills/workflow-supervisor/SKILL.md +195 -46
package/skills/workflow-supervisor/agents/openai.yaml +2 -2

package/CHANGELOG.md ADDED Viewed

@@ -0,0 +1,139 @@
+# Changelog
+This changelog was reconstructed from npm publish metadata and git history after the first four package versions were published without GitHub releases or tags.
+## 0.2.0 - 2026-06-23
+Prepared outcome-evaluation verification for npm publication.
+### Added
+- Added capability-aware outcome evaluation to `WorkerReportV1` through `verification_environment` and `outcome_evaluations`.
+- Added row-level outcome verdicts so verifiers can record `CONDITIONAL_PASS` for behavior that is strongly inferred but not fully observable.
+- Added verification capability metadata for checks such as browser snapshots, jsdom renders, API probes, state-machine tests, file snapshots, and static diff inspection.
+- Added acceptance-matrix, dossier-builder, workflow-docs, and README guidance for expected outcomes, evidence strength, invalid PASS conditions, and capability limitations.
+### Changed
+- Treat implementer `PASS` as a claim that must be mapped to source requirements, acceptance rows, outcome evidence, verifier verdicts, and supervisor audit.
+- Treat tests, typecheck, lint, and build as evidence types instead of automatic material-outcome proof.
+- Require material outcome rows to be directly observed as `PASS`, blocked, or explicitly waived before final green status.
+### Fixed
+- Reject top-level `CONDITIONAL_PASS` as an invalid `WorkerReportV1.status`.
+- Reject top-level `PASS` reports when any outcome row is failed, blocked, or only conditionally observed.
+- Reject `PASS` outcome rows without row-mapped evidence and reject unknown verification capabilities.
+- Prevent unavailable browser, visual, live-service, credential, network, or human-review proof from being hidden inside final PASS reports.
+### Verified
+- Expanded delegate CLI tests for conditional outcome rows, missing row evidence, and unknown capabilities.
+- Expanded lifecycle tests to assert outcome verification rules across supervisor, acceptance matrix, dossier, workflow docs, README, troubleshooting, and schema artifacts.
+- Validated the package with `npm run validate` before release prep.
+## 0.1.4 - 2026-06-19
+Prepared for npm publication.
+### Added
+- Added profile-based supervisor execution with `lean_work_unit_runner`, `strict_full_workflow`, and `planning_only`.
+- Added compact lean-runner ledger guidance for large bounded backlogs that need lower memory and less ceremony.
+- Added native worker resource lifecycle rules for thread and subagent transports.
+### Changed
+- Changed strict worker lifecycle from logical closeout only to `planned -> handed_off -> acknowledged -> reported -> verified -> resource_closed -> closed`.
+- Required native worker transports to record resource ids, close actions, and close results before final workflow outcome.
+- Made one-shot portable delegation the preferred worker path when it satisfies the work, because it avoids resident native workers.
+### Fixed
+- Prevented completed Codex subagents from remaining open after workflow-supervisor runs by requiring `close_agent` for every recorded native `agent_id`.
+- Blocked final PASS when any native worker has no recorded close result.
+- Reduced large-backlog memory pressure by defaulting lean execution to same-session phased work unless workers are explicitly authorized or risk escalation requires them.
+### Verified
+- Expanded lifecycle tests to cover profile selection, lean ledgers, native worker resource ids, `close_agent`, and close-result gates.
+## 0.1.3 - 2026-06-17
+Published to npm: 2026-06-17 22:09:08 UTC
+Commit: `154bbd7`
+### Added
+- Added resumable SPEC gate behavior so broad source-controlled workflows can pause for human review before final work units, dossiers, and implementation.
+- Added resume guidance for autonomous workflows that block on a human decision, including updates to workflow state, goal state, and decision artifacts.
+- Expanded troubleshooting guidance for broad roadmap scope, residual risks that hide required work, and SPEC review before work units.
+### Changed
+- Hardened workflow-supervisor scope coverage so material source requirements, roadmap phases, exit criteria, named systems, and numeric targets must be mapped to work units, explicitly deferred, blocked, or marked non-material.
+- Updated acceptance, loop-policy, work-unit, and workflow-docs instructions to preserve source requirement strength and avoid quiet downgrades.
+### Verified
+- Expanded workflow-supervisor lifecycle tests for source coverage, SPEC review, and resume behavior.
+## 0.1.2 - 2026-06-17
+Published to npm: 2026-06-17 16:00:10 UTC
+Commit: `b449656`
+### Changed
+- Reworked the workflow-supervisor skill around a stricter worker-agent supervisor architecture.
+- Made explicit supervisor invocation require full intake, work units, dossiers, worker-agent contracts, scoped handoffs, report schema, and verification even for small tasks.
+- Clarified that implementation, verification, repair-authoring, and documentation are separate worker-agent responsibilities when an automated worker path is available.
+- Rewrote the README around the strict worker supervisor model and the current package workflow.
+### Verified
+- Added lifecycle coverage for strict supervisor invocation behavior.
+## 0.1.1 - 2026-06-15
+Published to npm: 2026-06-15 10:59:19 UTC
+Commit: `ee4c02b`
+### Added
+- Added portable worker delegation for Codex and Claude Code through `workflow-supervisor delegate`.
+- Added `WorkerReportV1` and `DossierV1` schema artifacts plus dossier validation before delegation.
+- Added `delegate-doctor` for adapter inspection and optional probe runs.
+- Added project-scope `.workflow/` ignore handling for local workflow state.
+- Added portable delegation documentation and tests for install, delegation, and lifecycle behavior.
+### Changed
+- Renamed the primary package executable path around `workflow-supervisor` while keeping `workflow-skills` as an executable alias.
+- Narrowed certified install/delegation targets to Codex, Claude Code, and generic Markdown contexts.
+- Strengthened validation to include adapter metadata and schema artifacts.
+### Verified
+- Added Node test coverage for delegate CLI behavior, installation behavior, portable delegation, and supervisor lifecycle handling.
+## 0.1.0 - 2026-06-14
+Published to npm: 2026-06-14 23:35:57 UTC
+Source: npm tarball contents. The GitHub release tag for this version is a reconstructed source snapshot from the npm tarball because no exact matching commit exists in the branch history for this first publish.
+### Added
+- Initial npm package for the workflow-supervisor skill pack.
+- Added the bundled skills: `workflow-supervisor`, `worker-roles`, `acceptance-matrix`, `dossier-builder`, `source-corpus`, `loop-policy`, `work-unit`, and `workflow-docs`.
+- Added the `workflow-supervisor` and `workflow-skills` executables for listing, validating, installing, uninstalling, and emitting portable context.
+- Added Codex, Claude Code, OpenCode, HermesAgent, and generic adapter metadata, plus package documentation, troubleshooting notes, compatibility notes, and a README overview.
+- Added packaging metadata, test coverage, and prepublish validation through `npm run validate`.
+### Verified
+- Initial package validation covered skill folder structure, `SKILL.md` metadata, and publishable package layout.

package/README.md CHANGED Viewed

@@ -1,8 +1,8 @@
 # Workflow Supervisor
-Workflow Supervisor is a strict supervision skill pack for agent work that needs to stay organized, resumable, and evidence-backed.
+Workflow Supervisor is a profile-based supervision skill pack for agent work that needs to stay organized, resumable, evidence-backed, and proportional to the work.
-It is for moments when you do not want an agent to jump straight into implementation, lose the thread halfway through, verify its own work, or quietly skip important handoffs. You ask for the supervisor, the supervisor asks the right setup questions, turns the work into small units, creates worker dossiers, delegates scoped work to real worker agents when possible, verifies the results, and leaves a clear outcome trail.
+It is for moments when you do not want an agent to lose the thread halfway through, quietly skip scope, or turn a large backlog into an unreviewable blur. You ask for the supervisor, the supervisor selects the right execution profile, keeps the work units explicit, verifies results with evidence, and leaves a clear outcome trail. Heavy multi-agent ceremony is available when risk justifies it; large pure backlogs can use a lean runner that keeps the agent focused on delivery.
 Example prompt:
@@ -10,19 +10,37 @@ Example prompt:
 Use $workflow-supervisor to build a FastAPI Naive RAG demo for a healthcare specialist agent.
 ```
-The correct first response is not code. The correct first response is an intake packet. That is intentional.
+When you explicitly ask for Workflow Supervisor, the correct first response is not code. The correct first response is an intake packet. That is intentional.
 ![Workflow Supervisor hero image showing the supervisor coordinating source corpus, work units, dossiers, roles, loop policy, acceptance, repair, workflow docs, and final outputs](assets/workflow-supervisor-hero.png)
+## Route First
+Use the smallest route that fits the work before choosing a profile.
+| Situation | Route |
+|---|---|
+| Small, clear edit with obvious files and acceptance | Do not use Workflow Supervisor. Execute directly. |
+| Large bounded backlog with clear unit done signals | `lean_work_unit_runner`. |
+| Broad, ambiguous, source-of-truth, delegated, security-sensitive, dirty-state, release, resume, or externally published work | `strict_full_workflow`. |
+| Sequencing, risk review, or backlog shaping only | `planning_only`. |
+| Runnable uncertainty before implementation | Create a discovery or prototype unit first. |
+This route check matters most when Workflow Supervisor was not explicitly invoked. If the task is a tiny direct edit, normal repo work is better than starting a supervisor loop.
+If the user explicitly invokes `workflow-supervisor`, `$workflow-supervisor`, or says to use the skill, do not silently skip it. Select the proportional profile, then keep the overhead as small as that profile allows.
 ## What You Get
 Workflow Supervisor gives you a repeatable workflow for serious agent tasks:
 - a complete intake before work starts
+- a profile choice between `lean_work_unit_runner`, `strict_full_workflow`, and `planning_only`
 - a source map, even when the only source is the user prompt
 - a source-requirement coverage ledger so roadmap items and exit criteria cannot disappear
 - a `SPEC.md` review gate where humans can ask questions, request revisions, block, defer, or approve before work units are finalized
 - bounded work units, including `WU-001` for tiny tasks
+- a compact ledger for high-throughput work-unit execution
 - dossiers that tell each worker exactly what to do and what not to touch
 - separate implementer, verifier, repair, and documenter responsibilities
 - structured worker reports instead of loose prose
@@ -31,7 +49,7 @@ Workflow Supervisor gives you a repeatable workflow for serious agent tasks:
 - durable `.workflow/` state when the work needs to survive context loss
 - a final report with checks, risks, workers, and next actions
-The main design choice is simple: the supervisor coordinates, workers do scoped work, and the CLI stays small.
+The main design choice is simple: supervision is mandatory when requested, but overhead is profile-dependent. Work units preserve clarity. Workers, dossiers, and independent verifier loops are tools for strict or escalated work, not a default tax on every unit.
 ## The Mental Model
@@ -88,9 +106,35 @@ flowchart TB
 ## What Happens When You Invoke It
-When you explicitly invoke `workflow-supervisor`, `$workflow-supervisor`, or say to use the skill, the workflow enters `strict_full_workflow`.
+When you explicitly invoke `workflow-supervisor`, `$workflow-supervisor`, or say to use the skill, the first supervisor decision is the execution profile:
-Strict mode means task size does not matter. Even if the request is "make a function that adds two numbers", explicit supervisor invocation still means the full workflow:
+- `lean_work_unit_runner`: for large, already-bounded work-unit backlogs where throughput and low memory matter.
+- `strict_full_workflow`: for ambiguous, high-risk, delegated, security-sensitive, source-of-truth, publication, or cross-system work.
+- `planning_only`: for intake, sequencing, risk review, and recommendations without implementation.
+Lean mode keeps work units but removes per-unit ceremony. It uses one upfront scope contract, one compact ledger, targeted checks, and strict escalation gates:
+```text
+select next ready unit
+-> inspect only needed sources
+-> patch or update the allowed surface
+-> run the targeted check
+-> update one ledger row
+-> continue until batch checkpoint, blocker, or final disposition
+```
+A lean unit is not ready unless it has:
+```yaml
+id:
+source_ref:
+scope:
+done:
+check:
+status:
+```
+Strict mode is still available when risk justifies it. In strict mode, task size does not matter. The full workflow is:
 1. Ask the complete intake packet.
 2. Build or record the source corpus.
@@ -108,20 +152,21 @@ Strict mode means task size does not matter. Even if the request is "make a func
 14. Refresh docs or outcome state.
 15. Report final status and next action.
-This rule exists to prevent the agent from deciding that a task is "too simple" and quietly skipping the supervisor.
+Profile selection exists to prevent both failure modes: skipping supervision when work is risky, and drowning simple or already-bounded work in process.
 ## Intake
-The supervisor must get explicit answers to these seven items before planning deeply, creating a goal, delegating workers, implementing, publishing, or taking irreversible action:
+The supervisor must get explicit answers to these eight items before planning deeply, creating a goal, delegating workers, implementing, publishing, or taking irreversible action:
 ```text
 1. Objective and source: what artifact, spec, repo path, document, ticket, or source set controls the work?
-2. Execution path: autonomous_goal or human_in_loop?
-3. Mode: sequential, parallel where safe, or staged parallel?
-4. Delegation: automated worker delegation, native threads/subagents if available, or same-session phased?
-5. Final disposition: keep local, open PR, push main, deploy/publish, or ask at the end?
-6. Boundaries: may I install dependencies, call external services, use credentials, or only edit local files?
-7. State artifacts: create .workflow docs, use another artifact directory, or keep state inline?
+2. Profile: lean_work_unit_runner, strict_full_workflow, or planning_only?
+3. Execution path: autonomous_goal or human_in_loop?
+4. Mode: sequential, parallel where safe, or staged parallel?
+5. Delegation: same-session phased, automated worker delegation, or native threads/subagents if available?
+6. Final disposition: keep local, open PR, push main, deploy/publish, or ask at the end?
+7. Boundaries: may I install dependencies, call external services, use credentials, or only edit local files?
+8. State artifacts: compact ledger, .workflow docs, another artifact directory, or inline state?
 ```
 If any answer is missing or vague, the supervisor asks only for the missing pieces and stops. Phrases like "work autonomously", "just do it", or "use your judgment" do not fill in the missing intake fields.
@@ -156,10 +201,10 @@ complete intake
 The worker lifecycle is tracked separately:
 ```text
-planned -> handed_off -> acknowledged -> reported -> verified -> closed
+planned -> handed_off -> acknowledged -> reported -> verified -> resource_closed -> closed
 ```
-This makes it possible to see where the workflow is, which worker owns which piece, what evidence exists, and what should happen next.
+This makes it possible to see where the workflow is, which worker owns which piece, what evidence exists, what native resource was opened, and whether that resource was closed. A native worker is not closed just because it returned a report.
 For source-of-truth builds, the coverage ledger is the guardrail against "green but incomplete" outcomes. Every material source requirement must be mapped to a work unit and acceptance row, explicitly deferred by the user, blocked as a scope decision, or marked non-material with a reason. Residual risks and future-work notes cannot contain unimplemented material source requirements in a PASS workflow.
@@ -199,9 +244,9 @@ Common workflow files:
 | `.workflow/SPEC.md` | `workflow-supervisor`, `source-corpus`, `workflow-docs` | Human-reviewable interpretation, requirement coverage, Q&A, and approval decision before work units. |
 | `.workflow/WORK-UNITS.md` | `work-unit`, `workflow-docs` | Unit list, dependencies, sequencing, blocked units. |
 | `.workflow/DOSSIER.md` or `.workflow/dossiers/*.yaml` | `dossier-builder`, `workflow-docs` | Worker contracts for implementation, verification, repair, or documentation. |
-| `.workflow/WORKER-MAP.md` | `workflow-supervisor`, `worker-roles`, `workflow-docs` | Worker names, roles, transports, lifecycle, reports, blockers. |
-| `.workflow/ACCEPTANCE-MATRIX.md` | `acceptance-matrix`, `workflow-docs` | Evidence rows and material PASS, FAIL, BLOCKED states. |
-| `.workflow/VERIFICATION-REPORT.md` | verifier worker, `acceptance-matrix`, `workflow-docs` | Verification evidence, findings, skipped checks, residual risks. |
+| `.workflow/WORKER-MAP.md` | `workflow-supervisor`, `worker-roles`, `workflow-docs` | Worker names, roles, transports, native resource ids, lifecycle, reports, close results, blockers. |
+| `.workflow/ACCEPTANCE-MATRIX.md` | `acceptance-matrix`, `workflow-docs` | Evidence rows, outcome evaluation rows, capability limits, and material PASS, FAIL, BLOCKED states. |
+| `.workflow/VERIFICATION-REPORT.md` | verifier worker, `acceptance-matrix`, `workflow-docs` | Verification environment, outcome evidence, findings, skipped checks, residual risks. |
 | `.workflow/REPAIR-TICKETS.md` | repair worker, `workflow-docs` | Repair tasks tied to failed rows or verifier findings. |
 | `.workflow/DECISIONS.md` | supervisor, `workflow-docs` | User decisions, assumptions, reversals, unresolved questions. |
 | `.workflow/HANDOFF.md` | supervisor, `workflow-docs` | Resume pack for another agent or later session. |
@@ -305,6 +350,33 @@ Every delegated worker returns this machine-shaped report:
   "findings": [],
   "blocking_question": null,
   "next_action": "supervisor_review",
+  "verification_environment": {
+    "shell": true,
+    "filesystem": true,
+    "git_diff": true,
+    "browser": false,
+    "playwright_mcp": false,
+    "network": false,
+    "capabilities": ["shell_command", "api_probe", "static_diff_inspection"],
+    "limitations": ["Browser layout was not verified because browser capability was unavailable"]
+  },
+  "outcome_evaluations": [
+    {
+      "id": "A-001",
+      "source_requirement": "User can create a workflow and read it back.",
+      "expected_outcome": "The API persists the workflow and returns it with the expected schema.",
+      "preferred_verification": ["integration_test", "api_probe", "static_diff_inspection"],
+      "available_verification": ["integration_test", "api_probe", "static_diff_inspection"],
+      "evidence_strength": {
+        "strongest_possible": ["integration_test"],
+        "strongest_available": ["integration_test"]
+      },
+      "evidence": ["pytest tests/test_api.py passed"],
+      "invalid_pass_conditions": ["tests only without API behavior assertion", "hardcoded fixture"],
+      "verdict": "PASS",
+      "finding": ""
+    }
+  ],
   "adapter": null,
   "guard": null,
   "reason": null
@@ -313,9 +385,30 @@ Every delegated worker returns this machine-shaped report:
 The supervisor trusts the report shape, not loose prose. A PASS without evidence is invalid. A verifier that edits implementation is invalid. A worker that asks the human directly is converted into a blocker for the supervisor to route.
+Outcome verification treats the implementer report as a claim. A material behavior row needs expected outcome, evidence strength, preferred and available verification capabilities, invalid PASS conditions, and row-mapped evidence. `CONDITIONAL_PASS` is allowed only as a row-level outcome verdict for behavior that is strongly inferred but not fully observable; it is not a top-level `WorkerReportV1.status` and must not be treated as final green status without explicit waiver evidence.
+### Outcome Evaluation
+The implementer report is not the proof of the outcome. It is a structured claim that gives the supervisor something concrete to verify.
+For outcome-bearing work, the verifier maps each material source requirement to:
+- the expected user or system-visible outcome
+- the strongest preferred verification capability
+- the strongest verification capability actually available
+- row-mapped evidence
+- invalid PASS conditions
+- a row verdict
+Valid row verdicts are `PASS`, `FAIL`, `BLOCKED`, and `CONDITIONAL_PASS`. A row can be `CONDITIONAL_PASS` only when the available evidence strongly supports the outcome but a stronger material capability is unavailable. The missing capability and required external check must be recorded.
+The final worker status remains narrower: `PASS`, `FAIL`, or `BLOCKED`. A final `PASS` requires every material outcome row to be `PASS` unless the user explicitly waives or narrows the missing proof.
+For native threads or subagents, the report is only the work result. The supervisor must also close the native resource. For Codex subagents, record the returned `agent_id` and call `close_agent` after the report, timeout, failure, blocker, cancellation, or invalid-output result is captured. Final outcome is blocked while any native worker lacks a close result.
 ## How The Supervisor Talks To Workers
-The portable worker path is one CLI command:
+The portable worker path is one CLI command and is preferred when it satisfies the work because it is one-shot:
 ```bash
 workflow-supervisor delegate \
@@ -351,9 +444,11 @@ workflow-supervisor delegate-doctor --agent all --probe --require-pass
 If a worker adapter is missing, unauthenticated, times out, returns invalid output, edits forbidden surfaces, or returns PASS without evidence, the delegate command returns a structured `BLOCKED` report.
+Native thread or subagent transports may be used only when the environment exposes the full lifecycle: create, wait or receive a terminal report, and close. If a native transport can start workers but cannot close them, the supervisor records `worker_resource_close_unavailable` and uses portable delegation or same-session phased work only when intake allows it.
 ## No Silent Fallbacks
-If the environment can create, message, or delegate to worker agents, the supervisor must use real workers for implementation, verification, repair, and documentation responsibilities.
+In `strict_full_workflow` with worker delegation selected, if the environment can create, message, or delegate to worker agents, the supervisor must use real workers for implementation, verification, repair, and documentation responsibilities.
 If it cannot, it must record:
@@ -363,7 +458,7 @@ worker_agent_unavailable
 Then it must stop for a human decision unless complete intake explicitly selected `same_session_phased`.
-Same-session phased work is allowed only when selected. Verification in that mode is a `self-check`, not an `independent-verifier`.
+In `lean_work_unit_runner`, same-session phased execution is the default unless the user explicitly authorizes workers for a batch or escalation. Verification in same-session mode is a `focused-check` or `self-check`, not an `independent-verifier`.
 ## Install
@@ -538,13 +633,15 @@ If you are an agent using this package:
 1. Do not start work before complete intake.
 2. Do not infer missing permissions from words like "autonomous", "generate", or "work until done".
-3. If `$workflow-supervisor` is explicit, always create at least one work unit.
-4. Do not delegate without a valid `DossierV1`.
-5. Use separate worker agents when supported by the environment.
-6. Do not silently collapse worker agents into same-session roleplay.
-7. Treat same-session verification as `self-check`, not `independent-verifier`.
+3. If `$workflow-supervisor` is explicit, select `lean_work_unit_runner`, `strict_full_workflow`, or `planning_only` before heavy planning.
+4. Always keep work units explicit; lean mode uses a compact ledger instead of full per-unit ceremony.
+5. Do not delegate without a valid `DossierV1`.
+6. Use separate worker agents in strict or explicitly delegated work, not by default for lean execution.
+7. Treat same-session verification as `focused-check` or `self-check`, not `independent-verifier`.
 8. Trust only structured `WorkerReportV1` results from delegated workers.
 9. Treat verifier edits as invalid.
-10. Keep `.workflow/` ignored and local unless the user explicitly asks to publish it.
+10. Treat tests/typecheck/build as evidence types, not automatic outcome proof.
+11. Record capability limits instead of pretending browser, visual, live-service, credential, network, or human-review proof exists.
+12. Keep `.workflow/` ignored and local unless the user explicitly asks to publish it.
 The promise is not magic autonomy. The promise is disciplined supervision: clear setup, bounded work, scoped workers, structured reports, evidence, repair, and a clean final handoff.