npm - @exaudeus/workrail - Versions diffs - 3.2.1 → 3.4.0 - Mend

@exaudeus/workrail 3.2.1 → 3.4.0

This diff shows the content of publicly available package versions released to a supported registry. It is provided for informational purposes only and reflects the changes between those versions as they appear in the public registry.

Files changed (75)

package/dist/application/services/compiler/binding-registry.d.ts +3 -0
package/dist/application/services/compiler/binding-registry.js +71 -0
package/dist/application/services/compiler/resolve-bindings.d.ts +18 -0
package/dist/application/services/compiler/resolve-bindings.js +162 -0
package/dist/application/services/compiler/sentinel-scan.d.ts +9 -0
package/dist/application/services/compiler/sentinel-scan.js +37 -0
package/dist/application/services/validation-engine.js +104 -0
package/dist/application/services/workflow-compiler.d.ts +10 -2
package/dist/application/services/workflow-compiler.js +25 -6
package/dist/application/services/workflow-validation-pipeline.js +8 -1
package/dist/cli.js +2 -2
package/dist/config/feature-flags.js +4 -3
package/dist/engine/engine-factory.js +1 -1
package/dist/index.d.ts +2 -1
package/dist/index.js +4 -2
package/dist/manifest.json +151 -103
package/dist/mcp/handler-factory.d.ts +1 -1
package/dist/mcp/handler-factory.js +2 -2
package/dist/mcp/handlers/v2-checkpoint.js +5 -5
package/dist/mcp/handlers/v2-error-mapping.js +4 -4
package/dist/mcp/handlers/v2-execution/continue-advance.js +2 -2
package/dist/mcp/handlers/v2-execution/continue-rehydrate.d.ts +1 -0
package/dist/mcp/handlers/v2-execution/continue-rehydrate.js +76 -60
package/dist/mcp/handlers/v2-execution/index.js +86 -44
package/dist/mcp/handlers/v2-execution-helpers.js +1 -1
package/dist/mcp/handlers/v2-resume.js +10 -5
package/dist/mcp/handlers/v2-token-ops.d.ts +1 -1
package/dist/mcp/handlers/v2-token-ops.js +5 -5
package/dist/mcp/handlers/v2-workspace-resolution.d.ts +1 -0
package/dist/mcp/handlers/v2-workspace-resolution.js +12 -0
package/dist/mcp/index.d.ts +4 -1
package/dist/mcp/index.js +6 -2
package/dist/mcp/output-schemas.d.ts +148 -8
package/dist/mcp/output-schemas.js +22 -4
package/dist/mcp/server.d.ts +6 -4
package/dist/mcp/server.js +2 -57
package/dist/mcp/tool-descriptions.js +9 -158
package/dist/mcp/transports/http-entry.js +6 -25
package/dist/mcp/transports/shutdown-hooks.d.ts +5 -0
package/dist/mcp/transports/shutdown-hooks.js +38 -0
package/dist/mcp/transports/stdio-entry.js +6 -28
package/dist/mcp/v2/tool-registry.js +2 -1
package/dist/mcp/v2/tools.d.ts +28 -11
package/dist/mcp/v2/tools.js +28 -4
package/dist/mcp/v2-response-formatter.js +28 -1
package/dist/mcp/validation/suggestion-generator.d.ts +1 -1
package/dist/mcp/validation/suggestion-generator.js +13 -3
package/dist/mcp/workflow-protocol-contracts.d.ts +31 -0
package/dist/mcp/workflow-protocol-contracts.js +207 -0
package/dist/mcp-server.d.ts +3 -1
package/dist/mcp-server.js +6 -2
package/dist/types/workflow-definition.d.ts +7 -0
package/dist/types/workflow-definition.js +1 -0
package/dist/v2/durable-core/domain/binding-drift.d.ts +8 -0
package/dist/v2/durable-core/domain/binding-drift.js +29 -0
package/dist/v2/durable-core/domain/reason-model.js +2 -2
package/dist/v2/durable-core/schemas/compiled-workflow/index.d.ts +12 -0
package/dist/v2/durable-core/schemas/compiled-workflow/index.js +2 -0
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +56 -56
package/dist/v2/durable-core/schemas/session/events.d.ts +16 -16
package/dist/v2/durable-core/schemas/session/gaps.d.ts +6 -6
package/dist/v2/projections/resume-ranking.d.ts +1 -0
package/dist/v2/projections/resume-ranking.js +1 -0
package/dist/v2/read-only/v1-to-v2-shim.js +27 -10
package/dist/v2/usecases/resume-session.d.ts +5 -1
package/dist/v2/usecases/resume-session.js +4 -1
package/package.json +1 -1
package/spec/workflow.schema.json +44 -0
package/workflows/coding-task-workflow-agentic.json +15 -15
package/workflows/coding-task-workflow-agentic.lean.v2.json +10 -10
package/workflows/coding-task-workflow-agentic.v2.json +12 -12
package/workflows/coding-task-workflow-with-loops.json +2 -2
package/workflows/document-creation-workflow.json +1 -1
package/workflows/exploration-workflow.json +3 -3
package/workflows/mr-review-workflow.agentic.v2.json +11 -11

package/workflows/coding-task-workflow-agentic.v2.json CHANGED

@@ -41,7 +41,7 @@
     {
       "id": "phase-0-triage-and-mode",
       "title": "Phase 0: Triage (Complexity • Risk • PR Strategy)",
-      "prompt": "Analyze the task and choose the right rigor.\n\nClassify:\n- `taskComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `prStrategy`: SinglePR / MultiPR\n- `maxParallelism`: 0 / 3 / 4\n\nDecision guidance:\n- QUICK: small, low-risk, clear path, little ambiguity\n- STANDARD: medium scope or moderate risk\n- THOROUGH: large scope, architectural uncertainty, or high-risk change\n\nParallelism guidance:\n- QUICK: no delegation by default\n- STANDARD: few delegation moments, but allow multiple parallel executors at each moment\n- THOROUGH: same pattern, but with one extra delegation moment and broader parallel validation\n\nAlso capture `userRules` from the active session instructions and explicit philosophy. Keep them as a focused list of concrete, actionable rules.\n\nSet context variables: `taskComplexity`, `riskLevel`, `rigorMode`, `automationLevel`, `prStrategy`, `maxParallelism`, `userRules`.\n\nAsk the user to confirm only if the rigor or PR strategy materially affects delivery expectations.",
+      "prompt": "Analyze the task and choose the right rigor.\n\nClassify:\n- `taskComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `prStrategy`: SinglePR / MultiPR\n- `maxParallelism`: 0 / 3 / 4\n\nDecision guidance:\n- QUICK: small, low-risk, clear path, little ambiguity\n- STANDARD: medium scope or moderate risk\n- THOROUGH: large scope, architectural uncertainty, or high-risk change\n\nParallelism guidance:\n- QUICK: no delegation by default\n- STANDARD: few delegation moments, but allow multiple parallel executors at each moment\n- THOROUGH: same pattern, but with one extra delegation moment and broader parallel validation\n\nAlso capture `userRules` from the active session instructions and explicit philosophy. Keep them as a focused list of concrete, actionable rules.\n\nSet these keys in the next `continue_workflow` call's `context` object: `taskComplexity`, `riskLevel`, `rigorMode`, `automationLevel`, `prStrategy`, `maxParallelism`, `userRules`.\n\nAsk the user to confirm only if the rigor or PR strategy materially affects delivery expectations.",
       "requireConfirmation": true
     },
     {
@@ -62,7 +62,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Build the minimum complete understanding needed to design correctly.\n\nDo the main context gathering yourself using tools. Read independent files in parallel when possible.\n\nDeliverable:\n- key entry points and call chain sketch\n- relevant files/modules/functions\n- existing repo patterns with concrete file references\n- testing strategy already present in the repo\n- risks and unknowns\n- explicit invariants and non-goals\n\nSet context variables:\n- `contextSummary`\n- `candidateFiles`\n- `invariants`\n- `nonGoals`\n- `openQuestions`\n- `contextUnknownCount`\n- `contextAuditNeeded`\n\nRules:\n- answer your own questions with tools whenever possible\n- only keep true human-decision questions in `openQuestions`\n- keep `openQuestions` bounded to the minimum necessary\n- set `contextUnknownCount` to the number of unresolved technical unknowns that still matter\n- set `contextAuditNeeded` to true if understanding still feels incomplete or the call chain is still too fuzzy\n\nMode-adaptive audit:\n- QUICK: no delegation; self-check only\n- STANDARD: if `contextAuditNeeded = true` or risk is High and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step\n- THOROUGH: if delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step",
+      "prompt": "Build the minimum complete understanding needed to design correctly.\n\nDo the main context gathering yourself using tools. Read independent files in parallel when possible.\n\nDeliverable:\n- key entry points and call chain sketch\n- relevant files/modules/functions\n- existing repo patterns with concrete file references\n- testing strategy already present in the repo\n- risks and unknowns\n- explicit invariants and non-goals\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `contextSummary`\n- `candidateFiles`\n- `invariants`\n- `nonGoals`\n- `openQuestions`\n- `contextUnknownCount`\n- `contextAuditNeeded`\n\nRules:\n- answer your own questions with tools whenever possible\n- only keep true human-decision questions in `openQuestions`\n- keep `openQuestions` bounded to the minimum necessary\n- set `contextUnknownCount` to the number of unresolved technical unknowns that still matter\n- set `contextAuditNeeded` to true if understanding still feels incomplete or the call chain is still too fuzzy\n\nMode-adaptive audit:\n- QUICK: no delegation; self-check only\n- STANDARD: if `contextAuditNeeded = true` or risk is High and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step\n- THOROUGH: if delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step",
       "requireConfirmation": false
     },
     {
@@ -72,7 +72,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Reassess the initial triage now that context is real instead of hypothetical.\n\nReview:\n- `contextUnknownCount`\n- number of systems/components actually involved\n- number of critical invariants discovered\n- whether the task now looks broader or riskier than Phase 0 suggested\n\nDo:\n- confirm or adjust `taskComplexity`\n- confirm or adjust `riskLevel`\n- confirm or adjust `rigorMode`\n- confirm or adjust `maxParallelism`\n\nRules:\n- upgrade rigor if the real architecture surface or uncertainty is larger than expected\n- downgrade only if the task is genuinely simpler than it first appeared\n- if you change the mode, explain why using concrete evidence from the context gathering step\n\nSet context variables:\n- `taskComplexity`\n- `riskLevel`\n- `rigorMode`\n- `maxParallelism`\n- `retriageChanged` (true/false)",
+      "prompt": "Reassess the initial triage now that context is real instead of hypothetical.\n\nReview:\n- `contextUnknownCount`\n- number of systems/components actually involved\n- number of critical invariants discovered\n- whether the task now looks broader or riskier than Phase 0 suggested\n\nDo:\n- confirm or adjust `taskComplexity`\n- confirm or adjust `riskLevel`\n- confirm or adjust `rigorMode`\n- confirm or adjust `maxParallelism`\n\nRules:\n- upgrade rigor if the real architecture surface or uncertainty is larger than expected\n- downgrade only if the task is genuinely simpler than it first appeared\n- if you change the mode, explain why using concrete evidence from the context gathering step\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `taskComplexity`\n- `riskLevel`\n- `rigorMode`\n- `maxParallelism`\n- `retriageChanged` (true/false)",
       "requireConfirmation": {
         "or": [
           { "var": "retriageChanged", "equals": true },
@@ -87,7 +87,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Make the architecture decision in one coherent phase instead of serializing every thinking mode into a separate step.\n\nPart A — Prepare a neutral fact packet:\n- problem statement\n- acceptance criteria\n- non-goals\n- invariants\n- constraints\n- `userRules`\n- relevant files / pattern examples\n- current risks and unknowns\n\nPart B — Generate candidate plans:\n- QUICK: self-generate at least 3 genuinely different approaches\n- STANDARD: if delegation is available, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, pragmatic)\n- THOROUGH: if delegation is available, spawn THREE or FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, architecture-first, rollback-safe)\n\nPart C — Diversity gate before commitment:\n- assign each candidate plan a short `candidatePlanFamily` label\n- check whether the candidates are materially different in shape, not just wording\n- if all candidates cluster on the same pattern family, generate at least one more plan from a deliberately different perspective before selecting\n- set `candidateDiversityAdequate = true|false`\n\nPart D — Compare candidate plans:\n- invariant fit\n- philosophy alignment (`userRules` as active lens)\n- risk profile\n- implementation shape\n- likely reviewability / PR shape\n\nPart E — Challenge the best one or two:\n- STANDARD: optionally challenge the leading candidate with ONE WorkRail Executor running `routine-hypothesis-challenge`\n- THOROUGH: challenge the top 1-2 candidate plans using ONE or TWO WorkRail Executors running `routine-hypothesis-challenge`\n\nPart F — Decide:\nSet context variables:\n- `approaches`\n- `alternativesConsideredCount`\n- `candidatePlanFamilies`\n- `candidateDiversityAdequate`\n- `hasRunnerUp`\n- `selectedApproach`\n- `runnerUpApproach`\n- `architectureRationale`\n- `keyRiskToMonitor`\n- `pivotTriggers`\n- `architectureConfidenceBand`\n\nRules:\n- the main agent owns the final decision\n- subagents generate candidate plans; they do not decide the winner\n- if the challenged leading candidate no longer looks best, switch deliberately rather than defending sunk cost",
+      "prompt": "Make the architecture decision in one coherent phase instead of serializing every thinking mode into a separate step.\n\nPart A — Prepare a neutral fact packet:\n- problem statement\n- acceptance criteria\n- non-goals\n- invariants\n- constraints\n- `userRules`\n- relevant files / pattern examples\n- current risks and unknowns\n\nPart B — Generate candidate plans:\n- QUICK: self-generate at least 3 genuinely different approaches\n- STANDARD: if delegation is available, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, pragmatic)\n- THOROUGH: if delegation is available, spawn THREE or FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, architecture-first, rollback-safe)\n\nPart C — Diversity gate before commitment:\n- assign each candidate plan a short `candidatePlanFamily` label\n- check whether the candidates are materially different in shape, not just wording\n- if all candidates cluster on the same pattern family, generate at least one more plan from a deliberately different perspective before selecting\n- set `candidateDiversityAdequate = true|false`\n\nPart D — Compare candidate plans:\n- invariant fit\n- philosophy alignment (`userRules` as active lens)\n- risk profile\n- implementation shape\n- likely reviewability / PR shape\n\nPart E — Challenge the best one or two:\n- STANDARD: optionally challenge the leading candidate with ONE WorkRail Executor running `routine-hypothesis-challenge`\n- THOROUGH: challenge the top 1-2 candidate plans using ONE or TWO WorkRail Executors running `routine-hypothesis-challenge`\n\nPart F — Decide:\nSet these keys in the next `continue_workflow` call's `context` object:\n- `approaches`\n- `alternativesConsideredCount`\n- `candidatePlanFamilies`\n- `candidateDiversityAdequate`\n- `hasRunnerUp`\n- `selectedApproach`\n- `runnerUpApproach`\n- `architectureRationale`\n- `keyRiskToMonitor`\n- `pivotTriggers`\n- `architectureConfidenceBand`\n\nRules:\n- the main agent owns the final decision\n- subagents generate candidate plans; they do not decide the winner\n- if the challenged leading candidate no longer looks best, switch deliberately rather than defending sunk cost",
       "requireConfirmation": {
         "or": [
           { "var": "automationLevel", "equals": "Low" },
@@ -103,7 +103,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Create or update the human-facing implementation artifact: `implementation_plan.md`.\n\nThis phase combines slicing, plan drafting, philosophy alignment, and test design.\n\nThe plan must include:\n1. Problem statement\n2. Acceptance criteria\n3. Non-goals\n4. Applied `userRules` and philosophy-driven constraints\n5. Invariants\n6. Selected approach + rationale + runner-up\n7. Vertical slices\n8. Work packages only when they improve execution or enable safe parallelism\n9. Test design\n10. Risk register\n11. PR packaging strategy\n12. Philosophy alignment per slice:\n   - [principle] → [satisfied / tension / violated + 1-line why]\n\nSet context variables:\n- `implementationPlan`\n- `slices`\n- `testDesign`\n- `estimatedPRCount`\n- `followUpTickets` (initialize if needed)\n- `unresolvedUnknownCount`\n- `planConfidenceBand`\n\nRules:\n- keep `implementation_plan.md` concrete enough for another engineer to implement without guessing\n- use work packages only when they create real clarity; do not over-fragment work\n- use the user's coding philosophy as the primary planning lens, and name tensions explicitly\n- set `unresolvedUnknownCount` to the number of still-open issues that would materially affect implementation quality\n- set `planConfidenceBand` to Low / Medium / High based on how ready the plan actually is",
+      "prompt": "Create or update the human-facing implementation artifact: `implementation_plan.md`.\n\nThis phase combines slicing, plan drafting, philosophy alignment, and test design.\n\nThe plan must include:\n1. Problem statement\n2. Acceptance criteria\n3. Non-goals\n4. Applied `userRules` and philosophy-driven constraints\n5. Invariants\n6. Selected approach + rationale + runner-up\n7. Vertical slices\n8. Work packages only when they improve execution or enable safe parallelism\n9. Test design\n10. Risk register\n11. PR packaging strategy\n12. Philosophy alignment per slice:\n   - [principle] → [satisfied / tension / violated + 1-line why]\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `implementationPlan`\n- `slices`\n- `testDesign`\n- `estimatedPRCount`\n- `followUpTickets` (initialize if needed)\n- `unresolvedUnknownCount`\n- `planConfidenceBand`\n\nRules:\n- keep `implementation_plan.md` concrete enough for another engineer to implement without guessing\n- use work packages only when they create real clarity; do not over-fragment work\n- use the user's coding philosophy as the primary planning lens, and name tensions explicitly\n- set `unresolvedUnknownCount` to the number of still-open issues that would materially affect implementation quality\n- set `planConfidenceBand` to Low / Medium / High based on how ready the plan actually is",
       "requireConfirmation": false
     },
     {
@@ -127,13 +127,13 @@
         {
           "id": "phase-4a-plan-audit",
           "title": "Plan Audit (Correctness + Philosophy + Regression)",
-          "prompt": "Audit the plan before implementation.\n\nAlways perform:\n- completeness / missing work\n- weak assumptions and risks\n- invariant coverage\n- slice boundary quality\n- philosophy alignment against `userRules`\n- regression check against `resolvedFindings` (if present)\n\nPhilosophy rules:\n- flag findings by principle name\n- Red / Orange findings go into `planFindings`\n- Yellow tensions are informational only and do NOT block loop exit\n- if no philosophy principles are configured, say so explicitly and continue\n\nRegression check:\n- if `resolvedFindings` is non-empty, verify previous resolutions still hold\n- if a previously resolved issue has reappeared, add a Critical regression finding\n\nMode-adaptive delegation:\n- QUICK: self-audit only\n- STANDARD: if delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, and `routine-philosophy-alignment`; include `routine-execution-simulation` only when runtime or state-flow risk is material\n- THOROUGH: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, `routine-execution-simulation`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ auditors flag the same issue, treat it as high priority by default\n- if one auditor flags a concern no one else sees, investigate it but do not automatically block unless it is clearly severe\n- if outputs conflict, document the conflict explicitly and resolve it before finalizing `planFindings`\n- if philosophy review yields Red findings and no stronger conflicting evidence exists, they must remain blocking\n\nSet context variables:\n- `planFindings`\n- `planAmendments`\n- `planConfidence`\n- `cleanSlateDivergence`\n- `planFindingsCountBySeverity`\n- `philosophyFindingsCountBySeverity`\n- `auditConsensusLevel`\n\nRules:\n- use the main agent as synthesizer and final decision-maker\n- do not delegate sequentially when the audit routines are independent",
+          "prompt": "Audit the plan before implementation.\n\nAlways perform:\n- completeness / missing work\n- weak assumptions and risks\n- invariant coverage\n- slice boundary quality\n- philosophy alignment against `userRules`\n- regression check against `resolvedFindings` (if present)\n\nPhilosophy rules:\n- flag findings by principle name\n- Red / Orange findings go into `planFindings`\n- Yellow tensions are informational only and do NOT block loop exit\n- if no philosophy principles are configured, say so explicitly and continue\n\nRegression check:\n- if `resolvedFindings` is non-empty, verify previous resolutions still hold\n- if a previously resolved issue has reappeared, add a Critical regression finding\n\nMode-adaptive delegation:\n- QUICK: self-audit only\n- STANDARD: if delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, and `routine-philosophy-alignment`; include `routine-execution-simulation` only when runtime or state-flow risk is material\n- THOROUGH: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, `routine-execution-simulation`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ auditors flag the same issue, treat it as high priority by default\n- if one auditor flags a concern no one else sees, investigate it but do not automatically block unless it is clearly severe\n- if outputs conflict, document the conflict explicitly and resolve it before finalizing `planFindings`\n- if philosophy review yields Red findings and no stronger conflicting evidence exists, they must remain blocking\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planFindings`\n- `planAmendments`\n- `planConfidence`\n- `cleanSlateDivergence`\n- `planFindingsCountBySeverity`\n- `philosophyFindingsCountBySeverity`\n- `auditConsensusLevel`\n\nRules:\n- use the main agent as synthesizer and final decision-maker\n- do not delegate sequentially when the audit routines are independent",
           "requireConfirmation": false
         },
         {
           "id": "phase-4b-refocus",
           "title": "Refocus Plan and Track Resolved Findings",
-          "prompt": "Apply plan amendments and refocus.\n\nDo:\n- update `implementation_plan.md` to incorporate `planAmendments`\n- update `slices` if the plan shape changed\n- extract out-of-scope work into `followUpTickets`\n- maintain `resolvedFindings` as an array of { finding, resolution, iteration }\n- cap `resolvedFindings` at 10 entries, dropping oldest first\n\nSet context variables:\n- `resolvedFindings`\n- `followUpTickets`\n\nRule:\n- do not silently accept plan drift; if the audit changed the shape of the work, reflect it in the plan artifact immediately",
+          "prompt": "Apply plan amendments and refocus.\n\nDo:\n- update `implementation_plan.md` to incorporate `planAmendments`\n- update `slices` if the plan shape changed\n- extract out-of-scope work into `followUpTickets`\n- maintain `resolvedFindings` as an array of { finding, resolution, iteration }\n- cap `resolvedFindings` at 10 entries, dropping oldest first\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `resolvedFindings`\n- `followUpTickets`\n\nRule:\n- do not silently accept plan drift; if the audit changed the shape of the work, reflect it in the plan artifact immediately",
           "requireConfirmation": false
         },
         {
@@ -154,7 +154,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Verify that planning is complete enough to start implementation.\n\nRequired checks:\n- selected approach and rationale exist\n- runner-up exists\n- pivot triggers are concrete enough to act on\n- slices are defined with scope, verification, and boundaries\n- `implementation_plan.md` reflects the current intended work\n- no unresolved planning gaps remain that would block implementation\n- `alternativesConsideredCount` shows real exploration happened\n\nSet context variables:\n- `planningGaps`\n- `planningComplete`\n\nRule:\n- if any gap can be fixed immediately, fix it now and do not carry it forward as a gap\n- only stop for user input when a true decision is missing",
+      "prompt": "Verify that planning is complete enough to start implementation.\n\nRequired checks:\n- selected approach and rationale exist\n- runner-up exists\n- pivot triggers are concrete enough to act on\n- slices are defined with scope, verification, and boundaries\n- `implementation_plan.md` reflects the current intended work\n- no unresolved planning gaps remain that would block implementation\n- `alternativesConsideredCount` shows real exploration happened\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planningGaps`\n- `planningComplete`\n\nRule:\n- if any gap can be fixed immediately, fix it now and do not carry it forward as a gap\n- only stop for user input when a true decision is missing",
       "requireConfirmation": {
         "or": [
           { "var": "automationLevel", "equals": "Low" },
@@ -192,7 +192,7 @@
         {
           "id": "phase-7a-slice-preflight",
           "title": "Slice Preflight",
-          "prompt": "Before implementing slice `{{currentSlice.name}}`, verify:\n- pivot triggers have not fired\n- the plan assumptions are still fresh enough\n- target files and symbols still match the plan\n- the slice remains reviewable and bounded\n\nSet context variables:\n- `pivotTriggered`\n- `pivotSeverity`\n- `pivotReturnPhase`\n- `slicePlanStale`\n- `validationFailed`\n\nIf drift or invalid assumptions are discovered, stop and return to planning deliberately rather than coding through it.",
+          "prompt": "Before implementing slice `{{currentSlice.name}}`, verify:\n- pivot triggers have not fired\n- the plan assumptions are still fresh enough\n- target files and symbols still match the plan\n- the slice remains reviewable and bounded\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `pivotTriggered`\n- `pivotSeverity`\n- `pivotReturnPhase`\n- `slicePlanStale`\n- `validationFailed`\n\nIf drift or invalid assumptions are discovered, stop and return to planning deliberately rather than coding through it.",
           "requireConfirmation": {
             "or": [
               { "var": "pivotTriggered", "equals": true },
@@ -210,7 +210,7 @@
         {
           "id": "phase-7c-verify-slice",
           "title": "Verify Slice",
-          "prompt": "Verify slice `{{currentSlice.name}}`.\n\nAlways:\n- run planned verification commands\n- update or add tests when needed\n- ensure invariants still hold\n- check philosophy-alignment regressions introduced by the implementation\n\nFresh-eye validation triggers:\n- if `specialCaseIntroduced = true`\n- if `unplannedAbstractionIntroduced = true`\n- if this slice touched unexpected files\n- if runtime behavior still feels uncertain\n\nMode-adaptive validation:\n- QUICK: self-verify unless a fresh-eye trigger fires\n- STANDARD: if delegation is available and any fresh-eye trigger fires, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, and optionally `routine-philosophy-alignment`\n- THOROUGH + high-risk or any fresh-eye trigger: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, `routine-plan-analysis`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ validators independently raise the same serious concern, treat it as blocking by default\n- if exactly one validator raises a concern, attempt to understand and resolve it before escalating\n- if validators disagree, record the disagreement explicitly and prefer the safer path when uncertainty remains high\n\nSet context variables:\n- `sliceVerified`\n- `verificationFindings`\n- `verificationFailed`\n- `verificationApprovalRequired`\n- `verificationRetried`\n- `verificationConcernCount`\n- `verificationConsensusLevel`\n\nRule:\n- if 2+ independent validators raise serious concerns, stop and return to planning or ask the user which path to take",
+          "prompt": "Verify slice `{{currentSlice.name}}`.\n\nAlways:\n- run planned verification commands\n- update or add tests when needed\n- ensure invariants still hold\n- check philosophy-alignment regressions introduced by the implementation\n\nFresh-eye validation triggers:\n- if `specialCaseIntroduced = true`\n- if `unplannedAbstractionIntroduced = true`\n- if this slice touched unexpected files\n- if runtime behavior still feels uncertain\n\nMode-adaptive validation:\n- QUICK: self-verify unless a fresh-eye trigger fires\n- STANDARD: if delegation is available and any fresh-eye trigger fires, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, and optionally `routine-philosophy-alignment`\n- THOROUGH + high-risk or any fresh-eye trigger: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, `routine-plan-analysis`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ validators independently raise the same serious concern, treat it as blocking by default\n- if exactly one validator raises a concern, attempt to understand and resolve it before escalating\n- if validators disagree, record the disagreement explicitly and prefer the safer path when uncertainty remains high\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `sliceVerified`\n- `verificationFindings`\n- `verificationFailed`\n- `verificationApprovalRequired`\n- `verificationRetried`\n- `verificationConcernCount`\n- `verificationConsensusLevel`\n\nRule:\n- if 2+ independent validators raise serious concerns, stop and return to planning or ask the user which path to take",
           "requireConfirmation": {
             "or": [
               { "var": "verificationApprovalRequired", "equals": true },
@@ -221,7 +221,7 @@
         {
           "id": "phase-7d-drift-and-pr-gate",
           "title": "Drift and PR Gate",
-          "prompt": "After a verified slice:\n- compare actual changed scope against the slice plan\n- if the slice drifted, update `implementation_plan.md` immediately and record the reason in notes\n- if `prStrategy = MultiPR`, stop here with a concise PR package for user review before continuing\n\nSet context variables:\n- `planDrift`\n- `rulesDrift`\n- `changedFilesOutsidePlannedScope`\n- `scopeDriftDetected`\n\nRule:\n- do not rely on markdown sidecar state; notesMarkdown is the durable recap and `implementation_plan.md` is the human artifact",
+          "prompt": "After a verified slice:\n- compare actual changed scope against the slice plan\n- if the slice drifted, update `implementation_plan.md` immediately and record the reason in notes\n- if `prStrategy = MultiPR`, stop here with a concise PR package for user review before continuing\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planDrift`\n- `rulesDrift`\n- `changedFilesOutsidePlannedScope`\n- `scopeDriftDetected`\n\nRule:\n- do not rely on markdown sidecar state; notesMarkdown is the durable recap and `implementation_plan.md` is the human artifact",
           "requireConfirmation": {
             "or": [
               { "var": "prStrategy", "equals": "MultiPR" },
@@ -239,7 +239,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Perform final integration verification.\n\nRequired:\n- verify acceptance criteria\n- map invariants to concrete proof (tests, build results, explicit reasoning)\n- run whole-task validation commands\n- identify any invariant violations or regressions\n- confirm the implemented result still aligns with the user's coding philosophy, naming any tensions explicitly\n- review cumulative drift across all slices, not just the current one\n- check whether repeated small compromises added up to a larger pattern problem\n\nSet context variables:\n- `integrationVerificationPassed`\n- `integrationVerificationFailed`\n- `integrationVerificationFindings`\n- `regressionDetected`\n- `invariantViolations`\n- `crossSliceDriftDetected`",
+      "prompt": "Perform final integration verification.\n\nRequired:\n- verify acceptance criteria\n- map invariants to concrete proof (tests, build results, explicit reasoning)\n- run whole-task validation commands\n- identify any invariant violations or regressions\n- confirm the implemented result still aligns with the user's coding philosophy, naming any tensions explicitly\n- review cumulative drift across all slices, not just the current one\n- check whether repeated small compromises added up to a larger pattern problem\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `integrationVerificationPassed`\n- `integrationVerificationFailed`\n- `integrationVerificationFindings`\n- `regressionDetected`\n- `invariantViolations`\n- `crossSliceDriftDetected`",
       "requireConfirmation": {
         "or": [
           { "var": "integrationVerificationFailed", "equals": true },
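Note: the steps in this file gate on condition objects built from `var`, `equals`, `not_equals`, and `or`, and the reworded prompts say the values come from the `context` object of the next `continue_workflow` call. The sketch below shows one way such a condition could be checked against a context object. The type names, the `evaluateCondition` helper, and the treatment of a missing key are assumptions made for illustration. It is not WorkRail's actual evaluator.

```typescript
// Hypothetical model of the condition objects visible in this diff.
// Only `var`/`equals`, `var`/`not_equals`, and `or` are covered.
type Condition =
  | { var: string; equals: unknown }
  | { var: string; not_equals: unknown }
  | { or: Condition[] };

type WorkflowContext = Record<string, unknown>;

function evaluateCondition(cond: Condition, context: WorkflowContext): boolean {
  if ("or" in cond) {
    // An `or` passes when at least one nested condition passes.
    return cond.or.some((c) => evaluateCondition(c, context));
  }
  if ("equals" in cond) {
    return context[cond.var] === cond.equals;
  }
  // Assumption: a key that was never set counts as "not equal".
  return context[cond.var] !== cond.not_equals;
}

// Conditions copied from the hunks above (the `or` list is shortened).
const requireConfirmation: Condition = {
  or: [{ var: "verificationApprovalRequired", equals: true }],
};
const notSmallTask: Condition = { var: "taskComplexity", not_equals: "Small" };

// Keys the agent would place in the next call's `context` object.
const context: WorkflowContext = {
  verificationApprovalRequired: true,
  taskComplexity: "Small",
};

const needsConfirmation = evaluateCondition(requireConfirmation, context);
const passesNotSmall = evaluateCondition(notSmallTask, context);
```

With this context, `needsConfirmation` is `true` and `passesNotSmall` is `false`. If a prompt's key is never written into `context`, an `equals` check against it cannot pass, which is presumably why the prompts now name the exact place to set the keys.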

package/workflows/coding-task-workflow-with-loops.json CHANGED

@@ -16,7 +16,7 @@
         "fun createFile(filename) = 'Use edit_file to create/update {filename}. NEVER output full content in chat—only summarize. If fails, request user help & log command.'",
         "fun applyUserRules() = 'Apply & reference user-defined rules, patterns & preferences. Document alignment in Decision Log. Explain rule influence in decisions.'",
         "fun matchPatterns() = 'Use codebase_search/grep to find similar patterns. Reference Decision Log patterns. Match target area unless user rules override.'",
-        "fun addResumptionJson(phase) = 'Update CONTEXT.md resumption section with: 1) workflow_get instructions (id: coding-task-workflow-with-loops, mode: preview), 2) workflow_next JSON with workflowId, completedSteps up to {phase}, all context variables.'",
+        "fun addResumptionJson(phase) = 'Update CONTEXT.md with: 1) workflow_get (id: coding-task-workflow-with-loops, mode: preview), 2) workflow_next JSON with workflowId, completedSteps up to {phase}, all continue_workflow context keys.'",
         "fun gitCommit(type, msg) = 'If git available: commit with {type}: {msg}. If unavailable: log in CONTEXT.md with timestamp.'",
         "fun verifyImplementation() = '1) Test coverage >80%, 2) Run full test suite, 3) Self-review. Max 2 attempts before failure protocol.'",
         "fun checkAutomation(action) = 'High: auto-{action} if confidence >8. Medium: request confirmation. Low: extra confirmations.'",
@@ -57,7 +57,7 @@
                 "Consider both technical complexity and business risk",
                 "When in doubt, err on the side of more thorough analysis (higher complexity)",
                 "Always allow human override of your classification",
-                "Set context variables that will be used for conditional step execution and automation",
+                "Set these keys in the next `continue_workflow` call's `context` object that will be used for conditional step execution and automation",
                 "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
             ],
             "requireConfirmation": true

package/workflows/document-creation-workflow.json CHANGED

@@ -35,7 +35,7 @@
       "agentRole": "You are a documentation strategy specialist with expertise in assessing documentation complexity and risk. Your role is to accurately classify documentation needs based on technical depth, stakeholder impact, and integration requirements.",
       "guidance": [
         "Consider both content complexity and organizational impact",
-        "Set context variables that drive conditional workflow execution",
+        "Set these keys in the next `continue_workflow` call's `context` object that drive conditional workflow execution",
         "When uncertain, err toward higher complexity for better quality",
         "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
       ],

package/workflows/exploration-workflow.json CHANGED

@@ -14,7 +14,7 @@
     "preconditions": [
         "User has a clear task, problem, or question to explore",
         "User can provide initial context, constraints, or requirements",
-        "Agent can maintain context variables throughout the workflow"
+        "Agent can maintain `continue_workflow` context keys throughout the workflow"
     ],
     "metaGuidance": [
         "FUNCTION DEFINITIONS: fun trackEvidence(source, grade) = 'Add to context.evidenceLog[] with {source, grade, timestamp}. Grade: High (peer-reviewed/official), Medium (expert/established), Low (anecdotal/emerging)'",
@@ -45,7 +45,7 @@
                 "Consider both domain complexity and option space size",
                 "When in doubt, err on the side of more thorough analysis (higher complexity)",
                 "Always allow human override of classification",
-                "Set context variables for conditional step execution and automation",
+                "Set these keys in the next `continue_workflow` call's `context` object for conditional step execution and automation",
                 "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
             ],
             "requireConfirmation": true
@@ -72,7 +72,7 @@
                 "Some tasks may span domains - choose primary domain",
                 "This classification affects tool selection and evaluation criteria",
                 "Document reasoning for domain choice",
-                "Set domain-specific context variables for later steps"
+                "Set domain-specific keys in the next `continue_workflow` call's `context` object for later steps"
             ],
             "requireConfirmation": false
         },
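Note: the `trackEvidence(source, grade)` definition in this file's `metaGuidance` describes appending `{source, grade, timestamp}` to `context.evidenceLog[]`. A minimal sketch of that bookkeeping follows, assuming the log is a plain array carried in the context object and the timestamp is an ISO 8601 string. Both are assumptions, as are the type names. The workflow text itself is an instruction to the agent, not executable code.

```typescript
// Grades as named in the metaGuidance line.
type Grade = "High" | "Medium" | "Low";

// Hypothetical entry shape: the field names come from the workflow text,
// the string timestamp format is an assumption.
type EvidenceEntry = { source: string; grade: Grade; timestamp: string };

type ExplorationContext = {
  evidenceLog?: EvidenceEntry[];
  [key: string]: unknown;
};

// Returns a new context with one more entry and leaves the input untouched.
function trackEvidence(
  context: ExplorationContext,
  source: string,
  grade: Grade,
  now: Date = new Date()
): ExplorationContext {
  const entry: EvidenceEntry = { source, grade, timestamp: now.toISOString() };
  return { ...context, evidenceLog: [...(context.evidenceLog ?? []), entry] };
}

// Example: log two sources against an initially empty context.
const first = trackEvidence({}, "official documentation", "High", new Date(0));
const second = trackEvidence(first, "forum thread", "Low", new Date(0));
```

Returning a fresh object rather than mutating keeps each call's `context` payload easy to compare with the previous one.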

package/workflows/mr-review-workflow.agentic.v2.json CHANGED

@@ -22,8 +22,8 @@
   ],
   "metaGuidance": [
     "DEFAULT BEHAVIOR: self-execute with tools. Only ask for missing external artifacts, permissions, or business context you cannot resolve yourself.",
-    "V2 DURABILITY: use output.notesMarkdown and explicit context variables as durable workflow state. Do NOT rely on the live review document as required workflow memory.",
-    "ARTIFACT STRATEGY: `reviewDocPath` is a human-facing artifact only. Keep it updated for readability, but keep execution truth in notes/context variables.",
+    "V2 DURABILITY: use output.notesMarkdown and explicit `continue_workflow` context keys as durable workflow state. Do NOT rely on the live review document as required workflow memory.",
+    "ARTIFACT STRATEGY: `reviewDocPath` is a human-facing artifact only. Keep it updated for readability, but keep execution truth in notes/`continue_workflow` context.",
     "MAIN AGENT OWNS REVIEW: the main agent owns truth, synthesis, severity calibration, final recommendation, and document finalization.",
     "SUBAGENT MODEL: use the WorkRail Executor only. Do not refer to Builder, Researcher, or other named subagent identities.",
     "PARALLELISM: parallelize independent cognition; serialize synthesis, canonical review findings, recommendation decisions, and final document writes.",
@@ -39,7 +39,7 @@
     {
       "id": "phase-0-triage-and-mode",
       "title": "Phase 0: Triage (MR Context • Risk • Mode)",
-      "prompt": "Understand the MR and choose the right rigor.\n\nCapture:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched` (true/false)\n- `reviewMode`: QUICK / STANDARD / THOROUGH\n- `riskLevel`: Low / Medium / High\n- `maxParallelism`: 0 / 3 / 5\n\nDecision guidance:\n- QUICK: very small, isolated, low-risk changes with little ambiguity\n- STANDARD: typical feature or bug-fix reviews with moderate ambiguity or moderate risk\n- THOROUGH: critical surfaces, architectural novelty, high risk, broad change sets, or strong need for independent reviewer perspectives\n\nAlso choose `reviewDocPath` for the human-facing live artifact. Default suggestion: `mr-review.md` at the project root.\n\nSet context variables:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched`\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `reviewDocPath`\n\nAsk for confirmation only if the selected mode materially changes expectations or if the diff/source context is still missing.",
+      "prompt": "Understand the MR and choose the right rigor.\n\nCapture:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched` (true/false)\n- `reviewMode`: QUICK / STANDARD / THOROUGH\n- `riskLevel`: Low / Medium / High\n- `maxParallelism`: 0 / 3 / 5\n\nDecision guidance:\n- QUICK: very small, isolated, low-risk changes with little ambiguity\n- STANDARD: typical feature or bug-fix reviews with moderate ambiguity or moderate risk\n- THOROUGH: critical surfaces, architectural novelty, high risk, broad change sets, or strong need for independent reviewer perspectives\n\nAlso choose `reviewDocPath` for the human-facing live artifact. Default suggestion: `mr-review.md` at the project root.\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched`\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `reviewDocPath`\n\nAsk for confirmation only if the selected mode materially changes expectations or if the diff/source context is still missing.",
       "requireConfirmation": true
     },
     {
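Note: the Phase 0 prompt lists ten keys to set in the next `continue_workflow` call's `context` object. The sketch below types those keys and checks a candidate object for missing ones. Key names and the enumerated values for `reviewMode`, `riskLevel`, and `maxParallelism` come from the prompt. The value types of the free-form keys, the `missingPhase0Keys` helper, and the sample values are assumptions. The rest of the `continue_workflow` call shape is not shown in this diff and is not modeled here.

```typescript
// Hypothetical typing of the Phase 0 context keys.
type Phase0Context = {
  mrTitle: string;
  mrPurpose: string;
  ticketContext: string;
  focusAreas: string[];
  changedFileCount: number;
  criticalSurfaceTouched: boolean;
  reviewMode: "QUICK" | "STANDARD" | "THOROUGH";
  riskLevel: "Low" | "Medium" | "High";
  maxParallelism: 0 | 3 | 5;
  reviewDocPath: string;
};

const PHASE0_KEYS: ReadonlyArray<keyof Phase0Context> = [
  "mrTitle",
  "mrPurpose",
  "ticketContext",
  "focusAreas",
  "changedFileCount",
  "criticalSurfaceTouched",
  "reviewMode",
  "riskLevel",
  "maxParallelism",
  "reviewDocPath",
];

// Lists the prompt's keys that a candidate context object has not set.
function missingPhase0Keys(context: Partial<Phase0Context>): string[] {
  return PHASE0_KEYS.filter((key) => context[key] === undefined);
}

// Sample values are placeholders, apart from the default `reviewDocPath`
// that the prompt itself suggests.
const exampleContext: Phase0Context = {
  mrTitle: "Example MR title",
  mrPurpose: "Example purpose",
  ticketContext: "Example ticket summary",
  focusAreas: ["error handling"],
  changedFileCount: 4,
  criticalSurfaceTouched: false,
  reviewMode: "STANDARD",
  riskLevel: "Medium",
  maxParallelism: 3,
  reviewDocPath: "mr-review.md",
};

const missing = missingPhase0Keys(exampleContext);
```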
@@ -63,11 +63,11 @@
             { "kind": "ref", "refId": "wr.refs.notes_first_durability" }
           ],
           "Do the main context work yourself using tools.",
-          "Keep `reviewDocPath` updated for human readability, but keep execution truth in notes/context variables."
+          "Keep `reviewDocPath` updated for human readability, but keep execution truth in notes/`continue_workflow` context."
         ],
         "procedure": [
           "Produce a concise MR summary and intended behavior change, changed files overview, module or subsystem neighborhood, bounded call graph / public contracts / impacted consumers where relevant, repo patterns that matter for this review, and explicit unknowns / likely blind spots.",
-          "Set context variables: `contextSummary`, `candidateFiles`, `moduleRoots`, `contextUnknownCount`, `coverageGapCount`, `authorIntentUnclear`, `retriageNeeded`.",
+          "Set these keys in the next `continue_workflow` call's `context` object: `contextSummary`, `candidateFiles`, `moduleRoots`, `contextUnknownCount`, `coverageGapCount`, `authorIntentUnclear`, `retriageNeeded`.",
           "Compute `contextUnknownCount` as unresolved technical unknowns that materially affect review quality.",
           "Compute `coverageGapCount` as likely review angles or code areas still insufficiently understood.",
           "Set `retriageNeeded = true` if the real risk or surface area is larger than Phase 0 suggested.",
@@ -88,7 +88,7 @@
         "var": "retriageNeeded",
         "equals": true
       },
-      "prompt": "Reassess the review mode now that the real code context is known.\n\nReview:\n- `contextUnknownCount`\n- `coverageGapCount`\n- actual systems/components involved\n- whether `criticalSurfaceTouched` is still accurate\n- whether runtime or production simulation now looks necessary\n\nDo:\n- confirm or adjust `reviewMode`\n- confirm or adjust `riskLevel`\n- confirm or adjust `maxParallelism`\n- set `needsSimulation` to true or false\n- set `retriageChanged`\n\nEscalation rules:\n- QUICK may escalate to STANDARD if `criticalSurfaceTouched = true` or `contextUnknownCount > 0`\n- STANDARD may escalate to THOROUGH if `criticalSurfaceTouched = true` and risk is High, or if multiple unresolved context gaps remain\n\nSet context variables:\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `needsSimulation`\n- `retriageChanged`",
+      "prompt": "Reassess the review mode now that the real code context is known.\n\nReview:\n- `contextUnknownCount`\n- `coverageGapCount`\n- actual systems/components involved\n- whether `criticalSurfaceTouched` is still accurate\n- whether runtime or production simulation now looks necessary\n\nDo:\n- confirm or adjust `reviewMode`\n- confirm or adjust `riskLevel`\n- confirm or adjust `maxParallelism`\n- set `needsSimulation` to true or false\n- set `retriageChanged`\n\nEscalation rules:\n- QUICK may escalate to STANDARD if `criticalSurfaceTouched = true` or `contextUnknownCount > 0`\n- STANDARD may escalate to THOROUGH if `criticalSurfaceTouched = true` and risk is High, or if multiple unresolved context gaps remain\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `needsSimulation`\n- `retriageChanged`",
       "requireConfirmation": {
         "or": [
           { "var": "retriageChanged", "equals": true },
@@ -99,7 +99,7 @@
     {
       "id": "phase-2-fact-packet-and-family-selection",
       "title": "Phase 2: Freeze Fact Packet and Select Reviewer Families",
-      "prompt": "Freeze the shared factual basis that all downstream reviewer families must use, then choose the reviewer families from that same phase.\n\nCreate a neutral `reviewFactPacket` containing:\n- MR purpose and expected behavior change\n- changed files and module roots\n- key contracts, invariants, and affected consumers\n- call graph highlights or execution touchpoints\n- relevant repo patterns and exemplars\n- tests/docs expectations\n- explicit open unknowns\n\nInitialize `coverageLedger` with these domains, each marked as `checked`, `uncertain`, `not_applicable`, `contradicted`, or `needs_followup`:\n- correctness_logic\n- contracts_invariants\n- patterns_architecture\n- runtime_production_risk\n- tests_docs_rollout\n- security_performance\n\nThen perform a preliminary review from the shared fact packet and choose reviewer families.\n\nReviewer family options:\n- `correctness_invariants`\n- `patterns_architecture`\n- `runtime_production_risk`\n- `test_docs_rollout`\n- `false_positive_skeptic`\n- `missed_issue_hunter`\n\nSelection guidance:\n- QUICK: no family bundle by default; add `false_positive_skeptic` only if a supposedly easy review still feels risky or ambiguous\n- STANDARD: run 3 families by default\n- THOROUGH: run 5 families by default\n- always include `correctness_invariants` unless clearly not applicable\n- always include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable\n- include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`\n- include `missed_issue_hunter` in THOROUGH mode\n- include `false_positive_skeptic` whenever Major/Critical findings are likely, the change is controversial, or severity inflation risk is non-trivial\n\nAnti-anchoring rule:\n- reviewer families must treat `reviewFactPacket` as primary truth\n- `recommendationHypothesis` is optional secondary context only; it must not become the frame every family simply validates\n\nCoverage ledger rules:\n- use `contradicted` when evidence materially conflicts across reviewer families and the disagreement is unresolved\n- use `needs_followup` when the domain is relevant and additional targeted work is still required\n- use `uncertain` only for bounded ambiguity where no direct contradiction exists yet\n- compute `coverageUncertainCount` as the count of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`\n\nDefault reviewer-bundle rule:\n- QUICK: `needsReviewerBundle = false` unless a trigger or risk signal clearly justifies it\n- STANDARD / THOROUGH: `needsReviewerBundle = true` by default unless the review is materially simpler than expected\n\nSet context variables:\n- `reviewFactPacket`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `preliminaryFindings`\n- `recommendationHypothesis`\n- `reviewFamiliesSelected`\n- `needsReviewerBundle`",
+      "prompt": "Freeze the shared factual basis that all downstream reviewer families must use, then choose the reviewer families from that same phase.\n\nCreate a neutral `reviewFactPacket` containing:\n- MR purpose and expected behavior change\n- changed files and module roots\n- key contracts, invariants, and affected consumers\n- call graph highlights or execution touchpoints\n- relevant repo patterns and exemplars\n- tests/docs expectations\n- explicit open unknowns\n\nInitialize `coverageLedger` with these domains, each marked as `checked`, `uncertain`, `not_applicable`, `contradicted`, or `needs_followup`:\n- correctness_logic\n- contracts_invariants\n- patterns_architecture\n- runtime_production_risk\n- tests_docs_rollout\n- security_performance\n\nThen perform a preliminary review from the shared fact packet and choose reviewer families.\n\nReviewer family options:\n- `correctness_invariants`\n- `patterns_architecture`\n- `runtime_production_risk`\n- `test_docs_rollout`\n- `false_positive_skeptic`\n- `missed_issue_hunter`\n\nSelection guidance:\n- QUICK: no family bundle by default; add `false_positive_skeptic` only if a supposedly easy review still feels risky or ambiguous\n- STANDARD: run 3 families by default\n- THOROUGH: run 5 families by default\n- always include `correctness_invariants` unless clearly not applicable\n- always include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable\n- include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`\n- include `missed_issue_hunter` in THOROUGH mode\n- include `false_positive_skeptic` whenever Major/Critical findings are likely, the change is controversial, or severity inflation risk is non-trivial\n\nAnti-anchoring rule:\n- reviewer families must treat `reviewFactPacket` as primary truth\n- `recommendationHypothesis` is optional secondary context only; it must not become the frame every family simply validates\n\nCoverage ledger rules:\n- use `contradicted` when evidence materially conflicts across reviewer families and the disagreement is unresolved\n- use `needs_followup` when the domain is relevant and additional targeted work is still required\n- use `uncertain` only for bounded ambiguity where no direct contradiction exists yet\n- compute `coverageUncertainCount` as the count of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`\n\nDefault reviewer-bundle rule:\n- QUICK: `needsReviewerBundle = false` unless a trigger or risk signal clearly justifies it\n- STANDARD / THOROUGH: `needsReviewerBundle = true` by default unless the review is materially simpler than expected\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewFactPacket`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `preliminaryFindings`\n- `recommendationHypothesis`\n- `reviewFamiliesSelected`\n- `needsReviewerBundle`",
       "requireConfirmation": false
     },
     {
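Note: the Phase 2 prompt defines `coverageUncertainCount` as the number of coverage domains whose status is `uncertain`, `contradicted`, or `needs_followup`. A minimal sketch of that count follows, assuming the ledger is stored as a plain object mapping each domain name to its status. That representation, the type names, and the sample statuses are assumptions. The prompt does not fix a data shape.

```typescript
// Statuses and domain names as listed in the Phase 2 prompt.
type CoverageStatus =
  | "checked"
  | "uncertain"
  | "not_applicable"
  | "contradicted"
  | "needs_followup";

// Hypothetical representation: one status per coverage domain.
type CoverageLedger = { [domain: string]: CoverageStatus };

// A domain is "not yet safely closed" in exactly these three states.
function isOpen(status: CoverageStatus): boolean {
  return (
    status === "uncertain" ||
    status === "contradicted" ||
    status === "needs_followup"
  );
}

function coverageUncertainCount(ledger: CoverageLedger): number {
  return Object.keys(ledger).filter((domain) => isOpen(ledger[domain])).length;
}

// Sample ledger with made-up statuses for the six domains.
const coverageLedger: CoverageLedger = {
  correctness_logic: "checked",
  contracts_invariants: "uncertain",
  patterns_architecture: "checked",
  runtime_production_risk: "needs_followup",
  tests_docs_rollout: "contradicted",
  security_performance: "not_applicable",
};

const openDomains = coverageUncertainCount(coverageLedger);
```

For this sample ledger, three domains are still open, so `openDomains` is 3. Both `coverageLedger` and `coverageUncertainCount` would then go into the `context` object of the next `continue_workflow` call, as the revised prompt states.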
@@ -126,14 +126,14 @@
           "Each reviewer family must return: key findings, severity estimates, confidence level, top risks, recommendation, and what others may have missed.",
           "Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important category of issue the others may miss.",
           "Mode-adaptive parallelism: STANDARD = spawn THREE WorkRail Executors SIMULTANEOUSLY for the selected families; THOROUGH = spawn FIVE WorkRail Executors SIMULTANEOUSLY for the selected families.",
-          "Set context variables: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",
+          "Set these keys in the next `continue_workflow` call's `context` object: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",
           "Compute `contradictionCount` as material disagreements across reviewer families about issue validity, severity, or final recommendation.",
           "Increase `blindSpotCount` if the missed-issue hunter or any other family identifies uncovered review space.",
           "Increase `falsePositiveRiskCount` when the skeptic materially weakens one or more high-severity findings."
         ],
         "verify": [
           "The same fact packet was used as primary truth across reviewer families.",
-          "Contradictions, blind spots, and false-positive risks are all reflected structurally in context variables.",
+          "Contradictions, blind spots, and false-positive risks are all reflected structurally in the `continue_workflow` context object.",
           "Parallel reviewer outputs are not treated as self-finalizing; the main agent still owns synthesis."
         ]
       },
@@ -182,7 +182,7 @@
         {
           "id": "phase-4b-canonical-synthesis",
           "title": "Canonical Synthesis and Coverage Update",
-      "prompt": "Synthesize all reviewer-family outputs and any targeted follow-up into one canonical review state.\n\nSynthesis decision table:\n- if 2+ reviewer families flag the same serious issue with the same severity, treat it as validated\n- if the same issue is flagged with different severities, default to the higher severity unless the lower-severity position includes specific counter-evidence\n- if one family flags an issue and others are silent, investigate it but do not automatically block unless it is clearly critical or security-sensitive\n- if one family says false positive and another says valid issue, require explicit main-agent adjudication in notes before finalization\n- if recommendation spread shows material disagreement, findings override recommendation until reconciled\n- if simulation reveals a new production risk, add a new finding and re-evaluate recommendation confidence\n\nCoverage ledger rules:\n- move a domain from `uncertain` to `checked` only when the evidence is materially adequate\n- keep a domain `uncertain` if disagreement or missing evidence still materially affects recommendation quality\n- mark `not_applicable` only when the MR genuinely does not engage that dimension\n- clear `contradicted` only when the contradiction is explicitly resolved by evidence or adjudication\n- clear `needs_followup` only when the required targeted follow-up has actually been completed or the domain is explicitly downgraded as non-material\n\nRecommendation confidence rules:\n- set `recommendationConfidenceBand = High` only if no unresolved material contradictions remain, no important coverage domains remain uncertain, false-positive risk is not material, and consensus is strong enough for the current mode\n- set `recommendationConfidenceBand = Medium` when one bounded uncertainty remains but the recommendation is still directionally justified\n- set `recommendationConfidenceBand = Low` when multiple viable interpretations remain, major contradictions are unresolved, or important coverage gaps still weaken the recommendation\n\nSet context variables:\n- `reviewFindings`\n- `criticalFindingsCount`\n- `majorFindingsCount`\n- `minorFindingsCount`\n- `nitFindingsCount`\n- `recommendation`\n- `recommendationConfidenceBand`\n- `recommendationDriftDetected`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `docCompletenessConcernCount`\n\nUpdate `reviewDocPath` so the human artifact matches the canonical review state.",
+      "prompt": "Synthesize all reviewer-family outputs and any targeted follow-up into one canonical review state.\n\nSynthesis decision table:\n- if 2+ reviewer families flag the same serious issue with the same severity, treat it as validated\n- if the same issue is flagged with different severities, default to the higher severity unless the lower-severity position includes specific counter-evidence\n- if one family flags an issue and others are silent, investigate it but do not automatically block unless it is clearly critical or security-sensitive\n- if one family says false positive and another says valid issue, require explicit main-agent adjudication in notes before finalization\n- if recommendation spread shows material disagreement, findings override recommendation until reconciled\n- if simulation reveals a new production risk, add a new finding and re-evaluate recommendation confidence\n\nCoverage ledger rules:\n- move a domain from `uncertain` to `checked` only when the evidence is materially adequate\n- keep a domain `uncertain` if disagreement or missing evidence still materially affects recommendation quality\n- mark `not_applicable` only when the MR genuinely does not engage that dimension\n- clear `contradicted` only when the contradiction is explicitly resolved by evidence or adjudication\n- clear `needs_followup` only when the required targeted follow-up has actually been completed or the domain is explicitly downgraded as non-material\n\nRecommendation confidence rules:\n- set `recommendationConfidenceBand = High` only if no unresolved material contradictions remain, no important coverage domains remain uncertain, false-positive risk is not material, and consensus is strong enough for the current mode\n- set `recommendationConfidenceBand = Medium` when one bounded uncertainty remains but the recommendation is still directionally justified\n- set `recommendationConfidenceBand = Low` when multiple viable interpretations remain, major contradictions are unresolved, or important coverage gaps still weaken the recommendation\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewFindings`\n- `criticalFindingsCount`\n- `majorFindingsCount`\n- `minorFindingsCount`\n- `nitFindingsCount`\n- `recommendation`\n- `recommendationConfidenceBand`\n- `recommendationDriftDetected`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `docCompletenessConcernCount`\n\nUpdate `reviewDocPath` so the human artifact matches the canonical review state.",
           "requireConfirmation": false
         },
         {
@@ -213,7 +213,7 @@
           "Run final validation if any of these are true: `criticalSurfaceTouched = true`, `needsSimulation = true`, `falsePositiveRiskCount > 0`, `coverageUncertainCount > 0`, `docCompletenessConcernCount > 0`, or `recommendationConfidenceBand != High`.",
           "Mode-adaptive validation: QUICK = self-validate and optionally spawn ONE WorkRail Executor running `routine-hypothesis-challenge` if a serious uncertainty remains; STANDARD = if validation is required and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge` and either `routine-execution-simulation` or `routine-plan-analysis`; THOROUGH = if validation is required and delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation` when needed, and `routine-plan-analysis`.",
           "Compute `docCompletenessConcernCount` by counting one concern for each material packaging gap: missing rationale for any Critical or Major finding, missing ready-to-post MR comment for any Critical or Major finding, recommendation mismatch with canonical findings, still-uncertain / contradicted / needs-followup coverage domains not summarized clearly, or any missing required final section needed for actionability.",
-          "Set context variables: `validatorConsensusLevel`, `validationSummary`, `recommendationConfidenceBand`, `docCompletenessConcernCount`."
+          "Set these keys in the next `continue_workflow` call's `context` object: `validatorConsensusLevel`, `validationSummary`, `recommendationConfidenceBand`, `docCompletenessConcernCount`."
         ],
         "verify": [
           "If 2+ validators still raise serious concerns, confidence is downgraded and synthesis is reopened.",