npm - @exaudeus/workrail - Versions diffs - 3.3.0 → 3.5.0 - Mend

@exaudeus/workrail 3.3.0 → 3.5.0

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.

Files changed (78)

package/dist/application/services/compiler/binding-registry.d.ts +3 -0
package/dist/application/services/compiler/binding-registry.js +71 -0
package/dist/application/services/compiler/resolve-bindings.d.ts +18 -0
package/dist/application/services/compiler/resolve-bindings.js +162 -0
package/dist/application/services/compiler/sentinel-scan.d.ts +9 -0
package/dist/application/services/compiler/sentinel-scan.js +37 -0
package/dist/application/services/validation-engine.js +104 -0
package/dist/application/services/workflow-compiler.d.ts +10 -2
package/dist/application/services/workflow-compiler.js +25 -6
package/dist/application/services/workflow-validation-pipeline.js +8 -1
package/dist/cli.js +2 -2
package/dist/engine/engine-factory.js +1 -1
package/dist/index.d.ts +2 -1
package/dist/index.js +4 -2
package/dist/manifest.json +149 -101
package/dist/mcp/handler-factory.d.ts +1 -1
package/dist/mcp/handler-factory.js +2 -2
package/dist/mcp/handlers/v2-checkpoint.js +5 -5
package/dist/mcp/handlers/v2-error-mapping.js +4 -4
package/dist/mcp/handlers/v2-execution/continue-advance.js +2 -2
package/dist/mcp/handlers/v2-execution/continue-rehydrate.d.ts +1 -0
package/dist/mcp/handlers/v2-execution/continue-rehydrate.js +76 -60
package/dist/mcp/handlers/v2-execution/index.js +86 -44
package/dist/mcp/handlers/v2-execution-helpers.js +1 -1
package/dist/mcp/handlers/v2-resume.js +10 -5
package/dist/mcp/handlers/v2-token-ops.d.ts +1 -1
package/dist/mcp/handlers/v2-token-ops.js +5 -5
package/dist/mcp/handlers/v2-workspace-resolution.d.ts +1 -0
package/dist/mcp/handlers/v2-workspace-resolution.js +12 -0
package/dist/mcp/index.d.ts +4 -1
package/dist/mcp/index.js +6 -2
package/dist/mcp/output-schemas.d.ts +148 -8
package/dist/mcp/output-schemas.js +22 -4
package/dist/mcp/server.d.ts +6 -4
package/dist/mcp/server.js +2 -57
package/dist/mcp/tool-descriptions.js +9 -158
package/dist/mcp/transports/http-entry.js +6 -25
package/dist/mcp/transports/shutdown-hooks.d.ts +5 -0
package/dist/mcp/transports/shutdown-hooks.js +38 -0
package/dist/mcp/transports/stdio-entry.js +6 -28
package/dist/mcp/v2/tool-registry.js +2 -1
package/dist/mcp/v2/tools.d.ts +28 -11
package/dist/mcp/v2/tools.js +28 -4
package/dist/mcp/v2-response-formatter.js +28 -1
package/dist/mcp/validation/suggestion-generator.d.ts +1 -1
package/dist/mcp/validation/suggestion-generator.js +13 -3
package/dist/mcp/workflow-protocol-contracts.d.ts +31 -0
package/dist/mcp/workflow-protocol-contracts.js +207 -0
package/dist/mcp-server.d.ts +3 -1
package/dist/mcp-server.js +6 -2
package/dist/types/workflow-definition.d.ts +7 -0
package/dist/types/workflow-definition.js +1 -0
package/dist/v2/durable-core/domain/binding-drift.d.ts +8 -0
package/dist/v2/durable-core/domain/binding-drift.js +29 -0
package/dist/v2/durable-core/domain/reason-model.js +2 -2
package/dist/v2/durable-core/schemas/compiled-workflow/index.d.ts +12 -0
package/dist/v2/durable-core/schemas/compiled-workflow/index.js +2 -0
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +56 -56
package/dist/v2/durable-core/schemas/session/events.d.ts +16 -16
package/dist/v2/durable-core/schemas/session/gaps.d.ts +6 -6
package/dist/v2/projections/resume-ranking.d.ts +1 -0
package/dist/v2/projections/resume-ranking.js +1 -0
package/dist/v2/read-only/v1-to-v2-shim.js +27 -10
package/dist/v2/usecases/resume-session.d.ts +5 -1
package/dist/v2/usecases/resume-session.js +4 -1
package/package.json +1 -1
package/spec/authoring-spec.json +1373 -0
package/spec/workflow.schema.json +132 -2
package/workflows/coding-task-workflow-agentic.json +15 -15
package/workflows/coding-task-workflow-agentic.lean.v2.json +10 -10
package/workflows/coding-task-workflow-agentic.v2.json +12 -12
package/workflows/coding-task-workflow-with-loops.json +2 -2
package/workflows/cross-platform-code-conversion.v2.json +199 -0
package/workflows/document-creation-workflow.json +1 -1
package/workflows/exploration-workflow.json +3 -3
package/workflows/mr-review-workflow.agentic.v2.json +11 -11
package/workflows/routines/parallel-work-partitioning.json +43 -0
package/workflows/workflow-for-workflows.v2.json +186 -0

package/workflows/coding-task-workflow-agentic.v2.json CHANGED

@@ -41,7 +41,7 @@
     {
       "id": "phase-0-triage-and-mode",
       "title": "Phase 0: Triage (Complexity • Risk • PR Strategy)",
-      "prompt": "Analyze the task and choose the right rigor.\n\nClassify:\n- `taskComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `prStrategy`: SinglePR / MultiPR\n- `maxParallelism`: 0 / 3 / 4\n\nDecision guidance:\n- QUICK: small, low-risk, clear path, little ambiguity\n- STANDARD: medium scope or moderate risk\n- THOROUGH: large scope, architectural uncertainty, or high-risk change\n\nParallelism guidance:\n- QUICK: no delegation by default\n- STANDARD: few delegation moments, but allow multiple parallel executors at each moment\n- THOROUGH: same pattern, but with one extra delegation moment and broader parallel validation\n\nAlso capture `userRules` from the active session instructions and explicit philosophy. Keep them as a focused list of concrete, actionable rules.\n\nSet context variables: `taskComplexity`, `riskLevel`, `rigorMode`, `automationLevel`, `prStrategy`, `maxParallelism`, `userRules`.\n\nAsk the user to confirm only if the rigor or PR strategy materially affects delivery expectations.",
+      "prompt": "Analyze the task and choose the right rigor.\n\nClassify:\n- `taskComplexity`: Small / Medium / Large\n- `riskLevel`: Low / Medium / High\n- `rigorMode`: QUICK / STANDARD / THOROUGH\n- `automationLevel`: High / Medium / Low\n- `prStrategy`: SinglePR / MultiPR\n- `maxParallelism`: 0 / 3 / 4\n\nDecision guidance:\n- QUICK: small, low-risk, clear path, little ambiguity\n- STANDARD: medium scope or moderate risk\n- THOROUGH: large scope, architectural uncertainty, or high-risk change\n\nParallelism guidance:\n- QUICK: no delegation by default\n- STANDARD: few delegation moments, but allow multiple parallel executors at each moment\n- THOROUGH: same pattern, but with one extra delegation moment and broader parallel validation\n\nAlso capture `userRules` from the active session instructions and explicit philosophy. Keep them as a focused list of concrete, actionable rules.\n\nSet these keys in the next `continue_workflow` call's `context` object: `taskComplexity`, `riskLevel`, `rigorMode`, `automationLevel`, `prStrategy`, `maxParallelism`, `userRules`.\n\nAsk the user to confirm only if the rigor or PR strategy materially affects delivery expectations.",
       "requireConfirmation": true
     },
     {
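The reworded instruction in this hunk asks the agent to return its Phase 0 classifications through the `context` object of the next `continue_workflow` call. A minimal TypeScript sketch of what such a payload might look like follows. Only the key names and the allowed values come from the prompt text; the `TriageContext` type, the `buildContinueArgs` helper, the string-array shape of `userRules`, and the idea that `context` is the only field are assumptions made for illustration, not the package's actual API.

```typescript
// Hypothetical type: key names and value sets are taken from the prompt,
// everything else (including the userRules element type) is assumed.
type TriageContext = {
  taskComplexity: "Small" | "Medium" | "Large";
  riskLevel: "Low" | "Medium" | "High";
  rigorMode: "QUICK" | "STANDARD" | "THOROUGH";
  automationLevel: "High" | "Medium" | "Low";
  prStrategy: "SinglePR" | "MultiPR";
  maxParallelism: 0 | 3 | 4;
  userRules: string[];
};

// Hypothetical helper: wraps the triage result in a `context` field.
// The real `continue_workflow` input may require additional fields.
function buildContinueArgs(context: TriageContext): { context: TriageContext } {
  return { context };
}

// Example values chosen for illustration only.
const triageArgs = buildContinueArgs({
  taskComplexity: "Medium",
  riskLevel: "Low",
  rigorMode: "STANDARD",
  automationLevel: "High",
  prStrategy: "SinglePR",
  maxParallelism: 3,
  userRules: ["prefer small, reviewable changes"],
});

console.log(JSON.stringify(triageArgs, null, 2));
```

Typing the values as unions keeps a misspelled mode such as `"Standard"` from compiling, which matters because later steps branch on these exact strings.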
@@ -62,7 +62,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Build the minimum complete understanding needed to design correctly.\n\nDo the main context gathering yourself using tools. Read independent files in parallel when possible.\n\nDeliverable:\n- key entry points and call chain sketch\n- relevant files/modules/functions\n- existing repo patterns with concrete file references\n- testing strategy already present in the repo\n- risks and unknowns\n- explicit invariants and non-goals\n\nSet context variables:\n- `contextSummary`\n- `candidateFiles`\n- `invariants`\n- `nonGoals`\n- `openQuestions`\n- `contextUnknownCount`\n- `contextAuditNeeded`\n\nRules:\n- answer your own questions with tools whenever possible\n- only keep true human-decision questions in `openQuestions`\n- keep `openQuestions` bounded to the minimum necessary\n- set `contextUnknownCount` to the number of unresolved technical unknowns that still matter\n- set `contextAuditNeeded` to true if understanding still feels incomplete or the call chain is still too fuzzy\n\nMode-adaptive audit:\n- QUICK: no delegation; self-check only\n- STANDARD: if `contextAuditNeeded = true` or risk is High and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step\n- THOROUGH: if delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step",
+      "prompt": "Build the minimum complete understanding needed to design correctly.\n\nDo the main context gathering yourself using tools. Read independent files in parallel when possible.\n\nDeliverable:\n- key entry points and call chain sketch\n- relevant files/modules/functions\n- existing repo patterns with concrete file references\n- testing strategy already present in the repo\n- risks and unknowns\n- explicit invariants and non-goals\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `contextSummary`\n- `candidateFiles`\n- `invariants`\n- `nonGoals`\n- `openQuestions`\n- `contextUnknownCount`\n- `contextAuditNeeded`\n\nRules:\n- answer your own questions with tools whenever possible\n- only keep true human-decision questions in `openQuestions`\n- keep `openQuestions` bounded to the minimum necessary\n- set `contextUnknownCount` to the number of unresolved technical unknowns that still matter\n- set `contextAuditNeeded` to true if understanding still feels incomplete or the call chain is still too fuzzy\n\nMode-adaptive audit:\n- QUICK: no delegation; self-check only\n- STANDARD: if `contextAuditNeeded = true` or risk is High and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step\n- THOROUGH: if delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-context-gathering` with focus=COMPLETENESS and focus=DEPTH, then synthesize both outputs before finishing this step",
       "requireConfirmation": false
     },
     {
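The "Mode-adaptive audit" rules in the prompt above can be read as a small decision function. The sketch below is one interpretation, not the package's logic: it reads the STANDARD rule as "(audit needed or risk is High) and delegation is available", and the `AuditInputs` type, the `delegationAvailable` flag, and the `planContextAudit` name are hypothetical.

```typescript
type RigorMode = "QUICK" | "STANDARD" | "THOROUGH";
type Focus = "COMPLETENESS" | "DEPTH";

// Hypothetical input shape; `delegationAvailable` is an assumed flag.
interface AuditInputs {
  rigorMode: RigorMode;
  riskLevel: "Low" | "Medium" | "High";
  contextAuditNeeded: boolean;
  delegationAvailable: boolean;
}

// Returns the focus values for the executors to spawn in parallel,
// or an empty list when the step should be a self-check only.
function planContextAudit(input: AuditInputs): Focus[] {
  const both: Focus[] = ["COMPLETENESS", "DEPTH"];
  if (!input.delegationAvailable || input.rigorMode === "QUICK") return [];
  if (input.rigorMode === "THOROUGH") return both;
  // STANDARD: delegate only when the audit flag is set or risk is High.
  return input.contextAuditNeeded || input.riskLevel === "High" ? both : [];
}

console.log(
  planContextAudit({
    rigorMode: "STANDARD",
    riskLevel: "High",
    contextAuditNeeded: false,
    delegationAvailable: true,
  })
); // both focus values: STANDARD mode with High risk
```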
@@ -72,7 +72,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Reassess the initial triage now that context is real instead of hypothetical.\n\nReview:\n- `contextUnknownCount`\n- number of systems/components actually involved\n- number of critical invariants discovered\n- whether the task now looks broader or riskier than Phase 0 suggested\n\nDo:\n- confirm or adjust `taskComplexity`\n- confirm or adjust `riskLevel`\n- confirm or adjust `rigorMode`\n- confirm or adjust `maxParallelism`\n\nRules:\n- upgrade rigor if the real architecture surface or uncertainty is larger than expected\n- downgrade only if the task is genuinely simpler than it first appeared\n- if you change the mode, explain why using concrete evidence from the context gathering step\n\nSet context variables:\n- `taskComplexity`\n- `riskLevel`\n- `rigorMode`\n- `maxParallelism`\n- `retriageChanged` (true/false)",
+      "prompt": "Reassess the initial triage now that context is real instead of hypothetical.\n\nReview:\n- `contextUnknownCount`\n- number of systems/components actually involved\n- number of critical invariants discovered\n- whether the task now looks broader or riskier than Phase 0 suggested\n\nDo:\n- confirm or adjust `taskComplexity`\n- confirm or adjust `riskLevel`\n- confirm or adjust `rigorMode`\n- confirm or adjust `maxParallelism`\n\nRules:\n- upgrade rigor if the real architecture surface or uncertainty is larger than expected\n- downgrade only if the task is genuinely simpler than it first appeared\n- if you change the mode, explain why using concrete evidence from the context gathering step\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `taskComplexity`\n- `riskLevel`\n- `rigorMode`\n- `maxParallelism`\n- `retriageChanged` (true/false)",
       "requireConfirmation": {
         "or": [
           { "var": "retriageChanged", "equals": true },
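The step conditions visible in these hunks use three shapes: `{ "var", "equals" }`, `{ "var", "not_equals" }`, and `{ "or": [...] }`. As a rough illustration of how such expressions could be evaluated against the `context` keys, here is a minimal sketch. The `Condition` type and `evaluateCondition` function are hypothetical, cover only the operators shown in this diff, and assume strict equality; how the real engine treats missing variables is not shown in the source.

```typescript
// Hypothetical condition shapes, limited to the operators visible in the diff.
type Condition =
  | { var: string; equals: unknown }
  | { var: string; not_equals: unknown }
  | { or: Condition[] };

function evaluateCondition(cond: Condition, context: Record<string, unknown>): boolean {
  if ("or" in cond) {
    // An `or` node holds when at least one child holds.
    return cond.or.some((child) => evaluateCondition(child, context));
  }
  const value = context[cond.var];
  if ("equals" in cond) return value === cond.equals;
  // Assumption: a missing variable counts as "not equal" to any value.
  return value !== cond.not_equals;
}

const runCondition: Condition = { var: "taskComplexity", not_equals: "Small" };
const confirmCondition: Condition = {
  or: [
    { var: "retriageChanged", equals: true },
    { var: "automationLevel", equals: "Low" },
  ],
};

console.log(evaluateCondition(runCondition, { taskComplexity: "Large" })); // true
console.log(evaluateCondition(confirmCondition, { retriageChanged: false, automationLevel: "High" })); // false
```

This also shows why the exact key names matter: a value stored under a different key leaves `context[cond.var]` undefined and silently changes which branch is taken.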
@@ -87,7 +87,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Make the architecture decision in one coherent phase instead of serializing every thinking mode into a separate step.\n\nPart A — Prepare a neutral fact packet:\n- problem statement\n- acceptance criteria\n- non-goals\n- invariants\n- constraints\n- `userRules`\n- relevant files / pattern examples\n- current risks and unknowns\n\nPart B — Generate candidate plans:\n- QUICK: self-generate at least 3 genuinely different approaches\n- STANDARD: if delegation is available, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, pragmatic)\n- THOROUGH: if delegation is available, spawn THREE or FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, architecture-first, rollback-safe)\n\nPart C — Diversity gate before commitment:\n- assign each candidate plan a short `candidatePlanFamily` label\n- check whether the candidates are materially different in shape, not just wording\n- if all candidates cluster on the same pattern family, generate at least one more plan from a deliberately different perspective before selecting\n- set `candidateDiversityAdequate = true|false`\n\nPart D — Compare candidate plans:\n- invariant fit\n- philosophy alignment (`userRules` as active lens)\n- risk profile\n- implementation shape\n- likely reviewability / PR shape\n\nPart E — Challenge the best one or two:\n- STANDARD: optionally challenge the leading candidate with ONE WorkRail Executor running `routine-hypothesis-challenge`\n- THOROUGH: challenge the top 1-2 candidate plans using ONE or TWO WorkRail Executors running `routine-hypothesis-challenge`\n\nPart F — Decide:\nSet context variables:\n- `approaches`\n- `alternativesConsideredCount`\n- `candidatePlanFamilies`\n- `candidateDiversityAdequate`\n- `hasRunnerUp`\n- `selectedApproach`\n- `runnerUpApproach`\n- `architectureRationale`\n- `keyRiskToMonitor`\n- `pivotTriggers`\n- `architectureConfidenceBand`\n\nRules:\n- the main agent owns the final decision\n- subagents generate candidate plans; they do not decide the winner\n- if the challenged leading candidate no longer looks best, switch deliberately rather than defending sunk cost",
+      "prompt": "Make the architecture decision in one coherent phase instead of serializing every thinking mode into a separate step.\n\nPart A — Prepare a neutral fact packet:\n- problem statement\n- acceptance criteria\n- non-goals\n- invariants\n- constraints\n- `userRules`\n- relevant files / pattern examples\n- current risks and unknowns\n\nPart B — Generate candidate plans:\n- QUICK: self-generate at least 3 genuinely different approaches\n- STANDARD: if delegation is available, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, pragmatic)\n- THOROUGH: if delegation is available, spawn THREE or FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-generation` with different perspectives (for example simplicity, maintainability, architecture-first, rollback-safe)\n\nPart C — Diversity gate before commitment:\n- assign each candidate plan a short `candidatePlanFamily` label\n- check whether the candidates are materially different in shape, not just wording\n- if all candidates cluster on the same pattern family, generate at least one more plan from a deliberately different perspective before selecting\n- set `candidateDiversityAdequate = true|false`\n\nPart D — Compare candidate plans:\n- invariant fit\n- philosophy alignment (`userRules` as active lens)\n- risk profile\n- implementation shape\n- likely reviewability / PR shape\n\nPart E — Challenge the best one or two:\n- STANDARD: optionally challenge the leading candidate with ONE WorkRail Executor running `routine-hypothesis-challenge`\n- THOROUGH: challenge the top 1-2 candidate plans using ONE or TWO WorkRail Executors running `routine-hypothesis-challenge`\n\nPart F — Decide:\nSet these keys in the next `continue_workflow` call's `context` object:\n- `approaches`\n- `alternativesConsideredCount`\n- `candidatePlanFamilies`\n- `candidateDiversityAdequate`\n- `hasRunnerUp`\n- `selectedApproach`\n- `runnerUpApproach`\n- `architectureRationale`\n- `keyRiskToMonitor`\n- `pivotTriggers`\n- `architectureConfidenceBand`\n\nRules:\n- the main agent owns the final decision\n- subagents generate candidate plans; they do not decide the winner\n- if the challenged leading candidate no longer looks best, switch deliberately rather than defending sunk cost",
       "requireConfirmation": {
         "or": [
           { "var": "automationLevel", "equals": "Low" },
@@ -103,7 +103,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Create or update the human-facing implementation artifact: `implementation_plan.md`.\n\nThis phase combines slicing, plan drafting, philosophy alignment, and test design.\n\nThe plan must include:\n1. Problem statement\n2. Acceptance criteria\n3. Non-goals\n4. Applied `userRules` and philosophy-driven constraints\n5. Invariants\n6. Selected approach + rationale + runner-up\n7. Vertical slices\n8. Work packages only when they improve execution or enable safe parallelism\n9. Test design\n10. Risk register\n11. PR packaging strategy\n12. Philosophy alignment per slice:\n   - [principle] → [satisfied / tension / violated + 1-line why]\n\nSet context variables:\n- `implementationPlan`\n- `slices`\n- `testDesign`\n- `estimatedPRCount`\n- `followUpTickets` (initialize if needed)\n- `unresolvedUnknownCount`\n- `planConfidenceBand`\n\nRules:\n- keep `implementation_plan.md` concrete enough for another engineer to implement without guessing\n- use work packages only when they create real clarity; do not over-fragment work\n- use the user's coding philosophy as the primary planning lens, and name tensions explicitly\n- set `unresolvedUnknownCount` to the number of still-open issues that would materially affect implementation quality\n- set `planConfidenceBand` to Low / Medium / High based on how ready the plan actually is",
+      "prompt": "Create or update the human-facing implementation artifact: `implementation_plan.md`.\n\nThis phase combines slicing, plan drafting, philosophy alignment, and test design.\n\nThe plan must include:\n1. Problem statement\n2. Acceptance criteria\n3. Non-goals\n4. Applied `userRules` and philosophy-driven constraints\n5. Invariants\n6. Selected approach + rationale + runner-up\n7. Vertical slices\n8. Work packages only when they improve execution or enable safe parallelism\n9. Test design\n10. Risk register\n11. PR packaging strategy\n12. Philosophy alignment per slice:\n   - [principle] → [satisfied / tension / violated + 1-line why]\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `implementationPlan`\n- `slices`\n- `testDesign`\n- `estimatedPRCount`\n- `followUpTickets` (initialize if needed)\n- `unresolvedUnknownCount`\n- `planConfidenceBand`\n\nRules:\n- keep `implementation_plan.md` concrete enough for another engineer to implement without guessing\n- use work packages only when they create real clarity; do not over-fragment work\n- use the user's coding philosophy as the primary planning lens, and name tensions explicitly\n- set `unresolvedUnknownCount` to the number of still-open issues that would materially affect implementation quality\n- set `planConfidenceBand` to Low / Medium / High based on how ready the plan actually is",
       "requireConfirmation": false
     },
     {
@@ -127,13 +127,13 @@
         {
           "id": "phase-4a-plan-audit",
           "title": "Plan Audit (Correctness + Philosophy + Regression)",
-          "prompt": "Audit the plan before implementation.\n\nAlways perform:\n- completeness / missing work\n- weak assumptions and risks\n- invariant coverage\n- slice boundary quality\n- philosophy alignment against `userRules`\n- regression check against `resolvedFindings` (if present)\n\nPhilosophy rules:\n- flag findings by principle name\n- Red / Orange findings go into `planFindings`\n- Yellow tensions are informational only and do NOT block loop exit\n- if no philosophy principles are configured, say so explicitly and continue\n\nRegression check:\n- if `resolvedFindings` is non-empty, verify previous resolutions still hold\n- if a previously resolved issue has reappeared, add a Critical regression finding\n\nMode-adaptive delegation:\n- QUICK: self-audit only\n- STANDARD: if delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, and `routine-philosophy-alignment`; include `routine-execution-simulation` only when runtime or state-flow risk is material\n- THOROUGH: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, `routine-execution-simulation`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ auditors flag the same issue, treat it as high priority by default\n- if one auditor flags a concern no one else sees, investigate it but do not automatically block unless it is clearly severe\n- if outputs conflict, document the conflict explicitly and resolve it before finalizing `planFindings`\n- if philosophy review yields Red findings and no stronger conflicting evidence exists, they must remain blocking\n\nSet context variables:\n- `planFindings`\n- `planAmendments`\n- `planConfidence`\n- `cleanSlateDivergence`\n- `planFindingsCountBySeverity`\n- `philosophyFindingsCountBySeverity`\n- `auditConsensusLevel`\n\nRules:\n- use the main agent as synthesizer and final decision-maker\n- do not delegate sequentially when the audit routines are independent",
+          "prompt": "Audit the plan before implementation.\n\nAlways perform:\n- completeness / missing work\n- weak assumptions and risks\n- invariant coverage\n- slice boundary quality\n- philosophy alignment against `userRules`\n- regression check against `resolvedFindings` (if present)\n\nPhilosophy rules:\n- flag findings by principle name\n- Red / Orange findings go into `planFindings`\n- Yellow tensions are informational only and do NOT block loop exit\n- if no philosophy principles are configured, say so explicitly and continue\n\nRegression check:\n- if `resolvedFindings` is non-empty, verify previous resolutions still hold\n- if a previously resolved issue has reappeared, add a Critical regression finding\n\nMode-adaptive delegation:\n- QUICK: self-audit only\n- STANDARD: if delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, and `routine-philosophy-alignment`; include `routine-execution-simulation` only when runtime or state-flow risk is material\n- THOROUGH: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-plan-analysis`, `routine-hypothesis-challenge`, `routine-execution-simulation`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ auditors flag the same issue, treat it as high priority by default\n- if one auditor flags a concern no one else sees, investigate it but do not automatically block unless it is clearly severe\n- if outputs conflict, document the conflict explicitly and resolve it before finalizing `planFindings`\n- if philosophy review yields Red findings and no stronger conflicting evidence exists, they must remain blocking\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planFindings`\n- `planAmendments`\n- `planConfidence`\n- `cleanSlateDivergence`\n- `planFindingsCountBySeverity`\n- `philosophyFindingsCountBySeverity`\n- `auditConsensusLevel`\n\nRules:\n- use the main agent as synthesizer and final decision-maker\n- do not delegate sequentially when the audit routines are independent",
           "requireConfirmation": false
         },
         {
           "id": "phase-4b-refocus",
           "title": "Refocus Plan and Track Resolved Findings",
-          "prompt": "Apply plan amendments and refocus.\n\nDo:\n- update `implementation_plan.md` to incorporate `planAmendments`\n- update `slices` if the plan shape changed\n- extract out-of-scope work into `followUpTickets`\n- maintain `resolvedFindings` as an array of { finding, resolution, iteration }\n- cap `resolvedFindings` at 10 entries, dropping oldest first\n\nSet context variables:\n- `resolvedFindings`\n- `followUpTickets`\n\nRule:\n- do not silently accept plan drift; if the audit changed the shape of the work, reflect it in the plan artifact immediately",
+          "prompt": "Apply plan amendments and refocus.\n\nDo:\n- update `implementation_plan.md` to incorporate `planAmendments`\n- update `slices` if the plan shape changed\n- extract out-of-scope work into `followUpTickets`\n- maintain `resolvedFindings` as an array of { finding, resolution, iteration }\n- cap `resolvedFindings` at 10 entries, dropping oldest first\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `resolvedFindings`\n- `followUpTickets`\n\nRule:\n- do not silently accept plan drift; if the audit changed the shape of the work, reflect it in the plan artifact immediately",
           "requireConfirmation": false
         },
         {
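The `phase-4b-refocus` prompt asks for `resolvedFindings` to be kept as an array of `{ finding, resolution, iteration }` capped at 10 entries, dropping the oldest first. A minimal sketch of that bookkeeping, assuming the array is ordered oldest to newest; the `ResolvedFinding` type and `appendResolvedFindings` helper are hypothetical names, and the field types are guesses.

```typescript
// Field names come from the prompt; the field types are assumed.
type ResolvedFinding = { finding: string; resolution: string; iteration: number };

// Appends new entries and keeps only the newest `cap` items.
// Assumes `existing` is ordered oldest first, so trimming from the front
// drops the oldest entries.
function appendResolvedFindings(
  existing: ResolvedFinding[],
  added: ResolvedFinding[],
  cap = 10
): ResolvedFinding[] {
  const merged = [...existing, ...added];
  return merged.slice(Math.max(0, merged.length - cap));
}

// Twelve synthetic entries, iterations 1 through 12.
const history: ResolvedFinding[] = [];
for (let i = 1; i <= 12; i++) {
  history.push({ finding: `finding ${i}`, resolution: `fix ${i}`, iteration: i });
}

const capped = appendResolvedFindings(history.slice(0, 11), history.slice(11));
console.log(capped.length); // 10
console.log(capped[0].iteration); // 3
```

The capped array is what would then be set under the `resolvedFindings` key of the next call's `context` object.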
@@ -154,7 +154,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Verify that planning is complete enough to start implementation.\n\nRequired checks:\n- selected approach and rationale exist\n- runner-up exists\n- pivot triggers are concrete enough to act on\n- slices are defined with scope, verification, and boundaries\n- `implementation_plan.md` reflects the current intended work\n- no unresolved planning gaps remain that would block implementation\n- `alternativesConsideredCount` shows real exploration happened\n\nSet context variables:\n- `planningGaps`\n- `planningComplete`\n\nRule:\n- if any gap can be fixed immediately, fix it now and do not carry it forward as a gap\n- only stop for user input when a true decision is missing",
+      "prompt": "Verify that planning is complete enough to start implementation.\n\nRequired checks:\n- selected approach and rationale exist\n- runner-up exists\n- pivot triggers are concrete enough to act on\n- slices are defined with scope, verification, and boundaries\n- `implementation_plan.md` reflects the current intended work\n- no unresolved planning gaps remain that would block implementation\n- `alternativesConsideredCount` shows real exploration happened\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planningGaps`\n- `planningComplete`\n\nRule:\n- if any gap can be fixed immediately, fix it now and do not carry it forward as a gap\n- only stop for user input when a true decision is missing",
       "requireConfirmation": {
         "or": [
           { "var": "automationLevel", "equals": "Low" },
@@ -192,7 +192,7 @@
         {
           "id": "phase-7a-slice-preflight",
           "title": "Slice Preflight",
-          "prompt": "Before implementing slice `{{currentSlice.name}}`, verify:\n- pivot triggers have not fired\n- the plan assumptions are still fresh enough\n- target files and symbols still match the plan\n- the slice remains reviewable and bounded\n\nSet context variables:\n- `pivotTriggered`\n- `pivotSeverity`\n- `pivotReturnPhase`\n- `slicePlanStale`\n- `validationFailed`\n\nIf drift or invalid assumptions are discovered, stop and return to planning deliberately rather than coding through it.",
+          "prompt": "Before implementing slice `{{currentSlice.name}}`, verify:\n- pivot triggers have not fired\n- the plan assumptions are still fresh enough\n- target files and symbols still match the plan\n- the slice remains reviewable and bounded\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `pivotTriggered`\n- `pivotSeverity`\n- `pivotReturnPhase`\n- `slicePlanStale`\n- `validationFailed`\n\nIf drift or invalid assumptions are discovered, stop and return to planning deliberately rather than coding through it.",
           "requireConfirmation": {
             "or": [
               { "var": "pivotTriggered", "equals": true },
@@ -210,7 +210,7 @@
         {
           "id": "phase-7c-verify-slice",
           "title": "Verify Slice",
-          "prompt": "Verify slice `{{currentSlice.name}}`.\n\nAlways:\n- run planned verification commands\n- update or add tests when needed\n- ensure invariants still hold\n- check philosophy-alignment regressions introduced by the implementation\n\nFresh-eye validation triggers:\n- if `specialCaseIntroduced = true`\n- if `unplannedAbstractionIntroduced = true`\n- if this slice touched unexpected files\n- if runtime behavior still feels uncertain\n\nMode-adaptive validation:\n- QUICK: self-verify unless a fresh-eye trigger fires\n- STANDARD: if delegation is available and any fresh-eye trigger fires, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, and optionally `routine-philosophy-alignment`\n- THOROUGH + high-risk or any fresh-eye trigger: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, `routine-plan-analysis`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ validators independently raise the same serious concern, treat it as blocking by default\n- if exactly one validator raises a concern, attempt to understand and resolve it before escalating\n- if validators disagree, record the disagreement explicitly and prefer the safer path when uncertainty remains high\n\nSet context variables:\n- `sliceVerified`\n- `verificationFindings`\n- `verificationFailed`\n- `verificationApprovalRequired`\n- `verificationRetried`\n- `verificationConcernCount`\n- `verificationConsensusLevel`\n\nRule:\n- if 2+ independent validators raise serious concerns, stop and return to planning or ask the user which path to take",
+          "prompt": "Verify slice `{{currentSlice.name}}`.\n\nAlways:\n- run planned verification commands\n- update or add tests when needed\n- ensure invariants still hold\n- check philosophy-alignment regressions introduced by the implementation\n\nFresh-eye validation triggers:\n- if `specialCaseIntroduced = true`\n- if `unplannedAbstractionIntroduced = true`\n- if this slice touched unexpected files\n- if runtime behavior still feels uncertain\n\nMode-adaptive validation:\n- QUICK: self-verify unless a fresh-eye trigger fires\n- STANDARD: if delegation is available and any fresh-eye trigger fires, spawn TWO or THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, and optionally `routine-philosophy-alignment`\n- THOROUGH + high-risk or any fresh-eye trigger: if delegation is available, spawn FOUR WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation`, `routine-plan-analysis`, and `routine-philosophy-alignment`\n\nParallel-output synthesis rules:\n- if 2+ validators independently raise the same serious concern, treat it as blocking by default\n- if exactly one validator raises a concern, attempt to understand and resolve it before escalating\n- if validators disagree, record the disagreement explicitly and prefer the safer path when uncertainty remains high\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `sliceVerified`\n- `verificationFindings`\n- `verificationFailed`\n- `verificationApprovalRequired`\n- `verificationRetried`\n- `verificationConcernCount`\n- `verificationConsensusLevel`\n\nRule:\n- if 2+ independent validators raise serious concerns, stop and return to planning or ask the user which path to take",
           "requireConfirmation": {
             "or": [
               { "var": "verificationApprovalRequired", "equals": true },
@@ -221,7 +221,7 @@
         {
           "id": "phase-7d-drift-and-pr-gate",
           "title": "Drift and PR Gate",
-          "prompt": "After a verified slice:\n- compare actual changed scope against the slice plan\n- if the slice drifted, update `implementation_plan.md` immediately and record the reason in notes\n- if `prStrategy = MultiPR`, stop here with a concise PR package for user review before continuing\n\nSet context variables:\n- `planDrift`\n- `rulesDrift`\n- `changedFilesOutsidePlannedScope`\n- `scopeDriftDetected`\n\nRule:\n- do not rely on markdown sidecar state; notesMarkdown is the durable recap and `implementation_plan.md` is the human artifact",
+          "prompt": "After a verified slice:\n- compare actual changed scope against the slice plan\n- if the slice drifted, update `implementation_plan.md` immediately and record the reason in notes\n- if `prStrategy = MultiPR`, stop here with a concise PR package for user review before continuing\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `planDrift`\n- `rulesDrift`\n- `changedFilesOutsidePlannedScope`\n- `scopeDriftDetected`\n\nRule:\n- do not rely on markdown sidecar state; notesMarkdown is the durable recap and `implementation_plan.md` is the human artifact",
           "requireConfirmation": {
             "or": [
               { "var": "prStrategy", "equals": "MultiPR" },
@@ -239,7 +239,7 @@
         "var": "taskComplexity",
         "not_equals": "Small"
       },
-      "prompt": "Perform final integration verification.\n\nRequired:\n- verify acceptance criteria\n- map invariants to concrete proof (tests, build results, explicit reasoning)\n- run whole-task validation commands\n- identify any invariant violations or regressions\n- confirm the implemented result still aligns with the user's coding philosophy, naming any tensions explicitly\n- review cumulative drift across all slices, not just the current one\n- check whether repeated small compromises added up to a larger pattern problem\n\nSet context variables:\n- `integrationVerificationPassed`\n- `integrationVerificationFailed`\n- `integrationVerificationFindings`\n- `regressionDetected`\n- `invariantViolations`\n- `crossSliceDriftDetected`",
+      "prompt": "Perform final integration verification.\n\nRequired:\n- verify acceptance criteria\n- map invariants to concrete proof (tests, build results, explicit reasoning)\n- run whole-task validation commands\n- identify any invariant violations or regressions\n- confirm the implemented result still aligns with the user's coding philosophy, naming any tensions explicitly\n- review cumulative drift across all slices, not just the current one\n- check whether repeated small compromises added up to a larger pattern problem\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `integrationVerificationPassed`\n- `integrationVerificationFailed`\n- `integrationVerificationFindings`\n- `regressionDetected`\n- `invariantViolations`\n- `crossSliceDriftDetected`",
       "requireConfirmation": {
         "or": [
           { "var": "integrationVerificationFailed", "equals": true },

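Reviewer note: the reworded prompts in this file tell the agent to put keys such as `integrationVerificationFailed` into the `context` object of the next `continue_workflow` call, and the `requireConfirmation` and condition blocks in the hunks above read those same keys. The sketch below shows one way such conditions could be evaluated. It is not WorkRail's implementation: the `Condition` type, the `evaluate` function, and the treatment of missing keys are assumptions inferred only from the JSON shapes visible in this diff.

```typescript
// Hypothetical condition shape, inferred from the JSON in this diff.
type Condition =
  | { var: string; equals: unknown }
  | { var: string; not_equals: unknown }
  | { and: Condition[] }
  | { or: Condition[] };

// Plain key/value context, as the prompts describe it.
type Context = Record<string, unknown>;

// Evaluate a condition against the context keys the agent has set.
// Assumption: a missing key compares as `undefined`, so `not_equals`
// is true for a key that was never set.
function evaluate(cond: Condition, ctx: Context): boolean {
  if ("and" in cond) return cond.and.every((c) => evaluate(c, ctx));
  if ("or" in cond) return cond.or.some((c) => evaluate(c, ctx));
  if ("equals" in cond) return ctx[cond.var] === cond.equals;
  return ctx[cond.var] !== cond.not_equals;
}

// Only the first entry of this `or` list is visible in the hunk above.
const requireConfirmation: Condition = {
  or: [{ var: "integrationVerificationFailed", equals: true }],
};

// The `taskComplexity` condition from the same hunk.
const notSmall: Condition = { var: "taskComplexity", not_equals: "Small" };

const context: Context = {
  taskComplexity: "Medium",
  integrationVerificationFailed: true,
};

const needsConfirmation = evaluate(requireConfirmation, context);
const stepApplies = evaluate(notSmall, context);
```

With the sample context, both `needsConfirmation` and `stepApplies` come out `true`. A strict `===` comparison is used so that a string `"true"` in the context would not satisfy `"equals": true`; whether the real engine is that strict is not shown in this diff.
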
package/workflows/coding-task-workflow-with-loops.json CHANGED

@@ -16,7 +16,7 @@
         "fun createFile(filename) = 'Use edit_file to create/update {filename}. NEVER output full content in chat—only summarize. If fails, request user help & log command.'",
         "fun applyUserRules() = 'Apply & reference user-defined rules, patterns & preferences. Document alignment in Decision Log. Explain rule influence in decisions.'",
         "fun matchPatterns() = 'Use codebase_search/grep to find similar patterns. Reference Decision Log patterns. Match target area unless user rules override.'",
-        "fun addResumptionJson(phase) = 'Update CONTEXT.md resumption section with: 1) workflow_get instructions (id: coding-task-workflow-with-loops, mode: preview), 2) workflow_next JSON with workflowId, completedSteps up to {phase}, all context variables.'",
+        "fun addResumptionJson(phase) = 'Update CONTEXT.md with: 1) workflow_get (id: coding-task-workflow-with-loops, mode: preview), 2) workflow_next JSON with workflowId, completedSteps up to {phase}, all continue_workflow context keys.'",
         "fun gitCommit(type, msg) = 'If git available: commit with {type}: {msg}. If unavailable: log in CONTEXT.md with timestamp.'",
         "fun verifyImplementation() = '1) Test coverage >80%, 2) Run full test suite, 3) Self-review. Max 2 attempts before failure protocol.'",
         "fun checkAutomation(action) = 'High: auto-{action} if confidence >8. Medium: request confirmation. Low: extra confirmations.'",
@@ -57,7 +57,7 @@
                 "Consider both technical complexity and business risk",
                 "When in doubt, err on the side of more thorough analysis (higher complexity)",
                 "Always allow human override of your classification",
-                "Set context variables that will be used for conditional step execution and automation",
+                "Set these keys in the next `continue_workflow` call's `context` object that will be used for conditional step execution and automation",
                 "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
             ],
             "requireConfirmation": true

package/workflows/cross-platform-code-conversion.v2.json ADDED

@@ -0,0 +1,199 @@
+{
+  "id": "cross-platform-code-conversion",
+  "name": "Cross-Platform Code Conversion",
+  "version": "0.1.0",
+  "description": "Guides an agent through converting code from one platform to another (e.g., Android to iOS, iOS to Web). Triages files by difficulty, delegates easy literal translations to parallel subagents, then the main agent tackles platform-specific code requiring design decisions.",
+  "recommendedPreferences": {
+    "recommendedAutonomy": "guided",
+    "recommendedRiskPolicy": "conservative"
+  },
+  "features": [
+    "wr.features.subagent_guidance",
+    "wr.features.memory_context"
+  ],
+  "preconditions": [
+    "User specifies source and target platforms.",
+    "Agent has read access to the source codebase.",
+    "Agent has write access to create target-platform files.",
+    "Agent can run build or typecheck tools for the target platform."
+  ],
+  "metaGuidance": [
+    "IDIOMATIC CONVERSION: translate patterns and idioms, not syntax. A Kotlin sealed class becomes a Swift enum with associated values, not a class hierarchy workaround.",
+    "SCOPE DISCIPLINE: convert only what the user scoped. Do not expand to adjacent features or modules unless explicitly asked.",
+    "DEPENDENCY MAPPING: never assume a library exists on the target platform. Map each dependency to its target equivalent or flag it as needing a custom solution.",
+    "PLATFORM CONVENTIONS: follow the target platform's conventions for project structure, naming, error handling, concurrency, and testing.",
+    "BUILD PROOF: code that does not build is not done. Run build or typecheck after every conversion batch.",
+    "PRESERVE INTENT: the goal is functional equivalence, not line-by-line correspondence. Restructure when the target platform has a better way.",
+    "TRIAGE FIRST: not all code is equal. Separate trivial translations from code needing real design decisions. Delegate the easy stuff, focus on the hard stuff.",
+    "TARGET REPO DISCOVERY: find the target repo yourself before asking. Check workspace roots, sibling dirs, monorepo modules, and agent config files first.",
+    "PERSIST REPO MAPPINGS: once a target repo is confirmed, offer to save the source-to-target mapping in the source repo's agent config so future runs skip discovery.",
+    "DRIFT DETECTION: if a file turns out harder than its bucket classification during conversion, stop and reclassify it. Do not silently absorb complexity."
+  ],
+  "steps": [
+    {
+      "id": "phase-0-scope",
+      "title": "Phase 0: Scope & Platform Analysis",
+      "prompt": "Understand what you're converting before you touch anything.\n\nFigure out:\n- What is being converted? (single file, module, feature, full component, entire app)\n- What is the source platform? (Android/Kotlin, iOS/Swift, Web/React, etc.)\n- What is the target platform?\n- How large is the conversion scope? (file count, rough LOC)\n- Where does the converted code go? Find the target repo yourself before asking the user.\n\nIf the user hasn't specified scope boundaries, ask. Don't guess at scope.\n\nThen classify the conversion:\n- `conversionComplexity`: Small (1-3 files, straightforward translation) / Medium (a module or feature, mixed difficulty) / Large (many modules, significant platform-specific code)\n\nUse this guidance:\n- Small: few files, mostly mechanical, low risk of idiom mismatch\n- Medium: a module or feature with some platform-specific code mixed in\n- Large: many files, deep platform coupling, multiple idiom mapping decisions\n\nCapture:\n- `sourcePlatform`\n- `targetPlatform`\n- `conversionScope`\n- `targetRepoPath`\n- `estimatedSize`\n- `conversionComplexity`",
+      "requireConfirmation": {
+        "var": "conversionComplexity",
+        "not_equals": "Small"
+      }
+    },
+    {
+      "id": "phase-1-understand-source",
+      "title": "Phase 1: Understand Source Code",
+      "prompt": "Read and analyze the source code through a conversion lens — what will be easy to convert, what will be hard, and why.\n\nMap out:\n- Architecture and module structure\n- Key patterns used (MVI, MVVM, dependency injection, etc.)\n- External dependencies and what they do\n- Entry points and public API surface\n- Platform coupling depth: is the code cleanly layered or is platform-specific code smeared throughout? This directly determines how much falls into easy vs. hard buckets.\n- Concurrency model: Coroutines, Combine, RxJS, async/await? This is often the single hardest mapping decision.\n- DI approach: Dagger/Hilt, Swinject, Koin? DI frameworks rarely map 1:1.\n- Test coverage shape: unit tests on business logic (convert easily), UI tests (likely rewrite), integration tests (depends on infra).\n- Shared code boundaries: is there already a shared/common module that might not need conversion at all?\n\nCapture:\n- `sourceArchitecture`\n- `dependencies`\n- `publicApiSurface`\n- `platformCouplingAssessment`\n- `concurrencyModel`\n- `testCoverageShape`",
+      "promptFragments": [
+        {
+          "id": "phase-1-small-light",
+          "when": { "var": "conversionComplexity", "equals": "Small" },
+          "text": "For Small conversions, keep this lightweight. A quick read of the files in scope is enough — don't map the entire architecture. Focus on identifying any platform-specific code that would prevent a straight translation."
+        }
+      ],
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-small-fast-path",
+      "title": "Small Conversion Fast Path",
+      "runCondition": {
+        "var": "conversionComplexity",
+        "equals": "Small"
+      },
+      "prompt": "For Small conversions, skip triage and planning — just convert.\n\n- Translate the files to the target platform idiomatically\n- Follow target platform naming and structure conventions\n- Map any dependencies to target equivalents\n- Convert tests if they exist\n- Run build or typecheck to verify\n\nIf something turns out harder than expected (deep platform coupling, no clean dependency equivalent), update `conversionComplexity` to `Medium` and stop. The full triage and planning pipeline will activate for the remaining work.\n\nCapture:\n- `filesConverted`\n- `buildPassed`\n- `conversionComplexity`",
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-2-triage",
+      "title": "Phase 2: Triage & Sort",
+      "runCondition": {
+        "var": "conversionComplexity",
+        "not_equals": "Small"
+      },
+      "prompt": "Classify every file or module in scope into one of three buckets:\n\n**Bucket A — Literal translation**: Platform-agnostic business logic, data models, utilities, pure functions. These use no platform-specific APIs or libraries. Conversion is mechanical: translate the language syntax, follow target naming conventions, done. These will be delegated to subagents.\n\n**Bucket B — Library substitution**: Code that uses platform-specific libraries (networking, persistence, serialization, DI) but follows standard patterns. These need dependency mapping but the structure stays the same.\n\n**Bucket C — Platform-specific**: Code deeply tied to the platform (UI layer, lifecycle management, concurrency/threading, navigation, platform APIs). These need design decisions about target-platform idioms.\n\nFor each file or module, list:\n- File/module name\n- Bucket (A, B, or C)\n- One-line reason for classification\n- Dependencies it has on other files in scope (so we know conversion order)\n\nSort the work items within each bucket by dependency order (convert dependencies first).\n\nGroup Bucket A files into parallel batches of 3-5 files each. Each batch should contain files with no cross-dependencies so subagents can work independently.\n\nGroup Bucket B and C files into sequential batches by dependency order.\n\nEach batch should have: `name` (short label), `bucket` (A, B, or C), and `files` (list of file paths).\n\nCapture:\n- `bucketABatches` (parallel batches for subagent delegation)\n- `bucketBCBatches` (sequential batches for main agent)\n- `bucketACounts`\n- `bucketBCounts`\n- `bucketCCounts`",
+      "requireConfirmation": true
+    },
+    {
+      "id": "phase-3-plan-hard-items",
+      "title": "Phase 3: Plan Platform-Specific Conversions",
+      "runCondition": {
+        "var": "conversionComplexity",
+        "not_equals": "Small"
+      },
+      "prompt": "For Bucket B and Bucket C items, plan the conversion before writing code.\n\nFor Bucket B (library substitution):\n- Map each source dependency to its target-platform equivalent\n- If no equivalent exists, flag it and propose an alternative\n\nFor Bucket C (platform-specific):\n- Threading/concurrency model mapping\n- UI framework mapping\n- DI framework mapping\n- State management mapping\n- Error handling mapping\n- Navigation patterns\n- Lifecycle management approach\n- Testing framework mapping\n\nFor anything with no clean target equivalent, propose an idiomatic solution and explain the tradeoff.\n\nBucket A items don't need a plan. They're mechanical translation handled by subagents.\n\nCapture:\n- `idiomMapping`\n- `dependencyMapping`\n- `tradeoffs`",
+      "promptFragments": [
+        {
+          "id": "phase-3-medium-focused",
+          "when": { "var": "conversionComplexity", "equals": "Medium" },
+          "text": "For Medium conversions, focus the plan on the items that actually need design decisions. Don't exhaustively map every dimension — only the ones relevant to the files in scope."
+        }
+      ],
+      "requireConfirmation": true
+    },
+    {
+      "id": "phase-4-delegate-bucket-a",
+      "title": "Phase 4: Delegate Bucket A (Parallel Subagents)",
+      "runCondition": {
+        "and": [
+          { "var": "conversionComplexity", "not_equals": "Small" },
+          { "var": "bucketACounts", "not_equals": 0 }
+        ]
+      },
+      "prompt": "Delegate all Bucket A batches to subagents in parallel. If subagent delegation is not available in your environment, convert Bucket A files yourself sequentially — they're mechanical translations.\n\nFor each batch in `bucketABatches`, spawn a subagent with these instructions:\n- Source platform: `{{sourcePlatform}}`\n- Target platform: `{{targetPlatform}}`\n- Target repo: `{{targetRepoPath}}`\n- Files to convert: (list the specific files in this batch)\n- Task: translate these files from the source language to the target language. Follow target platform naming conventions. These are platform-agnostic files — no library substitution or idiom mapping needed. Preserve the public API. Convert tests if they exist.\n\nRun batches in parallel. Each subagent works independently on files with no cross-dependencies.\n\nWhen all subagents finish, review their output:\n- Spot-check a few converted files for quality\n- Flag any files a subagent misclassified as easy (actually needs library substitution or platform-specific handling)\n- Move misclassified files back to the appropriate bucket for main agent handling\n\nRun build or typecheck on the Bucket A output to catch issues early.\n\nCapture:\n- `bucketAComplete`\n- `bucketABuildPassed`\n- `reclassifiedFiles`",
+      "requireConfirmation": false
+    },
+    {
+      "id": "phase-5-convert-hard",
+      "type": "loop",
+      "title": "Phase 5: Convert Bucket B & C (Main Agent)",
+      "runCondition": {
+        "var": "conversionComplexity",
+        "not_equals": "Small"
+      },
+      "loop": {
+        "type": "forEach",
+        "items": "bucketBCBatches",
+        "itemVar": "currentBatch",
+        "indexVar": "batchIndex",
+        "maxIterations": 30
+      },
+      "body": [
+        {
+          "id": "phase-5a-convert-batch",
+          "title": "Convert Current Batch",
+          "prompt": "Convert the current batch: `{{currentBatch.name}}`\n\nThis is Bucket B or C code that needs your full context.\n\n- **Bucket B**: Follow the dependency mapping from Phase 3. Substitute libraries, keep structure.\n- **Bucket C**: Follow the idiom mapping from Phase 3. Restructure where the target platform has a better way.\n\nAlso handle any `reclassifiedFiles` that were moved back from Bucket A delegation.\n\nFor all files:\n- Follow target platform conventions\n- Preserve public API contracts where possible\n- Add TODO comments for anything uncertain\n- Convert tests alongside production code when source tests exist\n\nRun build or typecheck after this batch. If it fails, fix it before moving on.\n\nTrack whether this batch required:\n- `bucketDriftDetected`: a file turned out to be harder than its bucket classification (e.g., Bucket B file needed Bucket C-level design decisions)\n- `unexpectedDependency`: a dependency was discovered that wasn't in the Phase 3 mapping\n- `buildBroke`: build or typecheck failed after this batch\n\nCapture:\n- `batchFilesConverted`\n- `batchBuildPassed`\n- `batchIssues`\n- `bucketDriftDetected`\n- `unexpectedDependency`\n- `buildBroke`",
+          "requireConfirmation": false
+        },
+        {
+          "id": "phase-5b-verify-batch",
+          "title": "Verify Batch",
+          "runCondition": {
+            "or": [
+              { "var": "bucketDriftDetected", "equals": true },
+              { "var": "unexpectedDependency", "equals": true },
+              { "var": "buildBroke", "equals": true }
+            ]
+          },
+          "prompt": "Something unexpected happened in this batch. Before moving on, check what went wrong.\n\nIf `bucketDriftDetected`: the file was harder than classified. Update the idiom or dependency mapping from Phase 3 so downstream batches don't hit the same surprise. Record what changed.\n\nIf `unexpectedDependency`: a dependency wasn't in the Phase 3 plan. Map it now and check whether other batches depend on the same thing.\n\nIf `buildBroke`: diagnose whether the failure is local to this batch or a cross-batch integration issue. Fix it before continuing.\n\nIf the drift is severe enough that the Phase 3 plan is unreliable, say so. Don't silently absorb complexity.\n\nCapture:\n- `mappingUpdated`\n- `verificationPassed`",
+          "requireConfirmation": {
+            "var": "bucketDriftDetected",
+            "equals": true
+          }
+        }
+      ]
+    },
+    {
+      "id": "phase-6-verify",
+      "type": "loop",
+      "title": "Phase 6: Final Verification",
+      "runCondition": {
+        "var": "conversionComplexity",
+        "not_equals": "Small"
+      },
+      "loop": {
+        "type": "while",
+        "conditionSource": {
+          "kind": "artifact_contract",
+          "contractRef": "wr.contracts.loop_control",
+          "loopId": "final_verification_loop"
+        },
+        "maxIterations": 3
+      },
+      "body": [
+        {
+          "id": "phase-6a-full-build",
+          "title": "Full Build and Integration Check",
+          "prompt": "Run a full build or typecheck on the entire converted codebase — both subagent-converted and main-agent-converted code together.\n\nCheck for:\n- Build/compile errors from cross-batch integration issues\n- Inconsistencies between subagent output and main agent output (naming, patterns)\n- Non-idiomatic patterns that slipped through\n- Missing error handling at module boundaries\n- Threading or concurrency issues across modules\n- Broken public API contracts\n\nFix each issue. If a fix is a band-aid over a deeper mapping problem, go back and fix the mapping.\n\nCapture:\n- `fullBuildPassed`\n- `integrationIssues`\n- `issuesFixed`",
+          "requireConfirmation": false
+        },
+        {
+          "id": "phase-6b-loop-decision",
+          "title": "Verification Loop Decision",
+          "prompt": "Decide whether verification needs another pass.\n\n- If the build fails or critical integration issues remain: continue.\n- If the build passes and remaining issues are minor: stop.\n- If you've hit the iteration limit: stop and record what remains.\n\nEmit the loop-control artifact:\n```json\n{\n  \"artifacts\": [{\n    \"kind\": \"wr.loop_control\",\n    \"decision\": \"continue or stop\"\n  }]\n}\n```",
+          "requireConfirmation": false,
+          "outputContract": {
+            "contractRef": "wr.contracts.loop_control"
+          }
+        }
+      ]
+    },
+    {
+      "id": "phase-7-handoff",
+      "title": "Phase 7: Handoff",
+      "prompt": "Summarize what was converted.\n\nInclude:\n- Source and target platforms\n- Total files converted\n- Build/typecheck status\n- Known gaps, TODOs, or limitations\n- What would need manual attention\n\nKeep it concise. The converted code is the deliverable.",
+      "promptFragments": [
+        {
+          "id": "phase-7-small-summary",
+          "when": { "var": "conversionComplexity", "equals": "Small" },
+          "text": "For Small conversions, keep the summary brief — just list what was converted, build status, and any issues."
+        },
+        {
+          "id": "phase-7-full-summary",
+          "when": { "var": "conversionComplexity", "not_equals": "Small" },
+          "text": "Also include: bucket breakdown (A/B/C counts), delegation results (how many files delegated, subagent quality, any reclassified), key idiom mapping decisions, and dependency substitutions."
+        }
+      ],
+      "notesOptional": true,
+      "requireConfirmation": false
+    }
+  ]
+}
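
Reviewer note: Phase 2 of the added workflow asks for Bucket A files to be grouped into parallel batches of 3-5 files with no cross-dependencies, each batch carrying `name`, `bucket`, and `files`. A minimal greedy sketch of that grouping follows. The `SourceFile` type, the `batchBucketA` function, the batch naming scheme, and the example file names are all hypothetical; the workflow leaves the actual method to the agent. The sketch caps batch size at 5 but does not enforce the lower bound of 3, which dependencies can make impossible to meet.

```typescript
// Hypothetical input: one entry per Bucket A file.
interface SourceFile {
  path: string;
  dependsOn: string[]; // paths of other in-scope files this file uses
}

// Batch shape taken from the Phase 2 prompt: name, bucket, files.
interface Batch {
  name: string;
  bucket: "A";
  files: string[];
}

// Greedy grouping: put each file into the first batch that still has room
// and contains no file it depends on (or that depends on it).
function batchBucketA(files: SourceFile[], maxSize = 5): Batch[] {
  const deps = new Map<string, Set<string>>();
  for (const f of files) deps.set(f.path, new Set(f.dependsOn));

  const related = (a: string, b: string): boolean =>
    (deps.get(a)?.has(b) ?? false) || (deps.get(b)?.has(a) ?? false);

  const groups: string[][] = [];
  for (const file of files) {
    const home = groups.find(
      (g) => g.length < maxSize && g.every((other) => !related(file.path, other)),
    );
    if (home) home.push(file.path);
    else groups.push([file.path]);
  }

  return groups.map(
    (g, i): Batch => ({ name: `bucket-a-batch-${i + 1}`, bucket: "A", files: g }),
  );
}

// Illustrative file names only.
const bucketABatches = batchBucketA([
  { path: "Money.kt", dependsOn: [] },
  { path: "Currency.kt", dependsOn: [] },
  { path: "Price.kt", dependsOn: ["Money.kt"] },
]);
```

Here `Money.kt` and `Currency.kt` share the first batch, while `Price.kt` lands in a second batch because it depends on `Money.kt`. Only direct dependencies are considered; a stricter version could also separate files linked through a chain of dependencies.
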

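Reviewer note: Phase 6 of the added workflow is a `while` loop capped at `maxIterations: 3`, and its last body step emits a `wr.loop_control` artifact whose `decision` is either `continue` or `stop`. The sketch below models that control flow only. `runVerificationLoop` and `runPass` are hypothetical names, and how the WorkRail engine actually consumes the artifact is not visible in this diff.

```typescript
type LoopDecision = "continue" | "stop";

// Artifact shape copied from the phase-6b prompt in this diff.
interface LoopControlArtifact {
  kind: "wr.loop_control";
  decision: LoopDecision;
}

// Run verification passes until a pass says "stop" or the cap is reached.
// `runPass` stands in for one full iteration of the loop body (hypothetical).
function runVerificationLoop(
  runPass: (iteration: number) => LoopControlArtifact,
  maxIterations = 3,
): { iterations: number; hitLimit: boolean } {
  for (let i = 1; i <= maxIterations; i++) {
    const artifact = runPass(i);
    if (artifact.decision === "stop") return { iterations: i, hitLimit: false };
  }
  // Every pass asked to continue, so the cap ended the loop.
  return { iterations: maxIterations, hitLimit: true };
}

// Example: the build fails on the first pass and passes on the second.
const decisions: LoopDecision[] = ["continue", "stop"];
const outcome = runVerificationLoop((i) => ({
  kind: "wr.loop_control",
  decision: decisions[i - 1] ?? "stop",
}));
```

In this example `outcome` is `{ iterations: 2, hitLimit: false }`. When every pass answers `continue`, the function reports `hitLimit: true`, which corresponds to the prompt's instruction to stop at the iteration limit and record what remains.
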
package/workflows/document-creation-workflow.json CHANGED

@@ -35,7 +35,7 @@
       "agentRole": "You are a documentation strategy specialist with expertise in assessing documentation complexity and risk. Your role is to accurately classify documentation needs based on technical depth, stakeholder impact, and integration requirements.",
       "guidance": [
         "Consider both content complexity and organizational impact",
-        "Set context variables that drive conditional workflow execution",
+        "Set these keys in the next `continue_workflow` call's `context` object that drive conditional workflow execution",
         "When uncertain, err toward higher complexity for better quality",
         "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
       ],

package/workflows/exploration-workflow.json CHANGED

@@ -14,7 +14,7 @@
     "preconditions": [
         "User has a clear task, problem, or question to explore",
         "User can provide initial context, constraints, or requirements",
-        "Agent can maintain context variables throughout the workflow"
+        "Agent can maintain `continue_workflow` context keys throughout the workflow"
     ],
     "metaGuidance": [
         "FUNCTION DEFINITIONS: fun trackEvidence(source, grade) = 'Add to context.evidenceLog[] with {source, grade, timestamp}. Grade: High (peer-reviewed/official), Medium (expert/established), Low (anecdotal/emerging)'",
@@ -45,7 +45,7 @@
                 "Consider both domain complexity and option space size",
                 "When in doubt, err on the side of more thorough analysis (higher complexity)",
                 "Always allow human override of classification",
-                "Set context variables for conditional step execution and automation",
+                "Set these keys in the next `continue_workflow` call's `context` object for conditional step execution and automation",
                 "Automation levels: High=auto-approve confidence >8, Medium=standard, Low=extra confirmations"
             ],
             "requireConfirmation": true
@@ -72,7 +72,7 @@
                 "Some tasks may span domains - choose primary domain",
                 "This classification affects tool selection and evaluation criteria",
                 "Document reasoning for domain choice",
-                "Set domain-specific context variables for later steps"
+                "Set domain-specific keys in the next `continue_workflow` call's `context` object for later steps"
             ],
             "requireConfirmation": false
         },

package/workflows/mr-review-workflow.agentic.v2.json CHANGED

@@ -22,8 +22,8 @@
   ],
   "metaGuidance": [
     "DEFAULT BEHAVIOR: self-execute with tools. Only ask for missing external artifacts, permissions, or business context you cannot resolve yourself.",
-    "V2 DURABILITY: use output.notesMarkdown and explicit context variables as durable workflow state. Do NOT rely on the live review document as required workflow memory.",
-    "ARTIFACT STRATEGY: `reviewDocPath` is a human-facing artifact only. Keep it updated for readability, but keep execution truth in notes/context variables.",
+    "V2 DURABILITY: use output.notesMarkdown and explicit `continue_workflow` context keys as durable workflow state. Do NOT rely on the live review document as required workflow memory.",
+    "ARTIFACT STRATEGY: `reviewDocPath` is a human-facing artifact only. Keep it updated for readability, but keep execution truth in notes/`continue_workflow` context.",
     "MAIN AGENT OWNS REVIEW: the main agent owns truth, synthesis, severity calibration, final recommendation, and document finalization.",
     "SUBAGENT MODEL: use the WorkRail Executor only. Do not refer to Builder, Researcher, or other named subagent identities.",
     "PARALLELISM: parallelize independent cognition; serialize synthesis, canonical review findings, recommendation decisions, and final document writes.",
@@ -39,7 +39,7 @@
     {
       "id": "phase-0-triage-and-mode",
       "title": "Phase 0: Triage (MR Context • Risk • Mode)",
-      "prompt": "Understand the MR and choose the right rigor.\n\nCapture:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched` (true/false)\n- `reviewMode`: QUICK / STANDARD / THOROUGH\n- `riskLevel`: Low / Medium / High\n- `maxParallelism`: 0 / 3 / 5\n\nDecision guidance:\n- QUICK: very small, isolated, low-risk changes with little ambiguity\n- STANDARD: typical feature or bug-fix reviews with moderate ambiguity or moderate risk\n- THOROUGH: critical surfaces, architectural novelty, high risk, broad change sets, or strong need for independent reviewer perspectives\n\nAlso choose `reviewDocPath` for the human-facing live artifact. Default suggestion: `mr-review.md` at the project root.\n\nSet context variables:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched`\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `reviewDocPath`\n\nAsk for confirmation only if the selected mode materially changes expectations or if the diff/source context is still missing.",
+      "prompt": "Understand the MR and choose the right rigor.\n\nCapture:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched` (true/false)\n- `reviewMode`: QUICK / STANDARD / THOROUGH\n- `riskLevel`: Low / Medium / High\n- `maxParallelism`: 0 / 3 / 5\n\nDecision guidance:\n- QUICK: very small, isolated, low-risk changes with little ambiguity\n- STANDARD: typical feature or bug-fix reviews with moderate ambiguity or moderate risk\n- THOROUGH: critical surfaces, architectural novelty, high risk, broad change sets, or strong need for independent reviewer perspectives\n\nAlso choose `reviewDocPath` for the human-facing live artifact. Default suggestion: `mr-review.md` at the project root.\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `mrTitle`\n- `mrPurpose`\n- `ticketContext`\n- `focusAreas`\n- `changedFileCount`\n- `criticalSurfaceTouched`\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `reviewDocPath`\n\nAsk for confirmation only if the selected mode materially changes expectations or if the diff/source context is still missing.",
       "requireConfirmation": true
     },
     {
@@ -63,11 +63,11 @@
             { "kind": "ref", "refId": "wr.refs.notes_first_durability" }
           ],
           "Do the main context work yourself using tools.",
-          "Keep `reviewDocPath` updated for human readability, but keep execution truth in notes/context variables."
+          "Keep `reviewDocPath` updated for human readability, but keep execution truth in notes/`continue_workflow` context."
         ],
         "procedure": [
           "Produce a concise MR summary and intended behavior change, changed files overview, module or subsystem neighborhood, bounded call graph / public contracts / impacted consumers where relevant, repo patterns that matter for this review, and explicit unknowns / likely blind spots.",
-          "Set context variables: `contextSummary`, `candidateFiles`, `moduleRoots`, `contextUnknownCount`, `coverageGapCount`, `authorIntentUnclear`, `retriageNeeded`.",
+          "Set these keys in the next `continue_workflow` call's `context` object: `contextSummary`, `candidateFiles`, `moduleRoots`, `contextUnknownCount`, `coverageGapCount`, `authorIntentUnclear`, `retriageNeeded`.",
           "Compute `contextUnknownCount` as unresolved technical unknowns that materially affect review quality.",
           "Compute `coverageGapCount` as likely review angles or code areas still insufficiently understood.",
           "Set `retriageNeeded = true` if the real risk or surface area is larger than Phase 0 suggested.",
@@ -88,7 +88,7 @@
         "var": "retriageNeeded",
         "equals": true
       },
-      "prompt": "Reassess the review mode now that the real code context is known.\n\nReview:\n- `contextUnknownCount`\n- `coverageGapCount`\n- actual systems/components involved\n- whether `criticalSurfaceTouched` is still accurate\n- whether runtime or production simulation now looks necessary\n\nDo:\n- confirm or adjust `reviewMode`\n- confirm or adjust `riskLevel`\n- confirm or adjust `maxParallelism`\n- set `needsSimulation` to true or false\n- set `retriageChanged`\n\nEscalation rules:\n- QUICK may escalate to STANDARD if `criticalSurfaceTouched = true` or `contextUnknownCount > 0`\n- STANDARD may escalate to THOROUGH if `criticalSurfaceTouched = true` and risk is High, or if multiple unresolved context gaps remain\n\nSet context variables:\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `needsSimulation`\n- `retriageChanged`",
+      "prompt": "Reassess the review mode now that the real code context is known.\n\nReview:\n- `contextUnknownCount`\n- `coverageGapCount`\n- actual systems/components involved\n- whether `criticalSurfaceTouched` is still accurate\n- whether runtime or production simulation now looks necessary\n\nDo:\n- confirm or adjust `reviewMode`\n- confirm or adjust `riskLevel`\n- confirm or adjust `maxParallelism`\n- set `needsSimulation` to true or false\n- set `retriageChanged`\n\nEscalation rules:\n- QUICK may escalate to STANDARD if `criticalSurfaceTouched = true` or `contextUnknownCount > 0`\n- STANDARD may escalate to THOROUGH if `criticalSurfaceTouched = true` and risk is High, or if multiple unresolved context gaps remain\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewMode`\n- `riskLevel`\n- `maxParallelism`\n- `needsSimulation`\n- `retriageChanged`",
       "requireConfirmation": {
         "or": [
           { "var": "retriageChanged", "equals": true },
@@ -99,7 +99,7 @@
     {
       "id": "phase-2-fact-packet-and-family-selection",
       "title": "Phase 2: Freeze Fact Packet and Select Reviewer Families",
-      "prompt": "Freeze the shared factual basis that all downstream reviewer families must use, then choose the reviewer families from that same phase.\n\nCreate a neutral `reviewFactPacket` containing:\n- MR purpose and expected behavior change\n- changed files and module roots\n- key contracts, invariants, and affected consumers\n- call graph highlights or execution touchpoints\n- relevant repo patterns and exemplars\n- tests/docs expectations\n- explicit open unknowns\n\nInitialize `coverageLedger` with these domains, each marked as `checked`, `uncertain`, `not_applicable`, `contradicted`, or `needs_followup`:\n- correctness_logic\n- contracts_invariants\n- patterns_architecture\n- runtime_production_risk\n- tests_docs_rollout\n- security_performance\n\nThen perform a preliminary review from the shared fact packet and choose reviewer families.\n\nReviewer family options:\n- `correctness_invariants`\n- `patterns_architecture`\n- `runtime_production_risk`\n- `test_docs_rollout`\n- `false_positive_skeptic`\n- `missed_issue_hunter`\n\nSelection guidance:\n- QUICK: no family bundle by default; add `false_positive_skeptic` only if a supposedly easy review still feels risky or ambiguous\n- STANDARD: run 3 families by default\n- THOROUGH: run 5 families by default\n- always include `correctness_invariants` unless clearly not applicable\n- always include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable\n- include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`\n- include `missed_issue_hunter` in THOROUGH mode\n- include `false_positive_skeptic` whenever Major/Critical findings are likely, the change is controversial, or severity inflation risk is non-trivial\n\nAnti-anchoring rule:\n- reviewer families must treat `reviewFactPacket` as primary truth\n- `recommendationHypothesis` is optional secondary context only; it must not become the frame every family simply validates\n\nCoverage ledger rules:\n- 
use `contradicted` when evidence materially conflicts across reviewer families and the disagreement is unresolved\n- use `needs_followup` when the domain is relevant and additional targeted work is still required\n- use `uncertain` only for bounded ambiguity where no direct contradiction exists yet\n- compute `coverageUncertainCount` as the count of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`\n\nDefault reviewer-bundle rule:\n- QUICK: `needsReviewerBundle = false` unless a trigger or risk signal clearly justifies it\n- STANDARD / THOROUGH: `needsReviewerBundle = true` by default unless the review is materially simpler than expected\n\nSet context variables:\n- `reviewFactPacket`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `preliminaryFindings`\n- `recommendationHypothesis`\n- `reviewFamiliesSelected`\n- `needsReviewerBundle`",
+      "prompt": "Freeze the shared factual basis that all downstream reviewer families must use, then choose the reviewer families from that same phase.\n\nCreate a neutral `reviewFactPacket` containing:\n- MR purpose and expected behavior change\n- changed files and module roots\n- key contracts, invariants, and affected consumers\n- call graph highlights or execution touchpoints\n- relevant repo patterns and exemplars\n- tests/docs expectations\n- explicit open unknowns\n\nInitialize `coverageLedger` with these domains, each marked as `checked`, `uncertain`, `not_applicable`, `contradicted`, or `needs_followup`:\n- correctness_logic\n- contracts_invariants\n- patterns_architecture\n- runtime_production_risk\n- tests_docs_rollout\n- security_performance\n\nThen perform a preliminary review from the shared fact packet and choose reviewer families.\n\nReviewer family options:\n- `correctness_invariants`\n- `patterns_architecture`\n- `runtime_production_risk`\n- `test_docs_rollout`\n- `false_positive_skeptic`\n- `missed_issue_hunter`\n\nSelection guidance:\n- QUICK: no family bundle by default; add `false_positive_skeptic` only if a supposedly easy review still feels risky or ambiguous\n- STANDARD: run 3 families by default\n- THOROUGH: run 5 families by default\n- always include `correctness_invariants` unless clearly not applicable\n- always include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable\n- include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`\n- include `missed_issue_hunter` in THOROUGH mode\n- include `false_positive_skeptic` whenever Major/Critical findings are likely, the change is controversial, or severity inflation risk is non-trivial\n\nAnti-anchoring rule:\n- reviewer families must treat `reviewFactPacket` as primary truth\n- `recommendationHypothesis` is optional secondary context only; it must not become the frame every family simply validates\n\nCoverage ledger rules:\n- 
use `contradicted` when evidence materially conflicts across reviewer families and the disagreement is unresolved\n- use `needs_followup` when the domain is relevant and additional targeted work is still required\n- use `uncertain` only for bounded ambiguity where no direct contradiction exists yet\n- compute `coverageUncertainCount` as the count of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`\n\nDefault reviewer-bundle rule:\n- QUICK: `needsReviewerBundle = false` unless a trigger or risk signal clearly justifies it\n- STANDARD / THOROUGH: `needsReviewerBundle = true` by default unless the review is materially simpler than expected\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewFactPacket`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `preliminaryFindings`\n- `recommendationHypothesis`\n- `reviewFamiliesSelected`\n- `needsReviewerBundle`",
       "requireConfirmation": false
     },
     {
@@ -126,14 +126,14 @@
           "Each reviewer family must return: key findings, severity estimates, confidence level, top risks, recommendation, and what others may have missed.",
           "Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important category of issue the others may miss.",
           "Mode-adaptive parallelism: STANDARD = spawn THREE WorkRail Executors SIMULTANEOUSLY for the selected families; THOROUGH = spawn FIVE WorkRail Executors SIMULTANEOUSLY for the selected families.",
-          "Set context variables: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",
+          "Set these keys in the next `continue_workflow` call's `context` object: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",
           "Compute `contradictionCount` as material disagreements across reviewer families about issue validity, severity, or final recommendation.",
           "Increase `blindSpotCount` if the missed-issue hunter or any other family identifies uncovered review space.",
           "Increase `falsePositiveRiskCount` when the skeptic materially weakens one or more high-severity findings."
         ],
         "verify": [
           "The same fact packet was used as primary truth across reviewer families.",
-          "Contradictions, blind spots, and false-positive risks are all reflected structurally in context variables.",
+          "Contradictions, blind spots, and false-positive risks are all reflected structurally in the `continue_workflow` context object.",
           "Parallel reviewer outputs are not treated as self-finalizing; the main agent still owns synthesis."
         ]
       },
@@ -182,7 +182,7 @@
         {
           "id": "phase-4b-canonical-synthesis",
           "title": "Canonical Synthesis and Coverage Update",
-      "prompt": "Synthesize all reviewer-family outputs and any targeted follow-up into one canonical review state.\n\nSynthesis decision table:\n- if 2+ reviewer families flag the same serious issue with the same severity, treat it as validated\n- if the same issue is flagged with different severities, default to the higher severity unless the lower-severity position includes specific counter-evidence\n- if one family flags an issue and others are silent, investigate it but do not automatically block unless it is clearly critical or security-sensitive\n- if one family says false positive and another says valid issue, require explicit main-agent adjudication in notes before finalization\n- if recommendation spread shows material disagreement, findings override recommendation until reconciled\n- if simulation reveals a new production risk, add a new finding and re-evaluate recommendation confidence\n\nCoverage ledger rules:\n- move a domain from `uncertain` to `checked` only when the evidence is materially adequate\n- keep a domain `uncertain` if disagreement or missing evidence still materially affects recommendation quality\n- mark `not_applicable` only when the MR genuinely does not engage that dimension\n- clear `contradicted` only when the contradiction is explicitly resolved by evidence or adjudication\n- clear `needs_followup` only when the required targeted follow-up has actually been completed or the domain is explicitly downgraded as non-material\n\nRecommendation confidence rules:\n- set `recommendationConfidenceBand = High` only if no unresolved material contradictions remain, no important coverage domains remain uncertain, false-positive risk is not material, and consensus is strong enough for the current mode\n- set `recommendationConfidenceBand = Medium` when one bounded uncertainty remains but the recommendation is still directionally justified\n- set `recommendationConfidenceBand = Low` when multiple viable interpretations remain, major 
contradictions are unresolved, or important coverage gaps still weaken the recommendation\n\nSet context variables:\n- `reviewFindings`\n- `criticalFindingsCount`\n- `majorFindingsCount`\n- `minorFindingsCount`\n- `nitFindingsCount`\n- `recommendation`\n- `recommendationConfidenceBand`\n- `recommendationDriftDetected`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `docCompletenessConcernCount`\n\nUpdate `reviewDocPath` so the human artifact matches the canonical review state.",
+      "prompt": "Synthesize all reviewer-family outputs and any targeted follow-up into one canonical review state.\n\nSynthesis decision table:\n- if 2+ reviewer families flag the same serious issue with the same severity, treat it as validated\n- if the same issue is flagged with different severities, default to the higher severity unless the lower-severity position includes specific counter-evidence\n- if one family flags an issue and others are silent, investigate it but do not automatically block unless it is clearly critical or security-sensitive\n- if one family says false positive and another says valid issue, require explicit main-agent adjudication in notes before finalization\n- if recommendation spread shows material disagreement, findings override recommendation until reconciled\n- if simulation reveals a new production risk, add a new finding and re-evaluate recommendation confidence\n\nCoverage ledger rules:\n- move a domain from `uncertain` to `checked` only when the evidence is materially adequate\n- keep a domain `uncertain` if disagreement or missing evidence still materially affects recommendation quality\n- mark `not_applicable` only when the MR genuinely does not engage that dimension\n- clear `contradicted` only when the contradiction is explicitly resolved by evidence or adjudication\n- clear `needs_followup` only when the required targeted follow-up has actually been completed or the domain is explicitly downgraded as non-material\n\nRecommendation confidence rules:\n- set `recommendationConfidenceBand = High` only if no unresolved material contradictions remain, no important coverage domains remain uncertain, false-positive risk is not material, and consensus is strong enough for the current mode\n- set `recommendationConfidenceBand = Medium` when one bounded uncertainty remains but the recommendation is still directionally justified\n- set `recommendationConfidenceBand = Low` when multiple viable interpretations remain, major 
contradictions are unresolved, or important coverage gaps still weaken the recommendation\n\nSet these keys in the next `continue_workflow` call's `context` object:\n- `reviewFindings`\n- `criticalFindingsCount`\n- `majorFindingsCount`\n- `minorFindingsCount`\n- `nitFindingsCount`\n- `recommendation`\n- `recommendationConfidenceBand`\n- `recommendationDriftDetected`\n- `coverageLedger`\n- `coverageUncertainCount`\n- `docCompletenessConcernCount`\n\nUpdate `reviewDocPath` so the human artifact matches the canonical review state.",
           "requireConfirmation": false
         },
         {
@@ -213,7 +213,7 @@
           "Run final validation if any of these are true: `criticalSurfaceTouched = true`, `needsSimulation = true`, `falsePositiveRiskCount > 0`, `coverageUncertainCount > 0`, `docCompletenessConcernCount > 0`, or `recommendationConfidenceBand != High`.",
           "Mode-adaptive validation: QUICK = self-validate and optionally spawn ONE WorkRail Executor running `routine-hypothesis-challenge` if a serious uncertainty remains; STANDARD = if validation is required and delegation is available, spawn TWO WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge` and either `routine-execution-simulation` or `routine-plan-analysis`; THOROUGH = if validation is required and delegation is available, spawn THREE WorkRail Executors SIMULTANEOUSLY running `routine-hypothesis-challenge`, `routine-execution-simulation` when needed, and `routine-plan-analysis`.",
           "Compute `docCompletenessConcernCount` by counting one concern for each material packaging gap: missing rationale for any Critical or Major finding, missing ready-to-post MR comment for any Critical or Major finding, recommendation mismatch with canonical findings, still-uncertain / contradicted / needs-followup coverage domains not summarized clearly, or any missing required final section needed for actionability.",
-          "Set context variables: `validatorConsensusLevel`, `validationSummary`, `recommendationConfidenceBand`, `docCompletenessConcernCount`."
+          "Set these keys in the next `continue_workflow` call's `context` object: `validatorConsensusLevel`, `validationSummary`, `recommendationConfidenceBand`, `docCompletenessConcernCount`."
         ],
         "verify": [
           "If 2+ validators still raise serious concerns, confidence is downgraded and synthesis is reopened.",

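The hunks above replace "Set context variables" with an instruction to pass the same keys in the next `continue_workflow` call's `context` object. As a rough illustration only, the sketch below builds such a `context` dictionary and derives two values the prompts define in prose: `coverageUncertainCount` (domains marked `uncertain`, `contradicted`, or `needs_followup`) and the final-validation trigger. The helper names `coverage_uncertain_count` and `needs_final_validation` are hypothetical, and the diff does not show the rest of the `continue_workflow` payload, so only the `context` part is modeled here.

```python
# Statuses the Phase 2 prompt counts as "not yet safely closed".
OPEN_STATUSES = {"uncertain", "contradicted", "needs_followup"}


def coverage_uncertain_count(ledger):
    """Count coverage domains whose status is still open (hypothetical helper)."""
    return sum(1 for status in ledger.values() if status in OPEN_STATUSES)


def needs_final_validation(ctx):
    """Mirror the 'run final validation if any of these are true' rule.

    Missing keys are treated conservatively: an absent confidence band
    is not "High", so it triggers validation.
    """
    return (
        ctx.get("criticalSurfaceTouched") is True
        or ctx.get("needsSimulation") is True
        or ctx.get("falsePositiveRiskCount", 0) > 0
        or ctx.get("coverageUncertainCount", 0) > 0
        or ctx.get("docCompletenessConcernCount", 0) > 0
        or ctx.get("recommendationConfidenceBand") != "High"
    )


# Illustrative ledger using the six domains named in the Phase 2 prompt.
ledger = {
    "correctness_logic": "checked",
    "contracts_invariants": "uncertain",
    "patterns_architecture": "checked",
    "runtime_production_risk": "needs_followup",
    "tests_docs_rollout": "checked",
    "security_performance": "not_applicable",
}

# Keys an agent might place in the `context` object (values are made up).
context = {
    "coverageLedger": ledger,
    "coverageUncertainCount": coverage_uncertain_count(ledger),
    "criticalSurfaceTouched": False,
    "needsSimulation": False,
    "falsePositiveRiskCount": 0,
    "docCompletenessConcernCount": 0,
    "recommendationConfidenceBand": "High",
}

print(context["coverageUncertainCount"], needs_final_validation(context))
```

Here two domains are still open, so validation is required even though every other trigger is clear.
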
package/workflows/routines/parallel-work-partitioning.json ADDED

@@ -0,0 +1,43 @@
+{
+  "id": "routine-parallel-work-partitioning",
+  "name": "Parallel Work Partitioning Routine",
+  "version": "0.1.0",
+  "description": "Analyzes a set of work items and their dependencies, partitions them into independent batches that can be safely parallelized across subagents, and produces a sorted execution plan. Flags items with cross-dependencies for sequential handling by the main agent.",
+  "metaGuidance": [
+    "INDEPENDENCE IS THE PRIORITY: a batch is only parallel-safe if no item in it depends on any other item in any concurrent batch.",
+    "CONSERVATIVE CLASSIFICATION: when dependency is unclear, classify as sequential. Wrong parallelization is worse than missed parallelization.",
+    "MAIN AGENT OWNS HARD WORK: items needing judgment, design decisions, or cross-cutting context belong to the main agent, not subagents."
+  ],
+  "steps": [
+    {
+      "id": "step-inventory",
+      "title": "Step 1: Inventory Work Items",
+      "prompt": "List every work item in scope.\n\nFor each item, capture:\n- Item identifier (file path, module name, test case, doc section, etc.)\n- One-line description of what needs to be done\n- Estimated complexity: trivial / moderate / complex\n- Whether it requires judgment or design decisions (yes/no)\n\nDon't classify or sort yet. Just get the full inventory.\n\nCapture:\n- `workItems` (the full list)\n- `totalCount`",
+      "requireConfirmation": false
+    },
+    {
+      "id": "step-dependency-map",
+      "title": "Step 2: Map Dependencies",
+      "prompt": "For each work item, identify its dependencies on other items in the inventory.\n\nA dependency exists when:\n- Item B imports, references, or extends something in Item A\n- Item B's output format or API is consumed by Item A\n- Item B cannot be correctly completed without Item A being done first\n- Item B shares mutable state, configuration, or resources with Item A\n\nA dependency does NOT exist just because:\n- Items are in the same module or directory\n- Items use the same library or framework\n- Items are conceptually related but structurally independent\n\nProduce a dependency map: for each item, list which other items it depends on (if any). Items with no dependencies on other in-scope items are roots.\n\nCapture:\n- `dependencyMap`\n- `rootItems` (items with no in-scope dependencies)\n- `crossCuttingItems` (items that many others depend on)",
+      "requireConfirmation": false
+    },
+    {
+      "id": "step-classify",
+      "title": "Step 3: Classify Parallel vs. Sequential",
+      "prompt": "Classify each work item into one of two tracks:\n\n**Parallel track** — can be delegated to a subagent. ALL of these must be true:\n- No unresolved dependencies on other in-scope items (or all dependencies are already completed/in an earlier batch)\n- Trivial or moderate complexity\n- Does NOT require judgment, design decisions, or cross-cutting context\n- Can be verified independently (build, test, or review in isolation)\n- Failure is cheap to fix (a bad result can be caught and redone)\n\n**Sequential track** — must be handled by the main agent. ANY of these makes it sequential:\n- Has dependencies on items that haven't been completed yet\n- Complex or requires design decisions\n- Touches cross-cutting concerns (shared state, configuration, APIs used by many items)\n- Needs context from other items to be done correctly\n- Failure is expensive (wrong output cascades to other items)\n\nWhen in doubt, classify as sequential.\n\nCapture:\n- `parallelItems`\n- `sequentialItems`\n- `classificationRationale` (one line per item explaining the decision)",
+      "requireConfirmation": false
+    },
+    {
+      "id": "step-batch",
+      "title": "Step 4: Form Parallel Batches",
+      "prompt": "Group parallel-track items into batches that can run concurrently.\n\nBatching rules:\n- No two items in the same batch should depend on each other\n- No two items in the same batch should write to the same file or resource\n- Keep batches between 3-5 items for manageable subagent scope\n- Items that share a common dependency should be in the same batch or in batches that run after that dependency is resolved\n- Prefer grouping items by similarity (same file type, same kind of work) so each subagent gets coherent instructions\n\nAlso sort sequential-track items by dependency order (dependencies first).\n\nCapture:\n- `parallelBatches` (array of batches, each with item list and subagent instructions summary)\n- `sequentialOrder` (sorted list of sequential items)\n- `executionPlan` (the full plan: which parallel batches run first, then which sequential items, noting any sync points)",
+      "requireConfirmation": false
+    },
+    {
+      "id": "step-plan-delivery",
+      "title": "Step 5: Produce Execution Plan",
+      "prompt": "Write the final execution plan as a structured artifact in {deliverableName}.\n\nStructure:\n\n```markdown\n# Parallel Work Partitioning Plan\n\n## Summary\n- Total items: [count]\n- Parallel track: [count] items in [N] batches\n- Sequential track: [count] items\n- Cross-cutting items: [list]\n\n## Execution Order\n\n### Wave 1: Parallel Batches (delegate to subagents)\n\n#### Batch 1: [theme/description]\n- [item]: [one-line task]\n- [item]: [one-line task]\n**Subagent instructions**: [what the subagent needs to know]\n\n#### Batch 2: ...\n\n### Wave 2: Sequential Items (main agent)\n1. [item]: [one-line task] — depends on: [items]\n2. [item]: [one-line task] — depends on: [items]\n\n## Dependency Graph\n[Compact representation of which items depend on which]\n\n## Risk Notes\n- [Any items where classification was uncertain]\n- [Any sync points where parallel work must complete before sequential work starts]\n```\n\nCapture:\n- `planDelivered`",
+      "requireConfirmation": false
+    }
+  ]
+}
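
The routine above is written for an agent, but its classify, batch, and order steps can be sketched in ordinary code. The following is a minimal Python illustration, not part of the package: `WorkItem`, `classify`, `make_batches`, and `sequential_order` are hypothetical names, and the classification is deliberately stricter than the prompt (any in-scope dependency sends an item to the sequential track, which matches the "when in doubt, classify as sequential" guidance). It does not model shared write targets or cross-cutting items.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class WorkItem:
    id: str
    complexity: str = "trivial"  # "trivial" | "moderate" | "complex"
    needs_judgment: bool = False
    depends_on: set = field(default_factory=set)  # ids of in-scope items


def classify(items):
    """Split items into (parallel, sequential) tracks, conservatively."""
    parallel, sequential = [], []
    for item in items:
        if item.depends_on or item.complexity == "complex" or item.needs_judgment:
            sequential.append(item)
        else:
            parallel.append(item)
    return parallel, sequential


def make_batches(parallel, max_size=5):
    """Chunk dependency-free items into batches of at most max_size ids."""
    return [
        [item.id for item in parallel[start:start + max_size]]
        for start in range(0, len(parallel), max_size)
    ]


def sequential_order(sequential):
    """Order sequential items so that dependencies come first."""
    ids = {item.id for item in sequential}
    # Only dependencies inside the sequential track constrain this order;
    # parallel-track dependencies are assumed finished in the earlier wave.
    graph = {item.id: item.depends_on & ids for item in sequential}
    return list(TopologicalSorter(graph).static_order())


# Made-up inventory for illustration.
items = [
    WorkItem("a.md"),
    WorkItem("b.md"),
    WorkItem("api.ts", complexity="complex"),
    WorkItem("client.ts", complexity="moderate", depends_on={"api.ts"}),
    WorkItem("docs.md", complexity="moderate", depends_on={"client.ts"}),
]

parallel_items, sequential_items = classify(items)
parallel_batches = make_batches(parallel_items)
order = sequential_order(sequential_items)
print(parallel_batches, order)
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which is a reasonable signal that the dependency map from Step 2 needs another look before any work is delegated.
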