npm - @exaudeus/workrail - Versions diffs - 3.70.1 → 3.70.3 - Mend

@exaudeus/workrail 3.70.1 → 3.70.3

Files changed (15)

package/dist/console-ui/assets/{index-BcZJOyVG.js → index-Gmbzhc2B.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/daemon/daemon-events.d.ts +1 -1
package/dist/daemon/workflow-runner.js +4 -2
package/dist/manifest.json +15 -15
package/dist/trigger/polling-scheduler.d.ts +2 -1
package/dist/trigger/polling-scheduler.js +3 -2
package/dist/v2/durable-core/domain/prompt-renderer.js +18 -8
package/docs/discovery/design-review-findings.md +62 -65
package/docs/ideas/backlog.md +222 -106
package/docs/plans/workflow-modernization-design.md +177 -59
package/docs/tickets/next-up.md +7 -15
package/package.json +1 -1
package/workflows/adaptive-ticket-creation.json +53 -18
package/workflows/mr-review-workflow.agentic.v2.json +10 -4

package/workflows/mr-review-workflow.agentic.v2.json

@@ -1,7 +1,7 @@
 {
   "id": "wr.mr-review",
   "name": "MR Review Workflow (Lean v2 \u2022 Notes-First \u2022 Evidence-Driven Reviewer Families)",
-  "version": "2.6.0",
+  "version": "2.7.0",
   "description": "Lean v2 MR review workflow. Merges intake, missing-input gating, context gathering, and re-triage into one structured front phase, then drives review through a shared fact packet, parallel reviewer families, contradiction-driven synthesis, and evidence-first final validation.",
   "about": "## MR Review Workflow\n\nThis workflow conducts a structured, evidence-driven code review of a merge request or pull request. It is designed for cases where you want a thorough, audit-quality review rather than a quick glance -- particularly when the change touches critical surfaces, spans many files, or carries real production risk.\n\n**What it does:**\nThe workflow locates and bounds the review target, enriches it with PR context and ticket intent, classifies the change by risk and shape, then runs parallel \"reviewer family\" agents (covering correctness, architecture, runtime risk, tests/docs, and more) from a shared neutral fact packet. It reconciles contradictions between reviewer families, stress-tests the recommendation with adversarial validators, and produces a final handoff with severity-classified findings and ready-to-post MR comments.\n\n**When to use it:**\n- Before merging a PR that touches auth, data models, APIs, or critical paths\n- When you want independent perspectives on a change without the noise of an unstructured review\n- When the change is large or the reviewer is unfamiliar with the surrounding code\n- When you need a reproducible audit trail for compliance or team review processes\n\n**What it produces:**\nA final review recommendation (approve / request changes / needs discussion) with a confidence band, severity-graded findings (Critical / Major / Minor / Nit), ready-to-post MR comments, a coverage ledger showing which review domains were checked, and an honest disclosure of any context that could not be recovered.\n\n**How to get good results:**\nProvide the PR URL, branch name, or diff. The workflow can recover most context on its own -- ticket links, repo patterns, policy docs -- but if the change has non-obvious intent, a one-sentence description of the goal helps calibrate review sensitivity. The workflow will not post comments or approve/reject without explicit instruction.",
   "examples": [
@@ -100,6 +100,12 @@
         ]
       }
     },
+    {
+      "id": "phase-0b-scope-and-completeness-gate",
+      "title": "Phase 0b: Scope & Completeness Gate",
+      "prompt": "Verify that the PR delivers what was asked and nothing more.\n\nThis step runs after context is established (Phase 0) and before forming a review hypothesis. Its output feeds the fact packet in Phase 2.\n\nStep 1 — Enumerate acceptance criteria:\nFrom the ticket/issue/PR description recovered in Phase 0, extract a flat list of acceptance criteria. If no explicit criteria exist, infer them from the stated goal and the PR title/description. Mark each as `explicit` (stated in ticket/issue) or `inferred` (derived from goal).\n\nIf no ticket, issue, or PR description is available, record `acceptanceCriteriaSource: none` and set `scopeCheckConfidence: Low`. Continue with downgraded confidence -- do not block the review.\n\nStep 2 — Check each criterion against the diff:\nFor each acceptance criterion, examine the diff and determine:\n- `met`: the diff clearly addresses this criterion\n- `partial`: the diff partially addresses it but something appears missing\n- `missing`: the diff does not appear to address this criterion at all\n- `unclear`: insufficient context to judge\n\nCite specific files or functions for `met` and `partial` judgments. Be concrete.\n\nStep 3 — Check for scope creep:\nLook for changes in the diff that go beyond what any acceptance criterion requires. Flag any change that:\n- modifies behavior not mentioned in the ticket/goal\n- touches files unrelated to the stated purpose\n- introduces new abstractions or refactors not required by the task\n\nDistinguish necessary implementation details (e.g. extracting a helper to implement the feature) from genuine scope creep (e.g. rewriting unrelated logic while here).\n\nStep 4 — Set context keys:\nSet these keys in the next `continue_workflow` call's `context` object:\n- `acceptanceCriteria`: array of `{ criterion, source: 'explicit'|'inferred', status: 'met'|'partial'|'missing'|'unclear', evidence? }`\n- `acceptanceCriteriaSource`: `'ticket'` | `'pr_description'` | `'inferred'` | `'none'`\n- `missingCriteriaCount`: number of criteria with status `missing` or `partial`\n- `scopeCreepFlags`: array of specific out-of-scope changes found (empty array if none)\n- `scopeCreepCount`: length of `scopeCreepFlags`\n- `scopeCheckConfidence`: `High` | `Medium` | `Low`\n\nRules:\n- do not block the review on unclear criteria -- record uncertainty and continue\n- a criterion is only `missing` if you can confirm the behavior is absent from the diff, not just absent from a single file\n- scope creep findings feed into the reviewer families as potential `patterns_architecture` or `philosophy_alignment` concerns -- do not duplicate them as standalone findings here",
+      "requireConfirmation": false
+    },
     {
       "id": "phase-1-state-hypothesis",
       "title": "Phase 1: State Review Hypothesis",
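
Reviewer note: the context keys that the new Phase 0b prompt asks the agent to set can be read as a small data contract. The TypeScript below is a minimal sketch of that contract, not code from the package. The key names and literal values are taken from the prompt text; the type names, the `buildScopeCheckContext` helper, and the `string` element types for `evidence` and `scopeCreepFlags` are assumptions made for illustration.

```typescript
// Hypothetical types: only the key names and literal values come from the Phase 0b prompt.
type CriterionSource = "explicit" | "inferred";
type CriterionStatus = "met" | "partial" | "missing" | "unclear";
type CriteriaSource = "ticket" | "pr_description" | "inferred" | "none";
type Confidence = "High" | "Medium" | "Low";

interface AcceptanceCriterion {
  criterion: string;
  source: CriterionSource;
  status: CriterionStatus;
  evidence?: string; // optional in the prompt; the string type is an assumption
}

interface ScopeCheckContext {
  acceptanceCriteria: AcceptanceCriterion[];
  acceptanceCriteriaSource: CriteriaSource;
  missingCriteriaCount: number;
  scopeCreepFlags: string[]; // element type is an assumption
  scopeCreepCount: number;
  scopeCheckConfidence: Confidence;
}

// Hypothetical helper that derives the two counts the way the prompt defines them.
function buildScopeCheckContext(
  acceptanceCriteria: AcceptanceCriterion[],
  acceptanceCriteriaSource: CriteriaSource,
  scopeCreepFlags: string[],
  scopeCheckConfidence: Confidence,
): ScopeCheckContext {
  // "number of criteria with status `missing` or `partial`"
  const missingCriteriaCount = acceptanceCriteria.filter(
    (c) => c.status === "missing" || c.status === "partial",
  ).length;

  return {
    acceptanceCriteria,
    acceptanceCriteriaSource,
    missingCriteriaCount,
    scopeCreepFlags,
    // "length of `scopeCreepFlags`"
    scopeCreepCount: scopeCreepFlags.length,
    // With no ticket, issue, or PR description, the prompt sets confidence to Low.
    scopeCheckConfidence:
      acceptanceCriteriaSource === "none" ? "Low" : scopeCheckConfidence,
  };
}
```

Deriving both counts from the arrays, rather than accepting them as inputs, keeps `missingCriteriaCount` and `scopeCreepCount` from drifting out of sync with the lists they summarize.
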
@@ -122,12 +128,12 @@
           "Keep `recommendationHypothesis` as a secondary hypothesis to challenge, not a frame to defend."
         ],
         "procedure": [
-          "Create a neutral `reviewFactPacket` containing: MR purpose and expected behavior change, review target and review-surface summary, changed files and module roots, key contracts / invariants / affected consumers, call-chain highlights, relevant repo patterns and exemplars, tests/docs expectations, discovered ticket/doc/policy context, accessible and missing context sources, and explicit open unknowns, relevant coding philosophy principles for this change (from CLAUDE.md, AGENTS.md, ~/.firebender/commands/philosophy.mdc, or soul file -- scope to the 3-5 most relevant for what was changed, not all principles), and existing patterns in the changed module (how similar problems are solved today in the same directory).",
+          "Create a neutral `reviewFactPacket` containing: MR purpose and expected behavior change, review target and review-surface summary, changed files and module roots, key contracts / invariants / affected consumers, call-chain highlights, relevant repo patterns and exemplars, tests/docs expectations, discovered ticket/doc/policy context, accessible and missing context sources, explicit open unknowns, relevant coding philosophy principles for this change (from CLAUDE.md, AGENTS.md, ~/.firebender/commands/philosophy.mdc, or soul file -- scope to the 3-5 most relevant for what was changed, not all principles), existing patterns in the changed module (how similar problems are solved today in the same directory), the acceptance criteria list and scope check results from Phase 0b (`acceptanceCriteria`, `missingCriteriaCount`, `scopeCreepFlags`).",
           "Initialize `coverageLedger` for these domains: `correctness_logic`, `contracts_invariants`, `patterns_architecture`, `philosophy_alignment`, `runtime_production_risk`, `tests_docs_rollout`, `security_performance`.",
           "Perform a preliminary self-review from the fact packet before choosing reviewer families.",
           "Reviewer family options: `correctness_invariants`, `patterns_architecture`, `philosophy_alignment`, `runtime_production_risk`, `test_docs_rollout`, `false_positive_skeptic`, `missed_issue_hunter`.",
           "Selection guidance: QUICK = no bundle by default unless ambiguity still feels material; STANDARD = 3 families by default; THOROUGH = 5 families by default.",
-          "Always include `correctness_invariants` unless clearly not applicable. Include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable. Include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`. Include `missed_issue_hunter` in THOROUGH. Include `false_positive_skeptic` when Major/Critical findings seem plausible or severity inflation risk is non-trivial. Include `philosophy_alignment` in STANDARD and THOROUGH when the change introduces new abstractions, modifies core patterns, or touches areas where the codebase philosophy is particularly relevant (error handling, type safety, DI boundaries, state management).",
+          "Always include `correctness_invariants` unless clearly not applicable. Include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable. Include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`. Include `missed_issue_hunter` in THOROUGH. Include `false_positive_skeptic` when Major/Critical findings seem plausible or severity inflation risk is non-trivial. Include `philosophy_alignment` in STANDARD and THOROUGH when the change introduces new abstractions, modifies core patterns, or touches areas where the codebase philosophy is particularly relevant (error handling, type safety, DI boundaries, state management), OR when `scopeCreepCount > 0`, OR when `missingCriteriaCount > 0`.",
           "Routing guidance: for `api_contract_change`, bias toward contract / consumer / backward-compatibility scrutiny; for `data_model_or_migration`, bias toward rollout / compatibility / simulation scrutiny; for `security_sensitive`, bias toward runtime-risk scrutiny and lower tolerance for weak evidence; for `test_only`, bias toward stronger false-positive suppression; for `mechanically_noisy_change`, bias toward stronger noise filtering and lower appetite for style-only findings.",
           "Set `coverageUncertainCount` as the number of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`.",
           "Initialize `contradictionCount`, `blindSpotCount`, and `falsePositiveRiskCount` to `0` if no reviewer-family bundle will run."
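
Reviewer note: the updated selection rule for `philosophy_alignment` adds two new triggers, `scopeCreepCount > 0` and `missingCriteriaCount > 0`. A minimal sketch of the rule as a predicate follows. It assumes the new OR clauses remain scoped to the STANDARD and THOROUGH modes, which is one reading of the sentence rather than something the diff states outright. The function name, the input interface, and the `touchesPhilosophySensitiveArea` flag (standing in for the "new abstractions, core patterns, philosophy-relevant areas" condition) are hypothetical.

```typescript
// Mode names come from the workflow's selection guidance; everything else is illustrative.
type ReviewMode = "QUICK" | "STANDARD" | "THOROUGH";

interface PhilosophySelectionInput {
  mode: ReviewMode;
  // Hypothetical flag summarizing the pre-existing condition in the rule.
  touchesPhilosophySensitiveArea: boolean;
  scopeCreepCount: number;
  missingCriteriaCount: number;
}

function shouldIncludePhilosophyAlignment(input: PhilosophySelectionInput): boolean {
  // Assumption: the rule only applies in STANDARD and THOROUGH.
  if (input.mode === "QUICK") {
    return false;
  }
  return (
    input.touchesPhilosophySensitiveArea ||
    input.scopeCreepCount > 0 || // new trigger in 2.7.0
    input.missingCriteriaCount > 0 // new trigger in 2.7.0
  );
}
```

Under this reading, a STANDARD review of a change that touches no philosophy-sensitive area still pulls in `philosophy_alignment` as soon as Phase 0b reports one scope creep flag or one missing or partial criterion.
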
@@ -168,7 +174,7 @@
         "procedure": [
           "Before delegating, restate the current `recommendationHypothesis` and say which reviewer family is most likely to challenge it.",
           "Each reviewer family must return: key findings, severity estimates, confidence level, top risks, recommendation, and what others may have missed.",
-          "Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important issue category the others may miss; `philosophy_alignment` = evaluate the implementation against the scoped principles from the fact packet -- name each violation by principle, explain how the code diverges, and distinguish real violations from stylistic preferences. Also ask: is this the right design approach, not just a correct one? Does it follow the established patterns in this module or introduce unnecessary divergence?",
+          "Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important issue category the others may miss; `philosophy_alignment` = evaluate the implementation against the scoped principles from the fact packet. (1) For each relevant principle, state whether the implementation follows it, violates it, or is neutral -- name violations by principle and cite the specific code. Distinguish real violations from stylistic preferences. (2) Generate 2-3 alternative approaches that would also satisfy the acceptance criteria, each grounded in the same coding philosophy. For each alternative, state: what it would look like in one sentence, which philosophy principles it scores better or worse on, and why it was or was not the right choice here. (3) Render a verdict: is the chosen approach the best one given the philosophy, or merely a correct one? If a materially better alternative exists, flag it as a finding.",
           "Mode-adaptive parallelism: STANDARD = spawn THREE WorkRail Executors SIMULTANEOUSLY for the selected families; THOROUGH = spawn FIVE WorkRail Executors SIMULTANEOUSLY for the selected families.",
           "After receiving outputs, explicitly synthesize: what reviewer families confirmed, what was genuinely new, what appeared weak or overreached, and what changed your mind or did not.",
           "Set these keys in the next `continue_workflow` call's `context` object: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",