@exaudeus/workrail 3.70.1 → 3.70.3
- package/dist/console-ui/assets/{index-BcZJOyVG.js → index-Gmbzhc2B.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/daemon/daemon-events.d.ts +1 -1
- package/dist/daemon/workflow-runner.js +4 -2
- package/dist/manifest.json +15 -15
- package/dist/trigger/polling-scheduler.d.ts +2 -1
- package/dist/trigger/polling-scheduler.js +3 -2
- package/dist/v2/durable-core/domain/prompt-renderer.js +18 -8
- package/docs/discovery/design-review-findings.md +62 -65
- package/docs/ideas/backlog.md +222 -106
- package/docs/plans/workflow-modernization-design.md +177 -59
- package/docs/tickets/next-up.md +7 -15
- package/package.json +1 -1
- package/workflows/adaptive-ticket-creation.json +53 -18
- package/workflows/mr-review-workflow.agentic.v2.json +10 -4
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"id": "wr.mr-review",
|
|
3
3
|
"name": "MR Review Workflow (Lean v2 \u2022 Notes-First \u2022 Evidence-Driven Reviewer Families)",
|
|
4
|
-
"version": "2.
|
|
4
|
+
"version": "2.7.0",
|
|
5
5
|
"description": "Lean v2 MR review workflow. Merges intake, missing-input gating, context gathering, and re-triage into one structured front phase, then drives review through a shared fact packet, parallel reviewer families, contradiction-driven synthesis, and evidence-first final validation.",
|
|
6
6
|
"about": "## MR Review Workflow\n\nThis workflow conducts a structured, evidence-driven code review of a merge request or pull request. It is designed for cases where you want a thorough, audit-quality review rather than a quick glance -- particularly when the change touches critical surfaces, spans many files, or carries real production risk.\n\n**What it does:**\nThe workflow locates and bounds the review target, enriches it with PR context and ticket intent, classifies the change by risk and shape, then runs parallel \"reviewer family\" agents (covering correctness, architecture, runtime risk, tests/docs, and more) from a shared neutral fact packet. It reconciles contradictions between reviewer families, stress-tests the recommendation with adversarial validators, and produces a final handoff with severity-classified findings and ready-to-post MR comments.\n\n**When to use it:**\n- Before merging a PR that touches auth, data models, APIs, or critical paths\n- When you want independent perspectives on a change without the noise of an unstructured review\n- When the change is large or the reviewer is unfamiliar with the surrounding code\n- When you need a reproducible audit trail for compliance or team review processes\n\n**What it produces:**\nA final review recommendation (approve / request changes / needs discussion) with a confidence band, severity-graded findings (Critical / Major / Minor / Nit), ready-to-post MR comments, a coverage ledger showing which review domains were checked, and an honest disclosure of any context that could not be recovered.\n\n**How to get good results:**\nProvide the PR URL, branch name, or diff. The workflow can recover most context on its own -- ticket links, repo patterns, policy docs -- but if the change has non-obvious intent, a one-sentence description of the goal helps calibrate review sensitivity. The workflow will not post comments or approve/reject without explicit instruction.",
|
|
7
7
|
"examples": [
|
|
@@ -100,6 +100,12 @@
|
|
|
100
100
|
]
|
|
101
101
|
}
|
|
102
102
|
},
|
|
103
|
+
{
|
|
104
|
+
"id": "phase-0b-scope-and-completeness-gate",
|
|
105
|
+
"title": "Phase 0b: Scope & Completeness Gate",
|
|
106
|
+
"prompt": "Verify that the PR delivers what was asked and nothing more.\n\nThis step runs after context is established (Phase 0) and before forming a review hypothesis. Its output feeds the fact packet in Phase 2.\n\nStep 1 — Enumerate acceptance criteria:\nFrom the ticket/issue/PR description recovered in Phase 0, extract a flat list of acceptance criteria. If no explicit criteria exist, infer them from the stated goal and the PR title/description. Mark each as `explicit` (stated in ticket/issue) or `inferred` (derived from goal).\n\nIf no ticket, issue, or PR description is available, record `acceptanceCriteriaSource: none` and set `scopeCheckConfidence: Low`. Continue with downgraded confidence -- do not block the review.\n\nStep 2 — Check each criterion against the diff:\nFor each acceptance criterion, examine the diff and determine:\n- `met`: the diff clearly addresses this criterion\n- `partial`: the diff partially addresses it but something appears missing\n- `missing`: the diff does not appear to address this criterion at all\n- `unclear`: insufficient context to judge\n\nCite specific files or functions for `met` and `partial` judgments. Be concrete.\n\nStep 3 — Check for scope creep:\nLook for changes in the diff that go beyond what any acceptance criterion requires. Flag any change that:\n- modifies behavior not mentioned in the ticket/goal\n- touches files unrelated to the stated purpose\n- introduces new abstractions or refactors not required by the task\n\nDistinguish necessary implementation details (e.g. extracting a helper to implement the feature) from genuine scope creep (e.g. rewriting unrelated logic while here).\n\nStep 4 — Set context keys:\nSet these keys in the next `continue_workflow` call's `context` object:\n- `acceptanceCriteria`: array of `{ criterion, source: 'explicit'|'inferred', status: 'met'|'partial'|'missing'|'unclear', evidence? }`\n- `acceptanceCriteriaSource`: `'ticket'` | `'pr_description'` | `'inferred'` | `'none'`\n- `missingCriteriaCount`: number of criteria with status `missing` or `partial`\n- `scopeCreepFlags`: array of specific out-of-scope changes found (empty array if none)\n- `scopeCreepCount`: length of `scopeCreepFlags`\n- `scopeCheckConfidence`: `High` | `Medium` | `Low`\n\nRules:\n- do not block the review on unclear criteria -- record uncertainty and continue\n- a criterion is only `missing` if you can confirm the behavior is absent from the diff, not just absent from a single file\n- scope creep findings feed into the reviewer families as potential `patterns_architecture` or `philosophy_alignment` concerns -- do not duplicate them as standalone findings here",
|
|
107
|
+
"requireConfirmation": false
|
|
108
|
+
},
|
|
103
109
|
{
|
|
104
110
|
"id": "phase-1-state-hypothesis",
|
|
105
111
|
"title": "Phase 1: State Review Hypothesis",
|
|
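The context keys that the new Phase 0b prompt asks the agent to set can be sketched as TypeScript types, which makes the derived counts easier to see. This is only an illustration of the shape described in the prompt, not code from the package: the names `AcceptanceCriterion`, `ScopeGateContext`, and `buildScopeGateContext` are hypothetical, and the types of `evidence` and `scopeCreepFlags` (plain strings) are assumptions, since the prompt does not fix them.

```typescript
// Hypothetical types mirroring the context keys named in the Phase 0b prompt.
type CriterionSource = "explicit" | "inferred";
type CriterionStatus = "met" | "partial" | "missing" | "unclear";

interface AcceptanceCriterion {
  criterion: string;
  source: CriterionSource;
  status: CriterionStatus;
  evidence?: string; // assumed free text; the prompt does not specify a type
}

interface ScopeGateContext {
  acceptanceCriteria: AcceptanceCriterion[];
  acceptanceCriteriaSource: "ticket" | "pr_description" | "inferred" | "none";
  missingCriteriaCount: number;
  scopeCreepFlags: string[]; // assumed to be short descriptions
  scopeCreepCount: number;
  scopeCheckConfidence: "High" | "Medium" | "Low";
}

// Hypothetical helper: derives the two counts the prompt defines.
function buildScopeGateContext(
  criteria: AcceptanceCriterion[],
  criteriaSource: ScopeGateContext["acceptanceCriteriaSource"],
  scopeCreepFlags: string[],
  confidence: ScopeGateContext["scopeCheckConfidence"],
): ScopeGateContext {
  // Per the prompt, both `missing` and `partial` count toward missingCriteriaCount.
  const missingCriteriaCount = criteria.filter(
    (c) => c.status === "missing" || c.status === "partial",
  ).length;

  return {
    acceptanceCriteria: criteria,
    acceptanceCriteriaSource: criteriaSource,
    missingCriteriaCount,
    scopeCreepFlags,
    scopeCreepCount: scopeCreepFlags.length,
    // The prompt sets confidence to Low when no criteria source is available.
    scopeCheckConfidence: criteriaSource === "none" ? "Low" : confidence,
  };
}
```

Note that `unclear` criteria do not raise `missingCriteriaCount`, which matches the rule that uncertainty is recorded without blocking the review.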
@@ -122,12 +128,12 @@
|
|
|
122
128
|
"Keep `recommendationHypothesis` as a secondary hypothesis to challenge, not a frame to defend."
|
|
123
129
|
],
|
|
124
130
|
"procedure": [
|
|
125
|
-
"Create a neutral `reviewFactPacket` containing: MR purpose and expected behavior change, review target and review-surface summary, changed files and module roots, key contracts / invariants / affected consumers, call-chain highlights, relevant repo patterns and exemplars, tests/docs expectations, discovered ticket/doc/policy context, accessible and missing context sources,
|
|
131
|
+
"Create a neutral `reviewFactPacket` containing: MR purpose and expected behavior change, review target and review-surface summary, changed files and module roots, key contracts / invariants / affected consumers, call-chain highlights, relevant repo patterns and exemplars, tests/docs expectations, discovered ticket/doc/policy context, accessible and missing context sources, explicit open unknowns, relevant coding philosophy principles for this change (from CLAUDE.md, AGENTS.md, ~/.firebender/commands/philosophy.mdc, or soul file -- scope to the 3-5 most relevant for what was changed, not all principles), existing patterns in the changed module (how similar problems are solved today in the same directory), the acceptance criteria list and scope check results from Phase 0b (`acceptanceCriteria`, `missingCriteriaCount`, `scopeCreepFlags`).",
|
|
126
132
|
"Initialize `coverageLedger` for these domains: `correctness_logic`, `contracts_invariants`, `patterns_architecture`, `philosophy_alignment`, `runtime_production_risk`, `tests_docs_rollout`, `security_performance`.",
|
|
127
133
|
"Perform a preliminary self-review from the fact packet before choosing reviewer families.",
|
|
128
134
|
"Reviewer family options: `correctness_invariants`, `patterns_architecture`, `philosophy_alignment`, `runtime_production_risk`, `test_docs_rollout`, `false_positive_skeptic`, `missed_issue_hunter`.",
|
|
129
135
|
"Selection guidance: QUICK = no bundle by default unless ambiguity still feels material; STANDARD = 3 families by default; THOROUGH = 5 families by default.",
|
|
130
|
-
"Always include `correctness_invariants` unless clearly not applicable. Include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable. Include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`. Include `missed_issue_hunter` in THOROUGH. Include `false_positive_skeptic` when Major/Critical findings seem plausible or severity inflation risk is non-trivial. Include `philosophy_alignment` in STANDARD and THOROUGH when the change introduces new abstractions, modifies core patterns, or touches areas where the codebase philosophy is particularly relevant (error handling, type safety, DI boundaries, state management)
|
|
136
|
+
"Always include `correctness_invariants` unless clearly not applicable. Include `test_docs_rollout` in STANDARD and THOROUGH unless clearly not applicable. Include `runtime_production_risk` when `criticalSurfaceTouched = true` or `needsSimulation = true`. Include `missed_issue_hunter` in THOROUGH. Include `false_positive_skeptic` when Major/Critical findings seem plausible or severity inflation risk is non-trivial. Include `philosophy_alignment` in STANDARD and THOROUGH when the change introduces new abstractions, modifies core patterns, or touches areas where the codebase philosophy is particularly relevant (error handling, type safety, DI boundaries, state management), OR when `scopeCreepCount > 0`, OR when `missingCriteriaCount > 0`.",
|
|
131
137
|
"Routing guidance: for `api_contract_change`, bias toward contract / consumer / backward-compatibility scrutiny; for `data_model_or_migration`, bias toward rollout / compatibility / simulation scrutiny; for `security_sensitive`, bias toward runtime-risk scrutiny and lower tolerance for weak evidence; for `test_only`, bias toward stronger false-positive suppression; for `mechanically_noisy_change`, bias toward stronger noise filtering and lower appetite for style-only findings.",
|
|
132
138
|
"Set `coverageUncertainCount` as the number of coverage domains not yet safely closed: `uncertain` + `contradicted` + `needs_followup`.",
|
|
133
139
|
"Initialize `contradictionCount`, `blindSpotCount`, and `falsePositiveRiskCount` to `0` if no reviewer-family bundle will run."
|
|
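The reviewer-family inclusion rules in the hunk above, including the new `scopeCreepCount` and `missingCriteriaCount` triggers, can be read as a small decision function. The sketch below is an interpretation, not the package's implementation: in the workflow these rules are prose guidance for an agent. `selectReviewerFamilies`, `severeFindingsPlausible`, and `philosophySensitiveChange` are hypothetical names standing in for judgment calls. It assumes the STANDARD/THOROUGH restriction applies to every `philosophy_alignment` clause, treats QUICK as "no bundle", and does not enforce the default family counts of 3 and 5.

```typescript
type ReviewMode = "QUICK" | "STANDARD" | "THOROUGH";

type ReviewerFamily =
  | "correctness_invariants"
  | "patterns_architecture"
  | "philosophy_alignment"
  | "runtime_production_risk"
  | "test_docs_rollout"
  | "false_positive_skeptic"
  | "missed_issue_hunter";

interface SelectionInputs {
  mode: ReviewMode;
  criticalSurfaceTouched: boolean;
  needsSimulation: boolean;
  scopeCreepCount: number;
  missingCriteriaCount: number;
  severeFindingsPlausible: boolean; // hypothetical: Major/Critical or inflation risk
  philosophySensitiveChange: boolean; // hypothetical: new abstractions, core patterns
}

// Hypothetical helper encoding the inclusion rules as written in the procedure.
function selectReviewerFamilies(input: SelectionInputs): ReviewerFamily[] {
  // QUICK runs no bundle by default (the "material ambiguity" exception is ignored here).
  if (input.mode === "QUICK") return [];

  // Both are included by default in STANDARD and THOROUGH.
  const families: ReviewerFamily[] = ["correctness_invariants", "test_docs_rollout"];

  if (input.criticalSurfaceTouched || input.needsSimulation) {
    families.push("runtime_production_risk");
  }
  if (input.mode === "THOROUGH") {
    families.push("missed_issue_hunter");
  }
  if (input.severeFindingsPlausible) {
    families.push("false_positive_skeptic");
  }
  // New in this version: scope creep or unmet criteria also trigger philosophy_alignment.
  if (
    input.philosophySensitiveChange ||
    input.scopeCreepCount > 0 ||
    input.missingCriteriaCount > 0
  ) {
    families.push("philosophy_alignment");
  }
  return families;
}
```

Under these assumptions, a STANDARD review with a single scope-creep flag and no other triggers selects `correctness_invariants`, `test_docs_rollout`, and `philosophy_alignment`.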
@@ -168,7 +174,7 @@
|
|
|
168
174
|
"procedure": [
|
|
169
175
|
"Before delegating, restate the current `recommendationHypothesis` and say which reviewer family is most likely to challenge it.",
|
|
170
176
|
"Each reviewer family must return: key findings, severity estimates, confidence level, top risks, recommendation, and what others may have missed.",
|
|
171
|
-
"Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important issue category the others may miss; `philosophy_alignment` = evaluate the implementation against the scoped principles from the fact packet
|
|
177
|
+
"Family missions: `correctness_invariants` = logic, correctness, API and invariant risks; `patterns_architecture` = pattern fit, design consistency, architectural concerns; `runtime_production_risk` = runtime behavior, production impact, performance/state-flow risk; `test_docs_rollout` = test adequacy, docs, migration, rollout, affected consumers; `false_positive_skeptic` = challenge likely overreaches, weak evidence, or severity inflation; `missed_issue_hunter` = search for an important issue category the others may miss; `philosophy_alignment` = evaluate the implementation against the scoped principles from the fact packet. (1) For each relevant principle, state whether the implementation follows it, violates it, or is neutral -- name violations by principle and cite the specific code. Distinguish real violations from stylistic preferences. (2) Generate 2-3 alternative approaches that would also satisfy the acceptance criteria, each grounded in the same coding philosophy. For each alternative, state: what it would look like in one sentence, which philosophy principles it scores better or worse on, and why it was or was not the right choice here. (3) Render a verdict: is the chosen approach the best one given the philosophy, or merely a correct one? If a materially better alternative exists, flag it as a finding.",
|
|
172
178
|
"Mode-adaptive parallelism: STANDARD = spawn THREE WorkRail Executors SIMULTANEOUSLY for the selected families; THOROUGH = spawn FIVE WorkRail Executors SIMULTANEOUSLY for the selected families.",
|
|
173
179
|
"After receiving outputs, explicitly synthesize: what reviewer families confirmed, what was genuinely new, what appeared weak or overreached, and what changed your mind or did not.",
|
|
174
180
|
"Set these keys in the next `continue_workflow` call's `context` object: `familyFindingsSummary`, `familyRecommendationSpread`, `contradictionCount`, `blindSpotCount`, `falsePositiveRiskCount`, `needsSimulation`.",
|
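The expanded `philosophy_alignment` mission in the hunk above asks for three things: a per-principle stance, a few alternative approaches, and a verdict. One possible way to picture that output is the TypeScript sketch below. Everything here is hypothetical, since the workflow only describes the output in prose: the type names, the `materiallyBetter` flag, and the rule that the verdict is "merely correct" exactly when some alternative is materially better are all assumptions made for illustration.

```typescript
type Stance = "follows" | "violates" | "neutral";

// (1) One entry per relevant principle from the fact packet.
interface PrincipleAssessment {
  principle: string;
  stance: Stance;
  citedCode?: string; // the mission asks violations to cite specific code
}

// (2) One entry per generated alternative approach.
interface AlternativeApproach {
  summary: string; // "what it would look like in one sentence"
  scoresBetterOn: string[];
  scoresWorseOn: string[];
  materiallyBetter: boolean; // hypothetical flag for the reviewer's judgment
}

// (3) The verdict plus any findings to hand back.
interface PhilosophyReport {
  verdict: "best" | "merely_correct";
  findings: string[];
}

// Hypothetical helper: turns assessments and alternatives into a report.
function summarizePhilosophyReview(
  assessments: PrincipleAssessment[],
  alternatives: AlternativeApproach[],
): PhilosophyReport {
  const findings: string[] = [];

  // Violations are named by principle, with the cited code when available.
  for (const a of assessments) {
    if (a.stance === "violates") {
      findings.push(`Violates "${a.principle}" at ${a.citedCode ?? "(no code cited)"}`);
    }
  }

  // A materially better alternative is flagged as a finding, per the mission text.
  const better = alternatives.filter((alt) => alt.materiallyBetter);
  for (const alt of better) {
    findings.push(`Materially better alternative: ${alt.summary}`);
  }

  return { verdict: better.length > 0 ? "merely_correct" : "best", findings };
}
```

Keeping violations and alternatives as separate finding sources reflects the mission's distinction between whether the approach breaks a principle and whether it was the best available choice.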