npm - @exaudeus/workrail - Versions diffs - 3.38.0 → 3.39.0 - Mend

@exaudeus/workrail 3.38.0 → 3.39.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/dist/cli-worktrain.js +207 -0
package/dist/console-ui/assets/{index-BtOJj6Xy.js → index-3oXZ_A9m.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/coordinators/pr-review.d.ts +57 -0
package/dist/coordinators/pr-review.js +520 -0
package/dist/manifest.json +15 -7
package/docs/discovery/coordinator-design-review.md +73 -0
package/docs/discovery/coordinator-script-design.md +96 -679
package/docs/discovery/hypothesis-challenge-report.md +44 -0
package/docs/discovery/simulation-report.md +85 -0
package/package.json +1 -1

package/docs/discovery/hypothesis-challenge-report.md ADDED Viewed

@@ -0,0 +1,44 @@
+# Hypothesis Challenge: PR Review Coordinator HTTP-First Design
+*Generated: 2026-04-18*
+## Target Claim
+The PR review coordinator's HTTP-first design (Candidate B) is sound: the 2-call HTTP notes extraction is reliable, the two-tier keyword parser correctly classifies review severity, and the 5 robustness rules are sufficient.
+## Strongest Counter-Argument
+`phase-6-final-handoff` has `requireConfirmation: true`. The preferredTipNodeId may point to a checkpoint/confirmation node whose recapMarkdown is sparse or absent, causing the keyword scanner to misclassify clean PRs as 'unknown' and escalate them unnecessarily.
+**Verdict:** Mitigated. In autonomous mode, the agent calls `continue_workflow` with substantive notes before the confirmation gate. WorkRail stores these notes as the RECAP payload for the node. The recapMarkdown IS populated for autonomous sessions completing phase-6. The requireConfirmation flag only blocks advancement until notes are written -- which the autonomous agent does.
+## Weak Assumptions / Evidence Gaps
+1. **`runs[0]` is always the most recent run:** True in practice. WorkRail appends new runs; index 0 is the most recent. Confirmed in worktrain-await.ts pollSession() which uses `runs[0]`.
+2. **Keyword scan is context-unaware:** 'blocking' appearing in negative context ('this is not blocking') would trigger a false positive. Mitigation: require BLOCKING keyword to be present without a preceding negation. Simpler: use priority order -- any blocking keyword -> blocking, regardless of context. The conservative default is acceptable.
+3. **go/no-go time check needs adaptation:** Rule 3 (don't spawn if remaining time < 20 minutes) was designed for daemon sessions with known maxSessionMinutes. A CLI coordinator has no such limit. Adaptation: track wall-clock time since coordinator start, refuse to spawn new sessions if elapsed > coordinator_max_minutes - 20.
+## Likely Failure Modes
+1. **recapMarkdown null for final step** -> 'unknown' severity -> escalate (conservative, correct)
+2. **Fix-agent loop max 3 passes exceeded** -> escalate after 3 (loop counter enforces this)
+3. **ECONNREFUSED on daemon calls** -> early exit with clear error message
+4. **Keyword false positive** -> PR escalated as blocking when actually clean (false negative on merge, acceptable)
+5. **Merge conflict at merge time** -> `gh pr merge` fails, coordinator reports error and escalates
+## Critical Tests
+- `parseFindingsFromNotes(null)` -> returns err, classifies as 'unknown'
+- `parseFindingsFromNotes(markdown with 'not blocking but...')` -> must NOT return 'blocking'
+- Loop counter: 3 passes with persistent minor -> escalate on pass 3, NOT pass 4
+- ECONNREFUSED: spawnSession failure propagates cleanly to stderr and exit code 1
+## Verdict: Keep
+The design is sound. The 2-call HTTP extraction works for autonomous sessions. The two-tier parser with conservative defaults is sufficient. The 5 robustness rules need one adaptation: Rule 3 (go/no-go time check) should use wall-clock time since coordinator start, not daemon session remaining time.
+## Next Action
+Proceed with Candidate B implementation. Add adaptation to Rule 3: track coordinator wall-clock start time; refuse new spawns if `now() - startTime > (coordinatorMaxMs - 20*60*1000)`. Default coordinatorMaxMs = 90 minutes.

package/docs/discovery/simulation-report.md ADDED Viewed

@@ -0,0 +1,85 @@
+# Execution Simulation Report: PR Review Coordinator Failure Paths
+*Generated: 2026-04-18*
+## Summary
+Three failure paths simulated for the PR review coordinator design. All three produce correct outcomes under the proposed design. One gap identified: Rule 3 (go/no-go time check) needs adaptation for CLI context.
+## Scenario 1: recapMarkdown is Null
+**Setup:** Session completes successfully, but `GET /api/v2/sessions/:id/nodes/:nodeId` returns `recapMarkdown: null`.
+**Trace:**
+```
+getAgentResult('handle-419')
+  GET /api/v2/sessions/handle-419 -> runs[0].preferredTipNodeId = 'node-xyz'
+  GET /api/v2/sessions/handle-419/nodes/node-xyz -> recapMarkdown = null
+  returns null
+parseFindingsFromNotes(null) -> err('notes is null or empty')
+classifySeverity(err) -> 'unknown'
+route('unknown') -> escalate
+```
+**Divergence from expected:** None -- conservative escalation is the designed behavior.
+**Outcome:** PR escalated, not merged. No crash. Clear escalation note written.
+## Scenario 2: Fix-Agent Loop Exhaustion (3 Passes, Persistent Minor)
+**Setup:** PR #406 has minor findings. Fix agent runs 3 times but each re-review still returns minor.
+**Trace:**
+```
+Pass 1: passCount=0 -> review: 'minor' -> passCount becomes 1 -> spawn fix agent -> re-review
+Pass 2: passCount=1 -> review: 'minor' -> passCount becomes 2 -> spawn fix agent -> re-review
+Pass 3: passCount=2 -> review: 'minor' -> passCount becomes 3 -> CHECK: 3 >= 3 -> STOP
+  -> escalate, write: 'PR #406: 3 fix passes exhausted, still minor. Manual review required.'
+```
+**Divergence from expected:** None -- loop terminates correctly at pass 3.
+**Key invariant verified:** `passCount >= MAX_FIX_PASSES` check happens BEFORE spawning fix agent, not after. This prevents a 4th spawn.
+**Outcome:** PR escalated after exactly 3 passes. Not merged.
+## Scenario 3: Daemon Not Running (ECONNREFUSED)
+**Setup:** No daemon running on port 3456. No lock files present.
+**Trace:**
+```
+discoverPort() -> no lock files -> falls back to 3456
+spawnSession('mr-review-workflow-agentic', 'Review PR #419...', '/workspace')
+  POST http://127.0.0.1:3456/api/v2/auto/dispatch
+  -> fetch throws Error: ECONNREFUSED 127.0.0.1:3456
+  -> spawnSession catches -> returns err('Could not connect to WorkTrain daemon on port 3456')
+runPrReviewCoordinator() receives err
+  -> deps.stderr('Could not connect to WorkTrain daemon on port 3456. Start with: worktrain daemon')
+  -> returns { kind: 'failure', exitCode: 1 }
+```
+**Divergence from expected:** None.
+**Outcome:** Clear error message, exit code 1, no hang, no partial state.
+## Divergence Analysis
+No divergences found. All 3 scenarios produce correct outcomes.
+## Gap Identified: Rule 3 Adaptation for CLI Context
+**Problem:** The original Rule 3 (go/no-go time check: don't spawn if remaining time < 20 minutes) was written for daemon sessions that have a `maxSessionMinutes` parameter. A CLI coordinator script has no such parameter.
+**Adaptation required:** Track wall-clock time since coordinator script start (`const startTimeMs = deps.now()` at beginning). Before spawning any new child session (review OR fix agent), check: `if (deps.now() - startTimeMs > coordinatorMaxMs - 20 * 60 * 1000)` -> refuse to spawn.
+**Default:** `coordinatorMaxMs = 90 * 60 * 1000` (90 minutes). Configurable via `--max-runtime` flag if needed.
+## Recommendations
+1. Implement `parseFindingsFromNotes(null)` path explicitly -- don't rely on empty string check.
+2. Keyword parser priority: BLOCKING/CRITICAL/REQUEST CHANGES takes absolute precedence over APPROVE/CLEAN/LGTM (blocking wins even if both present).
+3. Fix-agent loop: check `passCount >= MAX_FIX_PASSES` BEFORE spawning, not after.
+4. Add `coordinatorStartMs` tracking and Rule 3 go/no-go check adapted for CLI (wall-clock elapsed time).
+5. `gh pr merge` failures: catch, write to stderr, escalate -- do NOT retry automatically.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.38.0",
+  "version": "3.39.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {