npm - @exaudeus/workrail - Versions diffs - 3.80.0 → 3.82.0 - Mend

@exaudeus/workrail 3.80.0 → 3.82.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/dist/cli/commands/worktrain-diagnose.d.ts +132 -0
package/dist/cli/commands/worktrain-diagnose.js +822 -0
package/dist/cli-worktrain.js +97 -4
package/dist/console-ui/assets/{index-2NrQPYdF.js → index-DE4aB2eN.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/manifest.json +13 -5
package/docs/ideas/backlog.md +32 -1
package/package.json +1 -1

package/docs/ideas/backlog.md CHANGED Viewed

@@ -3418,6 +3418,26 @@ Open questions: does `wr.dispatch` replace `workflowId` in trigger config, or co
 ---
+### wr.mr-review quality and architecture overhaul (May 8, 2026)
+**Status: idea** | Priority: high
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+The current `wr.mr-review` workflow produces findings that are often shallow, miss real issues present in the diff, and conflate pre-existing problems with changes introduced by the PR. In practice, reviews have missed incomplete migrations, attributed failures to the wrong root cause, and approved PRs with silent regressions in the commit history. The workflow runs as a single long-lived session reading the full diff in one pass, which limits how deeply any single concern can be investigated.
+The core gap: the review workflow does not spawn focused sub-agents to investigate suspicious areas. A reviewer that spots a potentially incomplete migration should be able to spawn a quick agent to grep the codebase for other sites that were not updated -- rather than noting it as a surface observation and moving on. Without targeted investigation, the review is pattern-matching on the diff rather than reasoning about system-wide impact.
+**Things to hash out:**
+- What should trigger spawning a sub-agent during review -- explicit workflow step, or reviewer judgment via `spawn_agent`?
+- Should sub-agents have narrow scope (e.g. "find all remaining `sessionId: string` in `src/daemon/`") or full workspace access?
+- How does the parent session synthesize sub-agent findings into the final verdict? What happens if a sub-agent returns inconclusive results?
+- What is the right decomposition of review concerns -- by file, by concern type, or by risk level?
+- How does the review workflow distinguish "pre-existing issue on main" from "regression introduced by this PR"?
+- What does a measurable acceptance criterion look like -- false negative rate, human reviewer agreement, or something else?
+---
 ### MR review session count inflation
 **Status: idea** | Priority: medium
@@ -4788,7 +4808,7 @@ The session DAG shows structure but not meaning. When watching a session run in
 ### Observability and logging as first-class citizens (Apr 17, 2026)
-**Status: idea** | Priority: high
+**Status: partial** -- `worktrain diagnose` shipped May 9, 2026 (PR #979). Deferred items tracked below.
 **Score: 11** | Cor:2 Cap:2 Eff:2 Lev:2 Con:3 | Blocked: no
@@ -4823,6 +4843,17 @@ worktrain logs --format json            # machine-readable output
 - Log rotation and retention -- how much disk space should logs consume, and who configures the retention policy?
 - "Silence = actively working" requires the agent loop to emit heartbeats. What is the heartbeat interval, and is this a new event type in the session store?
+**Delivered (May 9, 2026, PR #979):** `worktrain diagnose <sessionId>` -- scans last 7 days of daemon event logs, classifies sessions into CONFIG / WORKFLOW_STUCK / WORKFLOW_TIMEOUT / INFRA / ORPHANED / SUCCESS / DEFAULT, prints a failure card with evidence and suggested fix. `worktrain health <id>` now delegates to diagnose for prior-day sessions (previously returned "No events today"). `--json` and `--ascii` flags. Pure `parseDaemonEvents()` function with injected deps, 22 unit tests.
+**Still deferred:**
+- `worktrain failures` aggregate fleet view (which workflow/trigger fails most often)
+- `--since N` flag to widen the scan window beyond 7 days
+- `--verbose` flag for full step timeline (currently capped at 8 steps)
+- Conversation log `--deep` mode (full LLM turn text for stuck cases where argsSummary is truncated)
+- Push-based auto-write to outbox after each non-success session
+- Structured `failureCode` field in engine events (eliminate string-matching on `detail` field)
+- Console inline integration (show failure card per session in the UI)
 ---
 ### Event sourcing for orchestration: extend the session store to daemon and coordinator events (Apr 17, 2026)

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.80.0",
+  "version": "3.82.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {