@exaudeus/workrail 3.80.0 → 3.82.0
- package/dist/cli/commands/worktrain-diagnose.d.ts +132 -0
- package/dist/cli/commands/worktrain-diagnose.js +822 -0
- package/dist/cli-worktrain.js +97 -4
- package/dist/console-ui/assets/{index-2NrQPYdF.js → index-DE4aB2eN.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/manifest.json +13 -5
- package/docs/ideas/backlog.md +32 -1
- package/package.json +1 -1
package/docs/ideas/backlog.md
CHANGED
@@ -3418,6 +3418,26 @@ Open questions: does `wr.dispatch` replace `workflowId` in trigger config, or co
 
 ---
 
+### wr.mr-review quality and architecture overhaul (May 8, 2026)
+
+**Status: idea** | Priority: high
+
+**Score: 13** | Cor:3 Cap:3 Eff:2 Lev:3 Con:2 | Blocked: no
+
+The current `wr.mr-review` workflow produces findings that are often shallow, miss real issues present in the diff, and conflate pre-existing problems with changes introduced by the PR. In practice, reviews have missed incomplete migrations, attributed failures to the wrong root cause, and approved PRs with silent regressions in the commit history. The workflow runs as a single long-lived session reading the full diff in one pass, which limits how deeply any single concern can be investigated.
+
+The core gap: the review workflow does not spawn focused sub-agents to investigate suspicious areas. A reviewer that spots a potentially incomplete migration should be able to spawn a quick agent to grep the codebase for other sites that were not updated -- rather than noting it as a surface observation and moving on. Without targeted investigation, the review is pattern-matching on the diff rather than reasoning about system-wide impact.
+
+**Things to hash out:**
+- What should trigger spawning a sub-agent during review -- explicit workflow step, or reviewer judgment via `spawn_agent`?
+- Should sub-agents have narrow scope (e.g. "find all remaining `sessionId: string` in `src/daemon/`") or full workspace access?
+- How does the parent session synthesize sub-agent findings into the final verdict? What happens if a sub-agent returns inconclusive results?
+- What is the right decomposition of review concerns -- by file, by concern type, or by risk level?
+- How does the review workflow distinguish "pre-existing issue on main" from "regression introduced by this PR"?
+- What does a measurable acceptance criterion look like -- false negative rate, human reviewer agreement, or something else?
+
+---
+
 ### MR review session count inflation
 
 **Status: idea** | Priority: medium
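The narrow-scope option raised in the `wr.mr-review` entry above ("find all remaining `sessionId: string` in `src/daemon/`") can be illustrated with a small sketch. This is not WorkRail code: `Finding`, `findRemainingSites`, and the in-memory `workspace` map are hypothetical names invented for illustration, and the real `spawn_agent` mechanism is not modeled here. The sketch only shows what a scope-limited search result might look like when handed back to a parent review session.

```typescript
// Hypothetical sketch: none of these names come from WorkRail itself.
interface Finding {
  path: string;
  line: number; // 1-based line number within the file
  text: string; // trimmed source line that matched
}

// Search only files under `scopePrefix` for lines containing `needle`.
// Files are passed in as a path -> content map so the function stays pure.
function findRemainingSites(
  files: ReadonlyMap<string, string>,
  scopePrefix: string,
  needle: string,
): Finding[] {
  const findings: Finding[] = [];
  files.forEach((content, path) => {
    if (!path.startsWith(scopePrefix)) return; // outside the narrow scope
    content.split("\n").forEach((text, index) => {
      if (text.includes(needle)) {
        findings.push({ path, line: index + 1, text: text.trim() });
      }
    });
  });
  return findings;
}

// Illustrative workspace: one daemon file migrated, one not, one out of scope.
const workspace = new Map<string, string>([
  ["src/daemon/runner.ts", "export function run(sessionId: string) {}\n"],
  ["src/daemon/store.ts", "export function load(sessionId: SessionId) {}\n"],
  ["src/cli/main.ts", "const sessionId: string = 'abc';\n"],
]);

const remaining = findRemainingSites(workspace, "src/daemon/", "sessionId: string");
```

Under these assumptions, `remaining` holds a single finding for `src/daemon/runner.ts`; the match in `src/cli/main.ts` is ignored because it falls outside the requested scope. A full-workspace variant would simply pass an empty prefix.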
@@ -4788,7 +4808,7 @@ The session DAG shows structure but not meaning. When watching a session run in
 
 ### Observability and logging as first-class citizens (Apr 17, 2026)
 
-**Status:
+**Status: partial** -- `worktrain diagnose` shipped May 9, 2026 (PR #979). Deferred items tracked below.
 
 **Score: 11** | Cor:2 Cap:2 Eff:2 Lev:2 Con:3 | Blocked: no
 
@@ -4823,6 +4843,17 @@ worktrain logs --format json # machine-readable output
 - Log rotation and retention -- how much disk space should logs consume, and who configures the retention policy?
 - "Silence = actively working" requires the agent loop to emit heartbeats. What is the heartbeat interval, and is this a new event type in the session store?
 
+**Delivered (May 9, 2026, PR #979):** `worktrain diagnose <sessionId>` -- scans last 7 days of daemon event logs, classifies sessions into CONFIG / WORKFLOW_STUCK / WORKFLOW_TIMEOUT / INFRA / ORPHANED / SUCCESS / DEFAULT, prints a failure card with evidence and suggested fix. `worktrain health <id>` now delegates to diagnose for prior-day sessions (previously returned "No events today"). `--json` and `--ascii` flags. Pure `parseDaemonEvents()` function with injected deps, 22 unit tests.
+
+**Still deferred:**
+- `worktrain failures` aggregate fleet view (which workflow/trigger fails most often)
+- `--since N` flag to widen the scan window beyond 7 days
+- `--verbose` flag for full step timeline (currently capped at 8 steps)
+- Conversation log `--deep` mode (full LLM turn text for stuck cases where argsSummary is truncated)
+- Push-based auto-write to outbox after each non-success session
+- Structured `failureCode` field in engine events (eliminate string-matching on `detail` field)
+- Console inline integration (show failure card per session in the UI)
+
 ---
 
 ### Event sourcing for orchestration: extend the session store to daemon and coordinator events (Apr 17, 2026)
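The "Delivered" note above describes a pure classifier with injected dependencies that scans a 7-day window and string-matches on the `detail` field. A minimal sketch of that shape follows. Only the seven category names and the 7-day window come from the backlog text; the `DaemonEvent` shape, the `Deps` interface, the function name `classifySession`, and every matching rule are assumptions made for illustration and do not reflect the shipped `parseDaemonEvents()`.

```typescript
// Category names are taken from the backlog entry; everything else is assumed.
type Category =
  | "CONFIG" | "WORKFLOW_STUCK" | "WORKFLOW_TIMEOUT"
  | "INFRA" | "ORPHANED" | "SUCCESS" | "DEFAULT";

// Assumed event shape; the real daemon event schema is not shown in the diff.
interface DaemonEvent {
  sessionId: string;
  timestampMs: number;
  kind: "started" | "failed" | "completed";
  detail?: string;
}

// Injected dependency: the clock, so tests can pin "now".
interface Deps {
  now: () => number;
}

const WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7-day scan window

// Hypothetical classifier: the matching rules below are illustrative guesses.
function classifySession(
  events: readonly DaemonEvent[],
  sessionId: string,
  deps: Deps,
): Category {
  const cutoff = deps.now() - WINDOW_MS;
  const recent = events.filter(
    (e) => e.sessionId === sessionId && e.timestampMs >= cutoff,
  );
  if (recent.length === 0) return "DEFAULT"; // nothing inside the window
  if (recent.some((e) => e.kind === "completed")) return "SUCCESS";
  const failure = recent.find((e) => e.kind === "failed");
  if (!failure) return "ORPHANED"; // started, but no terminal event
  const detail = (failure.detail || "").toLowerCase();
  if (detail.includes("config")) return "CONFIG";
  if (detail.includes("timeout")) return "WORKFLOW_TIMEOUT";
  if (detail.includes("stuck")) return "WORKFLOW_STUCK";
  if (detail.includes("network")) return "INFRA";
  return "DEFAULT";
}

// Usage with a pinned clock (May 9, 2026 UTC).
const fixedNow = Date.UTC(2026, 4, 9);
const deps: Deps = { now: () => fixedNow };
const DAY_MS = 24 * 60 * 60 * 1000;

const sampleEvents: DaemonEvent[] = [
  { sessionId: "s1", timestampMs: fixedNow - 2 * DAY_MS, kind: "started" },
  { sessionId: "s1", timestampMs: fixedNow - 2 * DAY_MS + 1000, kind: "failed", detail: "step timeout after 30 minutes" },
  { sessionId: "s2", timestampMs: fixedNow - DAY_MS, kind: "started" },
  { sessionId: "s3", timestampMs: fixedNow - 8 * DAY_MS, kind: "completed" },
];

const timedOut = classifySession(sampleEvents, "s1", deps);  // "WORKFLOW_TIMEOUT"
const orphaned = classifySession(sampleEvents, "s2", deps);  // "ORPHANED"
const outOfWindow = classifySession(sampleEvents, "s3", deps); // "DEFAULT"
```

Keeping the clock behind `Deps` is what makes a function like this testable without touching real log files or wall-clock time. The substring checks on `detail` mirror the string-matching that the deferred `failureCode` item proposes to eliminate.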