npm - @exaudeus/workrail - Versions diffs - 3.52.0 → 3.54.0 - Mend

@exaudeus/workrail 3.52.0 → 3.54.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/dist/cli/commands/worktrain-overview.d.ts +1 -0
package/dist/cli/commands/worktrain-overview.js +64 -0
package/dist/cli-worktrain.js +20 -0
package/dist/console-ui/assets/{index-Ce7Feod7.js → index-CoXPahi0.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/daemon/daemon-env.d.ts +5 -0
package/dist/daemon/daemon-env.js +36 -0
package/dist/daemon/daemon-events.d.ts +11 -1
package/dist/manifest.json +29 -21
package/dist/trigger/github-queue-config.js +1 -1
package/dist/trigger/polling-scheduler.js +6 -7
package/dist/trigger/trigger-router.js +10 -0
package/dist/trigger/trigger-store.js +69 -2
package/dist/trigger/types.d.ts +4 -0
package/docs/design/dispatch-condition-and-adaptive-queue.md +97 -0
package/docs/design/dispatch-condition-implementation-plan.md +168 -0
package/docs/design/dispatch-condition-review-findings.md +61 -0
package/docs/ideas/backlog.md +163 -0
package/package.json +1 -1

package/docs/ideas/backlog.md CHANGED

@@ -6628,3 +6628,166 @@ The full Jira integration is a round-trip, not just a poll. Design the return pa
 **Kill switch:** `worktrain kill-sessions` -- aborts all running daemon sessions immediately. Useful when WorkTrain is doing something unexpected. Sends abort signal to all active sessions, marks them user-killed in the event log.
 **Commit signing:** verify `git commit` honors existing `commit.gpgsign` config, or add explicit opt-out for bot identities that don't have signing keys. Empirically verify before declaring this solved.
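The kill switch described above could work like the minimal sketch below. It assumes that each daemon session is driven by its own `AbortController` and that the event log accepts appended records. `SessionRegistry`, `killAll`, and the `session_killed` event shape are hypothetical names. They are not WorkTrain's actual API or event schema.

```typescript
// Hypothetical event record; WorkTrain's real event log schema may differ.
interface SessionKilledEvent {
  kind: "session_killed";
  sessionId: string;
  reason: "user_killed";
  at: string;
}

// Hypothetical registry holding one AbortController per running session.
class SessionRegistry {
  private readonly controllers = new Map<string, AbortController>();
  private readonly appendEvent: (e: SessionKilledEvent) => void;

  constructor(appendEvent: (e: SessionKilledEvent) => void) {
    this.appendEvent = appendEvent;
  }

  // Register a session; its agent loop should stop when the returned signal aborts.
  start(sessionId: string): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(sessionId, controller);
    return controller.signal;
  }

  // Forget a session that finished normally.
  finish(sessionId: string): void {
    this.controllers.delete(sessionId);
  }

  // Abort every active session, log each one as user-killed, and return how many were killed.
  killAll(): number {
    const ids = Array.from(this.controllers.keys());
    for (const id of ids) {
      this.controllers.get(id)?.abort();
      this.appendEvent({ kind: "session_killed", sessionId: id, reason: "user_killed", at: new Date().toISOString() });
      this.controllers.delete(id);
    }
    return ids.length;
  }
}
```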
+---
+## triggers.yml hot-reload (Apr 20, 2026)
+The daemon reads `triggers.yml` once at startup. Any change requires a full daemon restart. This creates friction during trigger configuration iteration.
+**The fix:** watch `triggers.yml` for changes using `fs.watch()` or `chokidar`, re-validate the file on change, and, if it is valid, swap the in-memory trigger index without restarting the daemon. Sessions already in flight are unaffected (they hold their own trigger snapshot). New sessions started after the reload use the new config.
+**Partial hot-reload is acceptable:** if the new `triggers.yml` fails validation, log a warning and keep the old config. Don't crash the daemon on a syntax error.
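The sketch below shows one way to get this validate-then-swap behavior. The parser is injected, so `HotReloadableTriggers`, `TriggerIndex`, and `ParseResult` are hypothetical stand-ins for whatever `loadTriggerStore()` actually returns. If a parse is rejected or throws, the holder logs a warning and keeps the previous index.

```typescript
// Hypothetical stand-ins for what loadTriggerStore() produces.
type TriggerIndex = ReadonlyMap<string, { id: string; provider: string }>;
type ParseResult = { ok: true; index: TriggerIndex } | { ok: false; error: string };

class HotReloadableTriggers {
  private current: TriggerIndex;
  private readonly parse: (text: string) => ParseResult;
  private readonly warn: (msg: string) => void;

  constructor(initial: TriggerIndex, parse: (text: string) => ParseResult, warn: (msg: string) => void) {
    this.current = initial;
    this.parse = parse;
    this.warn = warn;
  }

  // The index new sessions should use; in-flight sessions keep the snapshot they started with.
  get index(): TriggerIndex {
    return this.current;
  }

  // Re-validate the file contents and swap only if valid. Returns true if the new config was applied.
  reload(text: string): boolean {
    const result = this.safeParse(text);
    if (!result.ok) {
      this.warn(`triggers.yml rejected, keeping previous config: ${result.error}`);
      return false;
    }
    this.current = result.index; // single reference swap, no daemon restart
    return true;
  }

  // A parser that throws (e.g. on a YAML syntax error) must not crash the daemon.
  private safeParse(text: string): ParseResult {
    try {
      return this.parse(text);
    } catch (err) {
      return { ok: false, error: String(err) };
    }
  }
}
```

In the daemon, `reload()` would be called with the file contents from an `fs.watch()` or `chokidar` change listener. A single save can produce more than one change event, so debouncing that listener is probably worthwhile.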
+**Implementation:** `TriggerRouter` already accepts a `TriggerIndex` at construction. The hot-reload path re-calls `loadTriggerStore()` and swaps the index reference on the router. `PollingScheduler` loops are keyed per trigger -- swapping the index would also require restarting the polling loops cleanly.
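On the `PollingScheduler` side, a reload has to work out which per-trigger loops to stop, start, or restart. The sketch below does this under stated assumptions. `PollTrigger` and `planPollingLoops` are hypothetical. A trigger counts as changed when its JSON serialization differs, which assumes configs are always built with the same key order.

```typescript
// Hypothetical trigger shape; only the fields relevant to polling are modeled.
interface PollTrigger {
  id: string;
  provider: string;
  pollIntervalSeconds: number;
}

interface LoopPlan {
  stop: string[];    // triggers removed from the file
  start: string[];   // triggers added to the file
  restart: string[]; // triggers whose config changed
}

function planPollingLoops(oldTriggers: PollTrigger[], newTriggers: PollTrigger[]): LoopPlan {
  const oldById = new Map(oldTriggers.map((t) => [t.id, t] as const));
  const newById = new Map(newTriggers.map((t) => [t.id, t] as const));
  const plan: LoopPlan = { stop: [], start: [], restart: [] };
  for (const id of oldById.keys()) {
    if (!newById.has(id)) plan.stop.push(id);
  }
  for (const [id, next] of newById) {
    const prev = oldById.get(id);
    if (prev === undefined) plan.start.push(id);
    // Unchanged loops keep running untouched; changed ones are restarted cleanly.
    else if (JSON.stringify(prev) !== JSON.stringify(next)) plan.restart.push(id);
  }
  return plan;
}
```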
+**Priority:** Medium -- useful for onboarding and trigger iteration, not a production blocker.
+---
+## GitHub webhook trigger with assignee/event filtering (Apr 20, 2026)
+**The problem:** `github_queue_poll` has a 5-minute latency floor. Assigning an issue fires a GitHub webhook immediately -- WorkTrain should start within seconds, not minutes.
+### What exists today
+- `provider: generic` handles arbitrary POST webhooks with HMAC validation
+- `goalTemplate: "{{$.issue.title}}"` extracts issue title from payload
+- `hmacSecret: $MY_SECRET` validates `X-Hub-Signature-256`
+**You can use this today**, but without an assignee filter -- any issue event fires the trigger regardless of who the issue is assigned to.
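For reference, the sketch below shows what `X-Hub-Signature-256` validation involves. GitHub sends `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. `verifyGitHubSignature` is a hypothetical helper, not the generic provider's actual code. It uses only Node's built-in `crypto` module.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks an X-Hub-Signature-256 header against the raw request body.
// The body must be the exact bytes GitHub sent, not JSON that was parsed and re-serialized.
function verifyGitHubSignature(rawBody: string | Buffer, secret: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const digest = createHmac("sha256", secret).update(rawBody).digest("hex");
  const expected = Buffer.from(`sha256=${digest}`, "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The constant-time comparison avoids leaking, through response timing, how many leading characters of a forged signature were correct.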
+### What's missing: assignee filtering
+A `contextCondition` or `dispatchCondition` field on the trigger that gates dispatch on a payload value:
+```yaml
+- id: self-improvement-hook
+  provider: generic
+  workflowId: coding-task-workflow-agentic
+  workspacePath: /path/to/repo
+  goalTemplate: "{{$.issue.title}}"
+  hmacSecret: $GITHUB_WEBHOOK_SECRET
+  dispatchCondition:
+    payloadPath: "$.assignee.login"
+    equals: "worktrain-etienneb"
+```
+Without this, the workaround is to create a dedicated webhook URL per trigger so that only the right events reach it (GitHub lets you filter by event type at the webhook level -- set it to "Issues" events only, which already narrows the scope significantly).
+### The hook+poll pattern (recommended for production)
+```yaml
+# Primary: instant response via webhook
+- id: self-improvement-hook
+  provider: generic
+  goalTemplate: "{{$.issue.title}}"
+  hmacSecret: $GITHUB_WEBHOOK_SECRET
+  dispatchCondition:
+    payloadPath: "$.assignee.login"
+    equals: "worktrain-etienneb"
+# Fallback: catch anything missed during downtime
+- id: self-improvement-poll
+  provider: github_queue_poll
+  pollIntervalSeconds: 3600   # once per hour, safety net only
+```
+### Implementation
+1. Add `dispatchCondition: { payloadPath, equals }` to `TriggerDefinition` -- parsed in `trigger-store.ts`, checked in `trigger-router.ts` before enqueuing. Single condition is MVP; AND/OR logic is follow-up.
+2. Add `github_issues_webhook` as a named provider (wraps generic with GitHub-specific HMAC and event schema awareness). Convenience only -- generic already works.
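Step 1's check could look like the minimal sketch below. It assumes the single-condition MVP shape from the YAML above. `resolvePath` and `shouldDispatch` are hypothetical helpers. The path syntax handled here is only dotted keys (`$.a.b`), not full JSONPath.

```typescript
// Hypothetical shape mirroring the YAML above: one equality check (MVP).
interface DispatchCondition {
  payloadPath: string;
  equals: string;
}

// Resolve a "$.a.b.c" path using dotted keys only (no [n] brackets, wildcards or filters).
function resolvePath(payload: unknown, path: string): unknown {
  if (path !== "$" && !path.startsWith("$.")) throw new Error(`unsupported payloadPath: ${path}`);
  const keys = path === "$" ? [] : path.slice(2).split(".");
  let current: unknown = payload;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// No condition means always dispatch; otherwise the value at the path must strictly equal `equals`.
function shouldDispatch(payload: unknown, condition: DispatchCondition | undefined): boolean {
  if (condition === undefined) return true;
  return resolvePath(payload, condition.payloadPath) === condition.equals;
}
```

A missing field resolves to `undefined` and never matches. A payload that lacks the configured path therefore fails closed instead of dispatching.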
+**Priority:** Medium-high. The 5-minute latency is the main UX gap once the queue is live. `dispatchCondition` is ~50 LOC in trigger-store + trigger-router.
+---
+## Demo repo feedback loop: WorkTrain improves itself via real task execution (Apr 20, 2026)
+### The idea
+Run WorkTrain against a real demo repo, observe what breaks, automatically file issues against the workrail repo, and have WorkTrain fix them. A self-improving feedback loop that surfaces real production failures faster than any manual testing.
+### Why this matters
+Right now WorkTrain's quality is validated in two ways: the tasks we built it on (workrail itself) and manual inspection. That's a small, biased sample. A demo repo with diverse real tasks reveals failure modes in the full pipeline that workrail's self-improvement loop won't surface -- because the workrail tasks are always WorkTrain-flavored (TypeScript, same patterns, same tool use).
+### The loop
+```
+Demo repo tasks (worktrain:ready issues)
+  ↓
+WorkTrain runs full pipeline: discover → shape → code → PR → review → merge
+  ↓
+Failure classifier watches daemon event log
+  ↓
+For each failure: structured issue filed against workrail repo
+  (what task, what step, what went wrong, session ID, relevant log lines)
+  ↓
+worktrain-etienneb assigned → WorkTrain fixes itself
+  ↓
+WorkTrain re-runs the failed task → confirms fix
+```
+### What to build
+**Phase 1: Demo repo + manual observation**
+- Create or pick a demo repo -- real TypeScript project, diverse tasks (feature add, refactor, bug fix, test coverage, docs)
+- Add 5-10 `worktrain:ready` issues with acceptance criteria
+- Run WorkTrain on them, manually supervise first few runs
+- Collect failure patterns: what breaks, how often, at which pipeline step
+**Phase 2: Failure classifier**
+- Scheduled session (nightly cron trigger) that reads `~/.workrail/events/daemon/YYYY-MM-DD.jsonl`
+- Classifies sessions by outcome: success, error, timeout, stuck
+- For non-success sessions: extracts failure context (last tool call, stuck reason, step that failed, issue summaries from `report_issue`)
+- For each new failure: creates a GitHub issue against the workrail repo with:
+  ```
+  Title: [WorkTrain failure] <workflow> failed at <step>: <reason>
+  Body: Session: sess_xxx | Trigger: self-improvement | Task: #N "<title>"
+        Step: phase-3-plan-and-test-design
+        Failure: repeated_tool_call (grep on same pattern 3x)
+        Last tool args: {"pattern": "...", "path": "..."}
+        Issue summaries: ["Could not find X in codebase"]
+  Labels: worktrain:ready, bug
+  Assignee: worktrain-etienneb
+  ```
+- Deduplicates: doesn't file if an identical failure issue already exists and is open
+- ~100-150 LOC, new coordinator script `src/coordinators/failure-classifier.ts`
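Below is a sketch of the Phase 2 classify-and-dedupe core, leaving out the GitHub API calls. The event shape (`sessionId`, `outcome`, `workflowId`, `step`, `reason`) is an assumption, and the real `~/.workrail/events/daemon/*.jsonl` schema may differ. Sessions with no outcome event yet are treated as still running and skipped. Deduplication compares issue titles, which is a simplification of "identical failure issue".

```typescript
// Assumed event shape; the real daemon event schema may differ.
type Outcome = "success" | "error" | "timeout" | "stuck";
interface DaemonEvent {
  sessionId: string;
  kind: string;
  outcome?: Outcome; // assumed to be present only on a session's terminal event
  workflowId?: string;
  step?: string;
  reason?: string;
}
interface FailureIssue {
  sessionId: string;
  title: string;
}

// Parse one day's JSONL log, skipping blank or malformed lines.
function parseJsonl(text: string): DaemonEvent[] {
  const events: DaemonEvent[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line) as DaemonEvent);
    } catch {
      // skip a malformed line rather than aborting the whole run
    }
  }
  return events;
}

// One issue per non-success session, skipping titles that already have an open issue.
function classifyFailures(events: DaemonEvent[], openIssueTitles: ReadonlySet<string>): FailureIssue[] {
  const terminal = new Map<string, DaemonEvent>();
  for (const e of events) {
    if (e.outcome !== undefined) terminal.set(e.sessionId, e); // a later outcome overwrites an earlier one
  }
  const seen = new Set(openIssueTitles);
  const issues: FailureIssue[] = [];
  for (const [sessionId, e] of terminal) {
    if (e.outcome === "success") continue;
    const title = `[WorkTrain failure] ${e.workflowId ?? "unknown"} failed at ${e.step ?? "unknown step"}: ${e.reason ?? e.outcome}`;
    if (seen.has(title)) continue; // dedupe against open issues and earlier failures in this batch
    seen.add(title);
    issues.push({ sessionId, title });
  }
  return issues;
}
```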
+**Phase 3: Auto-rerun after fix**
+- When WorkTrain merges a fix for a failure issue, the failure classifier re-queues the original demo task
+- Confirms the fix actually resolved the failure
+- Closes the failure issue if the task now succeeds
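Phase 3 can be modeled as a small state machine per failure issue, as in the hypothetical sketch below. The state and event names are illustrative. The behavior after a failed rerun (the issue returns to `open`) is an assumption, since the backlog only specifies closing the issue on success.

```typescript
// Hypothetical lifecycle of one failure issue and the demo task it refers to.
type FailureIssueState =
  | { status: "open"; demoTask: number }
  | { status: "rerun_queued"; demoTask: number }
  | { status: "closed"; demoTask: number };

type LoopEvent =
  | { type: "fix_merged" }
  | { type: "rerun_finished"; succeeded: boolean };

interface Transition {
  next: FailureIssueState;
  action: "requeue_demo_task" | "close_issue" | "none";
}

function step(state: FailureIssueState, event: LoopEvent): Transition {
  if (state.status === "open" && event.type === "fix_merged") {
    return { next: { status: "rerun_queued", demoTask: state.demoTask }, action: "requeue_demo_task" };
  }
  if (state.status === "rerun_queued" && event.type === "rerun_finished") {
    return event.succeeded
      ? { next: { status: "closed", demoTask: state.demoTask }, action: "close_issue" }
      : { next: { status: "open", demoTask: state.demoTask }, action: "none" }; // assumed: wait for another fix
  }
  return { next: state, action: "none" }; // ignore events that don't apply in this state
}
```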
+### Demo repo criteria
+Good demo repo characteristics:
+- Real TypeScript project with actual functionality (not just stubs)
+- Has existing tests (so WorkTrain's changes can be verified)
+- Diverse task types: feature addition, refactor, bug fix, test coverage gap, documentation
+- Small enough that sessions complete within the 90-min timeout
+- Not workrail itself (avoids circular dependency in failure classification)
+Options:
+- A new personal project created specifically for this
+- An existing open source tool or library you maintain
+- A stripped-down clone of a Zillow service (no internal dependencies)
+### Demo repo tasks for first run (suggested)
+1. Add a new CLI flag with validation and tests
+2. Refactor a module to use a different pattern
+3. Fix a bug from a failing test
+4. Add test coverage for an uncovered function
+5. Write a README section documenting a feature
+These span the full task maturity spectrum and exercise different pipeline paths.
+### Relationship to benchmarking
+The same 10 demo tasks run after each WorkTrain release become a regression benchmark. Track: % completing successfully, # fix loop iterations needed, LLM turns per task, token cost per task. Plot over time. When the numbers improve, the release is better. When they regress, something broke.
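The sketch below computes a per-release summary and checks it for regressions, assuming one record per demo task per run. The `TaskRun` fields map to the four metrics listed above, with token cost represented as a token count. The 5% tolerance is an arbitrary illustrative choice.

```typescript
// Hypothetical record for one demo task in one benchmark run.
interface TaskRun {
  taskId: string;
  succeeded: boolean;
  fixLoopIterations: number;
  llmTurns: number;
  tokens: number;
}

interface ReleaseSummary {
  successRate: number;
  meanFixLoops: number;
  meanTurns: number;
  meanTokens: number;
}

function summarize(runs: TaskRun[]): ReleaseSummary {
  if (runs.length === 0) throw new Error("no runs to summarize");
  const mean = (f: (r: TaskRun) => number): number => runs.reduce((sum, r) => sum + f(r), 0) / runs.length;
  return {
    successRate: runs.filter((r) => r.succeeded).length / runs.length,
    meanFixLoops: mean((r) => r.fixLoopIterations),
    meanTurns: mean((r) => r.llmTurns),
    meanTokens: mean((r) => r.tokens),
  };
}

// Names of metrics that got worse beyond the tolerance (higher success is better; lower cost metrics are better).
function regressions(prev: ReleaseSummary, curr: ReleaseSummary, tolerance = 0.05): string[] {
  const worse: string[] = [];
  if (curr.successRate < prev.successRate - tolerance) worse.push("successRate");
  for (const key of ["meanFixLoops", "meanTurns", "meanTokens"] as const) {
    if (curr[key] > prev[key] * (1 + tolerance)) worse.push(key);
  }
  return worse;
}
```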
+### Priority
+High -- this is the fastest path to data-driven self-improvement. Build Phase 1 first (pick a demo repo, run tasks manually), then Phase 2 (failure classifier) once you've seen 2-3 recurring failure patterns.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.52.0",
+  "version": "3.54.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {