@exaudeus/workrail 3.52.0 → 3.54.0

@@ -6628,3 +6628,166 @@ The full Jira integration is a round-trip, not just a poll. Design the return pa
  **Kill switch:** `worktrain kill-sessions` -- aborts all running daemon sessions immediately. Useful when WorkTrain is doing something unexpected. Sends abort signal to all active sessions, marks them user-killed in the event log.
 
  **Commit signing:** verify `git commit` honors existing `commit.gpgsign` config, or add explicit opt-out for bot identities that don't have signing keys. Empirically verify before declaring this solved.
+
+ ---
+
+ ## triggers.yml hot-reload (Apr 20, 2026)
+
+ The daemon reads `triggers.yml` once at startup, so any change requires a full daemon restart. This creates friction when iterating on trigger configuration.
+
+ **The fix:** watch `triggers.yml` for changes using `fs.watch()` or `chokidar`, re-validate the file on change, and, if it is valid, swap the in-memory trigger index without restarting the daemon. Sessions already in flight are unaffected (they hold their own trigger snapshot). Sessions started after the reload use the new config.
+
+ **Partial hot-reload is acceptable:** if the new `triggers.yml` fails validation, log a warning and keep the old config. Don't crash the daemon on a syntax error.
+
+ **Implementation:** `TriggerRouter` already accepts a `TriggerIndex` at construction. The hot-reload path re-calls `loadTriggerStore()` and swaps the index reference on the router. `PollingScheduler` loops are keyed per trigger, so swapping the index also requires cleanly restarting the polling loops.
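
The validate-then-swap behaviour described above could look roughly like the sketch below. It is not WorkTrain's actual code: `TriggerEntry`, `validateTriggers`, and `reloadTriggerIndex` are hypothetical names, the entry shape is reduced to `id` and `provider`, and YAML parsing is left to the caller (the `load` callback is assumed to return the already-parsed document).

```typescript
// Hypothetical, simplified shapes; the real TriggerIndex in workrail may differ.
interface TriggerEntry {
  id: string;
  provider: string;
}
type TriggerIndex = ReadonlyMap<string, TriggerEntry>;

type ReloadResult =
  | { kind: "swapped"; index: TriggerIndex }
  | { kind: "kept"; index: TriggerIndex; reason: string };

// Builds an index from parsed config; throws on any structural problem.
function validateTriggers(raw: unknown): TriggerIndex {
  if (!Array.isArray(raw)) throw new Error("expected a list of triggers");
  const index = new Map<string, TriggerEntry>();
  for (const item of raw) {
    if (typeof item !== "object" || item === null) {
      throw new Error("each trigger must be a mapping");
    }
    const { id, provider } = item as Record<string, unknown>;
    if (typeof id !== "string" || typeof provider !== "string") {
      throw new Error("each trigger needs a string `id` and `provider`");
    }
    if (index.has(id)) throw new Error(`duplicate trigger id: ${id}`);
    index.set(id, { id, provider });
  }
  return index;
}

// Returns the new index only if loading and validation both succeed;
// otherwise returns the current index untouched, with the failure reason.
function reloadTriggerIndex(current: TriggerIndex, load: () => unknown): ReloadResult {
  try {
    return { kind: "swapped", index: validateTriggers(load()) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: "kept", index: current, reason };
  }
}
```

A file watcher (`fs.watch()` or `chokidar`, probably debounced, since a single save can produce more than one change event) would call `reloadTriggerIndex` on change, log `reason` as a warning when the result is `kept`, and otherwise hand the new index to the router and restart the polling loops.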
+
+ **Priority:** Medium -- useful for onboarding and trigger iteration, not a production blocker.
+
+ ---
+
+ ## GitHub webhook trigger with assignee/event filtering (Apr 20, 2026)
+
+ **The problem:** `github_queue_poll` has a 5-minute latency floor. Assigning an issue fires a GitHub webhook immediately -- WorkTrain should start within seconds, not minutes.
+
+ ### What exists today
+
+ - `provider: generic` handles arbitrary POST webhooks with HMAC validation
+ - `goalTemplate: "{{$.issue.title}}"` extracts issue title from payload
+ - `hmacSecret: $MY_SECRET` validates `X-Hub-Signature-256`
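
For reference, GitHub computes `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. A minimal check using Node's built-in `crypto` module might look like this. `signBody` and `verifyGithubSignature` are illustrative names, not WorkTrain functions, and how the generic provider performs the check internally is not shown in this diff.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Produces the header value GitHub would send for this secret and body.
function signBody(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compares the received header against the expected signature in constant time.
function verifyGithubSignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (header === undefined) return false;
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature has to be computed over the raw bytes of the request, before any JSON parsing and re-serialization, or the digest will not match.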
+
+ **You can use this today** but without an assignee filter -- any issue event fires the trigger regardless of who it's assigned to.
+
+ ### What's missing: assignee filtering
+
+ A `contextCondition` or `dispatchCondition` field on the trigger that gates dispatch on a payload value:
+
+ ```yaml
+ - id: self-improvement-hook
+   provider: generic
+   workflowId: coding-task-workflow-agentic
+   workspacePath: /path/to/repo
+   goalTemplate: "{{$.issue.title}}"
+   hmacSecret: $GITHUB_WEBHOOK_SECRET
+   dispatchCondition:
+     payloadPath: "$.assignee.login"
+     equals: "worktrain-etienneb"
+ ```
+
+ Without this, the workaround is to create a dedicated webhook URL per-trigger so only the right events reach it (GitHub lets you filter by event type at the webhook level -- set it to "Issues" events only, which already narrows scope significantly).
+
+ ### The hook+poll pattern (recommended for production)
+
+ ```yaml
+ # Primary: instant response via webhook
+ - id: self-improvement-hook
+   provider: generic
+   goalTemplate: "{{$.issue.title}}"
+   hmacSecret: $GITHUB_WEBHOOK_SECRET
+   dispatchCondition:
+     payloadPath: "$.assignee.login"
+     equals: "worktrain-etienneb"
+
+ # Fallback: catch anything missed during downtime
+ - id: self-improvement-poll
+   provider: github_queue_poll
+   pollIntervalSeconds: 3600 # once per hour, safety net only
+ ```
+
+ ### Implementation
+
+ 1. Add `dispatchCondition: { payloadPath, equals }` to `TriggerDefinition` -- parsed in `trigger-store.ts`, checked in `trigger-router.ts` before enqueuing. Single condition is MVP; AND/OR logic is follow-up.
+ 2. Add `github_issues_webhook` as a named provider (wraps generic with GitHub-specific HMAC and event schema awareness). Convenience only -- generic already works.
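
As a rough illustration of step 1, the single-condition check could be as small as the sketch below. Everything here is an assumption rather than the planned implementation: the function names are hypothetical, and the path resolver only understands plain dotted keys such as `$.assignee.login`, not full JSONPath.

```typescript
// Mirrors the `dispatchCondition: { payloadPath, equals }` shape proposed above.
interface DispatchCondition {
  payloadPath: string;
  equals: string;
}

// Resolves a "$.a.b.c" style path against a parsed payload.
// Returns undefined if the path is malformed or any segment is missing.
function resolvePayloadPath(payload: unknown, path: string): unknown {
  if (!path.startsWith("$.")) return undefined;
  let node: unknown = payload;
  for (const key of path.slice(2).split(".")) {
    if (typeof node !== "object" || node === null) return undefined;
    if (!Object.prototype.hasOwnProperty.call(node, key)) return undefined;
    node = (node as Record<string, unknown>)[key];
  }
  return node;
}

// A trigger without a condition always dispatches; otherwise the resolved
// value must be strictly equal to the configured string.
function shouldDispatch(condition: DispatchCondition | undefined, payload: unknown): boolean {
  if (condition === undefined) return true;
  return resolvePayloadPath(payload, condition.payloadPath) === condition.equals;
}
```

Treating a missing path as "no match" rather than as an error keeps unrelated webhook events (for example, ones with no `assignee` field) from dispatching or crashing the router.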
+
+ **Priority:** Medium-high. The 5-minute latency is the main UX gap once the queue is live. `dispatchCondition` is ~50 LOC in trigger-store + trigger-router.
+
+ ---
+
+ ## Demo repo feedback loop: WorkTrain improves itself via real task execution (Apr 20, 2026)
+
+ ### The idea
+
+ Run WorkTrain against a real demo repo, observe what breaks, automatically file issues against the workrail repo, and have WorkTrain fix them. A self-improving feedback loop that surfaces real production failures faster than any manual testing.
+
+ ### Why this matters
+
+ Right now WorkTrain's quality is validated by: the tasks we built it on (workrail itself) and manual inspection. That's a small, biased sample. A demo repo with diverse real tasks reveals failure modes in the full pipeline that workrail's self-improvement loop won't surface -- because the workrail tasks are always WorkTrain-flavored (TypeScript, same patterns, same tool use).
+
+ ### The loop
+
+ ```
+ Demo repo tasks (worktrain:ready issues)
+
+ WorkTrain runs full pipeline: discover → shape → code → PR → review → merge
+
+ Failure classifier watches daemon event log
+
+ For each failure: structured issue filed against workrail repo
+ (what task, what step, what went wrong, session ID, relevant log lines)
+
+ worktrain-etienneb assigned → WorkTrain fixes itself
+
+ WorkTrain re-runs the failed task → confirms fix
+ ```
+
+ ### What to build
+
+ **Phase 1: Demo repo + manual observation**
+ - Create or pick a demo repo -- real TypeScript project, diverse tasks (feature add, refactor, bug fix, test coverage, docs)
+ - Add 5-10 `worktrain:ready` issues with acceptance criteria
+ - Run WorkTrain on them, manually supervise first few runs
+ - Collect failure patterns: what breaks, how often, at which pipeline step
+
+ **Phase 2: Failure classifier**
+ - Scheduled session (nightly cron trigger) that reads `~/.workrail/events/daemon/YYYY-MM-DD.jsonl`
+ - Classifies sessions by outcome: success, error, timeout, stuck
+ - For non-success sessions: extracts failure context (last tool call, stuck reason, step that failed, issue summaries from `report_issue`)
+ - For each new failure: creates a GitHub issue against the workrail repo with:
+ ```
+ Title: [WorkTrain failure] <workflow> failed at <step>: <reason>
+ Body: Session: sess_xxx | Trigger: self-improvement | Task: #N "<title>"
+ Step: phase-3-plan-and-test-design
+ Failure: repeated_tool_call (grep on same pattern 3x)
+ Last tool args: {"pattern": "...", "path": "..."}
+ Issue summaries: ["Could not find X in codebase"]
+ Labels: worktrain:ready, bug
+ Assignee: worktrain-etienneb
+ ```
+ - Deduplicates: doesn't file if an identical failure issue already exists and is open
+ - ~100-150 LOC, new coordinator script `src/coordinators/failure-classifier.ts`
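
The classify-and-deduplicate core of Phase 2 might be sketched as follows. The daemon event log schema is not shown in this document, so the `SessionRecord` shape (one JSON object per line, already summarized per session) is purely an assumption, as are all function names. Creating the GitHub issue itself is left out.

```typescript
type Outcome = "success" | "error" | "timeout" | "stuck";

// Hypothetical record shape; the real daemon event log schema may differ.
interface SessionRecord {
  sessionId: string;
  workflow: string;
  step: string;
  outcome: Outcome;
  reason: string;
}

// Parses JSONL text into records, skipping blank lines.
function parseSessionLog(jsonl: string): SessionRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionRecord);
}

// Title format taken from the issue template above.
function failureTitle(s: SessionRecord): string {
  return `[WorkTrain failure] ${s.workflow} failed at ${s.step}: ${s.reason}`;
}

// Returns the titles that should be filed: non-success sessions whose title
// is not already an open issue, with repeats in the same log collapsed.
function newFailureTitles(
  sessions: readonly SessionRecord[],
  openTitles: ReadonlySet<string>,
): string[] {
  const seen = new Set<string>(openTitles);
  const toFile: string[] = [];
  for (const s of sessions) {
    if (s.outcome === "success") continue;
    const title = failureTitle(s);
    if (seen.has(title)) continue;
    seen.add(title);
    toFile.push(title);
  }
  return toFile;
}
```

Deduplicating on the title is the simplest possible notion of "identical failure"; a real classifier would likely want a more stable fingerprint, since free-text reasons can vary between otherwise identical failures.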
+
+ **Phase 3: Auto-rerun after fix**
+ - When WorkTrain merges a fix for a failure issue, the failure classifier re-queues the original demo task
+ - Confirms the fix actually resolved the failure
+ - Closes the failure issue if the task now succeeds
+
+ ### Demo repo criteria
+
+ Good demo repo characteristics:
+ - Real TypeScript project with actual functionality (not just stubs)
+ - Has existing tests (so WorkTrain's changes can be verified)
+ - Diverse task types: feature addition, refactor, bug fix, test coverage gap, documentation
+ - Small enough that sessions complete within the 90-min timeout
+ - Not workrail itself (avoids circular dependency in failure classification)
+
+ Options:
+ - A new personal project created specifically for this
+ - An existing open source tool or library you maintain
+ - A stripped-down clone of a Zillow service (no internal dependencies)
+
+ ### Demo repo tasks for first run (suggested)
+
+ 1. Add a new CLI flag with validation and tests
+ 2. Refactor a module to use a different pattern
+ 3. Fix a bug from a failing test
+ 4. Add test coverage for an uncovered function
+ 5. Write a README section documenting a feature
+
+ These span the full task maturity spectrum and exercise different pipeline paths.
+
+ ### Relationship to benchmarking
+
+ The same set of demo tasks, re-run after each WorkTrain release, becomes a regression benchmark. Track: % completing successfully, # fix loop iterations needed, LLM turns per task, token cost per task. Plot over time. When the numbers improve, the release is better. When they regress, something broke.
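
A per-release summary of those four metrics could be computed along these lines. This is only a sketch: `TaskRun`, `ReleaseSummary`, and `summarizeRelease` are hypothetical names, and where the per-task numbers come from is not specified in this document.

```typescript
// Hypothetical per-task record; field names are illustrative.
interface TaskRun {
  taskId: string;
  succeeded: boolean;
  fixLoopIterations: number;
  llmTurns: number;
  tokenCost: number;
}

interface ReleaseSummary {
  successRate: number; // fraction in [0, 1]
  meanFixLoopIterations: number;
  meanLlmTurns: number;
  meanTokenCost: number;
}

// Averages each tracked metric over all task runs of one release.
function summarizeRelease(runs: readonly TaskRun[]): ReleaseSummary {
  if (runs.length === 0) throw new Error("no task runs to summarize");
  const mean = (pick: (r: TaskRun) => number): number =>
    runs.reduce((sum, r) => sum + pick(r), 0) / runs.length;
  return {
    successRate: mean((r) => (r.succeeded ? 1 : 0)),
    meanFixLoopIterations: mean((r) => r.fixLoopIterations),
    meanLlmTurns: mean((r) => r.llmTurns),
    meanTokenCost: mean((r) => r.tokenCost),
  };
}
```

Because the task set is held constant, summaries from consecutive releases can be compared directly; a drop in `successRate` or a jump in `meanLlmTurns` points at a regression.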
+
+ ### Priority
+
+ High -- this is the fastest path to data-driven self-improvement. Build Phase 1 first (pick a demo repo, run tasks manually), then Phase 2 (failure classifier) once you've seen 2-3 recurring failure patterns.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@exaudeus/workrail",
-   "version": "3.52.0",
+   "version": "3.54.0",
    "description": "Step-by-step workflow enforcement for AI agents via MCP",
    "license": "MIT",
    "repository": {