
@exaudeus/workrail 3.42.0 → 3.44.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

package/dist/console-ui/assets/{index-DwfWMKvv.js → index-Bi38ITiQ.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/daemon/workflow-runner.d.ts +15 -1
package/dist/daemon/workflow-runner.js +86 -9
package/dist/manifest.json +39 -23
package/dist/trigger/adapters/github-queue-poller.d.ts +34 -0
package/dist/trigger/adapters/github-queue-poller.js +200 -0
package/dist/trigger/delivery-action.d.ts +2 -0
package/dist/trigger/delivery-action.js +24 -0
package/dist/trigger/github-queue-config.d.ts +18 -0
package/dist/trigger/github-queue-config.js +155 -0
package/dist/trigger/polling-scheduler.d.ts +1 -0
package/dist/trigger/polling-scheduler.js +185 -6
package/dist/trigger/trigger-router.js +24 -1
package/dist/trigger/trigger-store.js +77 -2
package/dist/trigger/types.d.ts +19 -0
package/docs/design/adaptive-coordinator-context-candidates.md +265 -0
package/docs/design/adaptive-coordinator-context-review.md +101 -0
package/docs/design/adaptive-coordinator-context.md +504 -0
package/docs/design/adaptive-coordinator-routing-candidates.md +340 -0
package/docs/design/adaptive-coordinator-routing-design-review.md +135 -0
package/docs/design/adaptive-coordinator-routing-review.md +156 -0
package/docs/design/adaptive-coordinator-routing.md +660 -0
package/docs/design/context-assembly-layer-design-review.md +110 -0
package/docs/design/context-assembly-layer.md +622 -0
package/docs/design/stuck-escalation-candidates.md +176 -0
package/docs/design/stuck-escalation-design-review.md +70 -0
package/docs/design/stuck-escalation.md +326 -0
package/docs/design/worktrain-task-queue-candidates.md +252 -0
package/docs/design/worktrain-task-queue-design-review.md +109 -0
package/docs/design/worktrain-task-queue.md +443 -0
package/docs/design/worktree-review-findings-candidates.md +101 -0
package/docs/design/worktree-review-findings-design-review.md +65 -0
package/docs/design/worktree-review-findings-implementation-plan.md +153 -0
package/docs/ideas/backlog.md +148 -0
package/package.json +3 -3

package/docs/design/worktree-review-findings-implementation-plan.md ADDED

@@ -0,0 +1,153 @@
+# Worktree Review Findings - Implementation Plan
+## Problem Statement
+PR #630 (`feat/worktree-auto-commit`) has 7 MR review findings (1 critical, 2 major, 4 minor) that must be resolved before merge. The critical bug causes delivery to fail with "not a git repository" because `runWorkflow()` deletes the worktree before `maybeRunDelivery()` runs.
+## Acceptance Criteria
+1. `runWorkflow()` does NOT remove the worktree on the success path or immediate-complete path.
+2. `makeSpawnAgentTool()` has a JSDoc comment documenting that child sessions always use `branchStrategy: 'none'`.
+3. `WorkflowRunSuccess` has a `readonly sessionId?: string` field.
+4. `runWorkflow()` sets `sessionId` in the success return when `branchStrategy === 'worktree'`.
+5. `trigger-router.ts` reads `result.sessionId` instead of `result.sessionWorkspacePath.split('/').at(-1)`.
+6. `trigger-store.ts` validates `branchPrefix` and `baseBranch` against `/^[a-zA-Z0-9._/-]+$/` and rejects values starting with `-`.
+7. `tests/unit/trigger-router.test.ts` has a test verifying delivery uses the worktree path.
+8. `npm run build` compiles clean.
+9. `npx vitest run` shows no regressions.
+10. `persistTokens()` is called unconditionally after worktree creation (not gated on `startContinueToken`).
+11. Immediate-complete path return includes `sessionWorkspacePath` and `sessionId` when `sessionWorktreePath !== undefined`.
+## Non-Goals
+- Do NOT touch `src/mcp/` in any way.
+- Do NOT change delivery logic in `delivery-action.ts`.
+- Do NOT change the cleanup location in `maybeRunDelivery()` (lines 365-377 in trigger-router.ts) -- this is correct.
+- Do NOT add new abstractions or dependencies.
+- Do NOT change workflow definitions or schema files.
+## Philosophy-Driven Constraints
+- Use `TriggerStoreError` with `kind: 'invalid_field_value'` for validation errors (errors-as-data).
+- `WorkflowRunSuccess.sessionId` must be `readonly` (immutability by default).
+- JSDoc must explain WHY, not just what (document 'why' principle).
+- Validation must happen at the boundary (trigger-store parse time), not at worktree creation time.
+- Architectural fix: cleanup moves to the correct layer, not patched at the symptom.
+## Invariants
+1. Worktree must exist until `maybeRunDelivery()` completes; `runWorkflow()` must NOT remove it on any success path.
+2. `persistTokens()` must always record `worktreePath` immediately after worktree creation (not conditional on token presence).
+3. The `sessionId` field on `WorkflowRunSuccess` must never require path parsing at the call site.
+4. `branchPrefix` and `baseBranch` must be validated before use (fail-fast at daemon startup).
+## Selected Approach
+Follow review verbatim, with one additional fix: the immediate-complete return path (line 3062) must also include `sessionWorkspacePath` and `sessionId` when a worktree was created (this was missing and discovered during design review).
+## Vertical Slices
+### Slice 1: CRITICAL -- Remove Premature Worktree Removal
+**File**: `src/daemon/workflow-runner.ts`
+**Changes**:
+- Remove the `if (sessionWorktreePath)` cleanup block at lines 3049-3058 (immediate-complete path).
+- Add `sessionWorkspacePath` and `sessionId` spread to the immediate-complete return at line 3062.
+- Remove the `// ---- Remove worktree on success ----` comment and `if (sessionWorktreePath)` block at lines 3502-3514 (success path).
+**Done when**: `runWorkflow()` returns without any `execFileAsync('git', ['-C', ..., 'worktree', 'remove', ...])` calls on the success path. The worktree cleanup comment in `trigger-router.ts` lines 355-357 remains the sole cleanup on the success path.
+### Slice 2: MAJOR -- JSDoc on makeSpawnAgentTool
+**File**: `src/daemon/workflow-runner.ts`
+**Changes**:
+- Add a JSDoc comment block immediately before `export function makeSpawnAgentTool(` (line 2009).
+- Content: "Child sessions spawned by this tool always have `branchStrategy: 'none'` -- they operate in the parent's workspace without their own worktree or feature branch. Coordinators that need isolated child sessions should dispatch them via `TriggerRouter.dispatch()` instead."
+**Done when**: JSDoc is present and describes the branchStrategy limitation.
+### Slice 3: Minor 1 -- Unconditional persistTokens After Worktree Creation
+**File**: `src/daemon/workflow-runner.ts`
+**Changes**:
+- Remove the `if (startContinueToken)` guard from the second `persistTokens()` call (lines 3020-3022).
+- Replace with an unconditional call: `await persistTokens(sessionId, startContinueToken ?? currentContinueToken, startCheckpointToken, sessionWorktreePath);`
+**Done when**: `persistTokens()` is called unconditionally after worktree creation, ensuring `worktreePath` is always written to the sidecar.
+### Slice 4: Minor 2 -- Thread sessionId Through WorkflowRunSuccess
+**Files**: `src/daemon/workflow-runner.ts`, `src/trigger/trigger-router.ts`
+**Changes in workflow-runner.ts**:
+- Add `readonly sessionId?: string` to `WorkflowRunSuccess` interface (after `sessionWorkspacePath`).
+- In the main success return (line 3526), add `...(sessionWorktreePath !== undefined ? { sessionId } : {})` (where `sessionId` is the process-local UUID already in scope).
+- In the immediate-complete return (line 3062), add `...(sessionWorktreePath !== undefined ? { sessionId } : {})` alongside `sessionWorkspacePath`.
+**Changes in trigger-router.ts**:
+- Line 321: Replace `result.sessionWorkspacePath.split('/').at(-1) ?? ''` with `result.sessionId ?? ''`.
+**Done when**: `WorkflowRunSuccess.sessionId` is set when `branchStrategy === 'worktree'` and trigger-router reads it directly without path manipulation.
+### Slice 5: Minor 3 -- Validate git-safe chars for branchPrefix/baseBranch
+**File**: `src/trigger/trigger-store.ts`
+**Changes**:
+- After lines 867-868 where `baseBranch` and `branchPrefix` are extracted, add regex validation.
+- For each non-undefined value, check `/^[a-zA-Z0-9._/-]+$/` and that it does not start with `-`.
+- Return `err({ kind: 'invalid_field_value', field: '...', triggerId: rawId })` on failure.
+**Done when**: A trigger with `branchPrefix: '--bad'` or `baseBranch: '-main'` fails at parse time with `kind: 'invalid_field_value'`.
+### Slice 6: Minor 4 -- Add End-to-End Delivery Test for branchStrategy:worktree
+**File**: `tests/unit/trigger-router.test.ts`
+**Changes**:
+- Add a test in the `describe('delivery wiring (autoCommit)')` block.
+- The test creates a `WorkflowRunSuccess` with `sessionWorkspacePath: '/worktrees/test-session-id'` and valid `lastStepNotes`.
+- Stubs `runWorkflowFn` to return this success result.
+- Verifies the first git call uses `/worktrees/test-session-id` as the working directory (not trigger.workspacePath).
+**Done when**: Test passes and verifies `execFn` is called with the worktree path.
+## Test Design
+### Existing Tests to Verify Unchanged
+- `tests/unit/trigger-router.test.ts` -- all existing tests must still pass.
+- `tests/unit/trigger-store.test.ts` -- all existing validation tests must still pass.
+### New Test (Slice 6)
+```
+describe('delivery wiring (autoCommit)')
+  it('uses sessionWorkspacePath as working directory when runWorkflow returns a worktree session')
+    - trigger: { autoCommit: true, branchStrategy: 'worktree', workspacePath: '/workspace' }
+    - runWorkflowFn returns: { _tag: 'success', sessionWorkspacePath: '/worktrees/abc-session', lastStepNotes: VALID_HANDOFF_NOTES }
+    - fakeExec: vi.fn().mockResolvedValue(...)
+    - assertion: fakeExec called; first git add call uses cwd '/worktrees/abc-session'
+```
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| `startContinueToken` is undefined in practice when branchStrategy='worktree' | Very Low | Low | persistTokens writes '' as fallback; startup recovery handles it |
+| Removing cleanup breaks non-autoCommit worktree sessions | Low | Low | Startup recovery reaps after 24h; combination is unusual |
+| `sessionId` field name collision with WorkRail server sessionId | Low | Low | Field is optional; no ambiguity since it's typed on the interface |
+## PR Packaging Strategy
+All changes on existing branch `feat/worktree-auto-commit`. Single PR #630.
+Commit message: `fix(daemon): address worktree review findings -- move success cleanup, document spawn_agent limitation, thread sessionId, validate git-safe chars`
+## Philosophy Alignment
+| Principle | Slice | Status |
+|---|---|---|
+| Architectural fixes over patches | Slice 1 | Satisfied -- cleanup moved to correct layer |
+| Errors are data | Slice 5 | Satisfied -- TriggerStoreError returned |
+| Make illegal states unrepresentable | Slice 4 | Satisfied -- typed sessionId, no path-parsing |
+| Validate at boundaries | Slice 5 | Satisfied -- parse-time validation |
+| Document 'why' | Slice 2 | Satisfied -- JSDoc explains architectural reason |
+| Immutability by default | Slice 4 | Satisfied -- readonly field added |
+| YAGNI | All | Satisfied -- no new abstractions |
+## Open Questions
+None. All questions resolved during design.
+## Unresolved Unknown Count: 0
+## Plan Confidence Band: High
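
The Slice 5 check in the plan above (the allow-list regex plus the leading `-` rejection) can be sketched as follows. This is a minimal illustration, not the code shipped in `trigger-store.ts`: the `Result` shape, the `GIT_SAFE_REF` constant, and the `validateGitSafeField` helper are hypothetical names introduced here. Only the regex, the `invalid_field_value` kind, and the `field` / `triggerId` properties come from the plan.

```typescript
// Hypothetical error and result shapes, loosely modelled on the plan's
// `err({ kind: 'invalid_field_value', field, triggerId })` call.
type InvalidFieldValue = {
  readonly kind: 'invalid_field_value';
  readonly field: string;
  readonly triggerId: string;
};

type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Character allow-list quoted from the plan.
const GIT_SAFE_REF = /^[a-zA-Z0-9._/-]+$/;

// Hypothetical helper: validates one optional field at parse time.
function validateGitSafeField(
  field: 'branchPrefix' | 'baseBranch',
  value: string | undefined,
  triggerId: string,
): Result<string | undefined, InvalidFieldValue> {
  // Absent values pass through; only supplied values are checked.
  if (value === undefined) return { ok: true, value };
  // The regex alone admits a leading '-', which git could read as an
  // option, so that case is rejected separately.
  if (!GIT_SAFE_REF.test(value) || value.startsWith('-')) {
    return { ok: false, error: { kind: 'invalid_field_value', field, triggerId } };
  }
  return { ok: true, value };
}
```

Under these assumptions, `validateGitSafeField('branchPrefix', '--bad', rawId)` and `validateGitSafeField('baseBranch', '-main', rawId)` both return the error variant, matching the plan's "Done when" condition for Slice 5.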

package/docs/ideas/backlog.md CHANGED

@@ -6247,3 +6247,151 @@ Scheduled tasks are the entry point for fully autonomous work:
 - `node-cron` or `croner` npm package for cron expression parsing and next-fire-time calculation. Lightweight, no daemon dependencies.
 - Scheduled triggers have no webhook payload -- `contextMapping` is empty, `goalTemplate` uses only static text or env vars.
 - The schedule state (last-fired-at per trigger) persists to `~/.workrail/schedule-state.json` so the daemon can detect missed runs on restart.
+---
+## Autonomous grooming loop + workOnAll mode (Apr 19, 2026)
+### The vision
+WorkTrain eventually finds and executes its own work without any human seeding the queue. This is the full autonomous loop: raw backlog idea → groomed issue → discovered/shaped spec → implemented PR → reviewed → merged. Zero human input required once configured.
+### Three autonomy levels
+**Level 0 -- Opt-in queue (current design)**
+Human adds `worktrain` label to specific issues. WorkTrain works those issues only. Safe, predictable, explicit.
+**Level 1 -- workOnAll mode**
+Config flag `workOnAll: true` in `~/.workrail/config.json`. WorkTrain looks at ALL open issues, infers which ones are actionable, picks the highest-priority one. Human escape hatch: `worktrain:skip` label blocks WorkTrain from touching a specific issue. Status labels (`worktrain:in-progress`, `worktrain:done`) are coordinator-managed for observability. No human-set maturity labels needed -- coordinator infers from content.
+**Level 2 -- Fully proactive**
+WorkTrain also surfaces work it found itself: failing CI, Dependabot alerts, backlog items with no issue, patterns in git history suggesting missing tests or docs. Creates its own work items, runs them, closes the loop.
+### The grooming loop (scheduled, e.g. nightly)
+Runs on a cron trigger. Responsibilities:
+1. Read `docs/ideas/backlog.md`, `docs/roadmap/now-next-later.md`, open GitHub issues
+2. Reconcile: close issues that are already done (PR merged), update priorities based on what shipped recently, flag duplicate or obsolete items
+3. For each ungroomed `worktrain` issue (or all issues in workOnAll mode): infer maturity -- does it have a linked spec? acceptance criteria? concrete implementation plan?
+4. For high-value `idea`-level items: autonomously run `wr.discovery` → `wr.shaping` → update or create issue with pitch attached, set `worktrain:specced`
+5. Backlog → issue promotion: when a backlog item crosses a readiness threshold (has enough context to act on), create a GitHub issue from it
+### Maturity inference (no human-set labels required in Level 1+)
+The coordinator reads issue content and infers:
+- Linked pitch/PRD/spec URL → `ready` or `specced`
+- Has acceptance criteria or concrete implementation plan → `specced` or `ready`
+- Vague/exploratory language → `idea`
+- Has open PR or recent branch activity → skip (already in flight)
+The `worktrain:idea/specced/ready` taxonomy is the coordinator's internal model, not something humans set. In Level 1+ the coordinator manages it automatically.
+### workOnAll config
+```json
+// ~/.workrail/config.json
+{
+  "workOnAll": true,
+  "workOnAllExclusions": ["needs-design", "blocked-external", "wontfix"],
+  "maxConcurrentSelf": 2
+}
+```
+`maxConcurrentSelf` caps how many autonomous self-improvement sessions run simultaneously -- important so WorkTrain doesn't try to implement 10 things at once and create merge conflicts.
+### Design notes
+- The grooming loop and the work loop are **separate triggers** with separate schedules. Grooming runs more frequently (nightly or post-merge). Work loop runs on demand or weekly.
+- The grooming loop requires LLM judgment ("is this ready?") -- it's a `wr.discovery`-style session on the backlog, not a deterministic script. This is a feature, not a limitation.
+- `worktrain:skip` is the only label humans need to set in Level 1+ -- it's the explicit "not this one" override.
+- Auto-PR-from-backlog requires careful scope: WorkTrain should create draft PRs for its own discoveries, not automatically push to open issues on other people's repos.
+### Priority
+This is the long-term autonomous vision. Implement in order:
+1. Level 0 (current, task queue PR #4)
+2. workOnAll config flag (small addition to the coordinator, after #4 ships)
+3. Maturity inference (replace label-based routing with content inference)
+4. Grooming loop (scheduled cron trigger, wr.discovery session on backlog)
+5. Level 2 proactive work (post-grooming, after proving the loop works)
+---
+## Escalating review gates based on finding severity (Apr 19, 2026)
+**The idea:** when an MR review returns a Critical finding post-implementation, the review is not over -- it triggers a deeper audit chain before merge is allowed.
+### Current state
+`worktrain run pr-review` routes by severity: `clean` → merge, `minor` → fix-agent loop, `blocking` → escalate to human. But "blocking" is binary -- a single Critical finding and a trivially incorrect comment are treated identically (both block, neither gets more scrutiny).
+### The right behavior
+After a fix round, if the re-review still returns a Critical finding (or the original review does):
+1. **Another full MR review** -- confirm the Critical is real, not a false positive from the reviewer
+2. **Production readiness audit** (`production-readiness-audit` workflow) -- a Critical finding often implies a runtime risk. Check for error handling gaps, security exposure, missing observability.
+3. **Architecture audit** (`architecture-scalability-audit`) -- if the Critical is architectural (wrong abstraction, tight coupling, violates invariants), run a targeted audit on the affected modules.
+Not all Criticals warrant all three. The coordinator should route based on the finding's `category` field (from `wr.review_verdict`):
+- `correctness` / `security` → always trigger prod audit
+- `architecture` / `design` → trigger arch audit
+- All → trigger re-review
+### Auto-merge policy interaction
+A PR that triggered the escalating audit chain should NEVER auto-merge, even if the final re-review comes back clean. The human should approve it explicitly after seeing the audit trail. This is a hard rule, not a setting.
+### Implementation notes
+- The escalation logic belongs in the `IMPLEMENT` and `REVIEW_ONLY` mode coordinators (part of the adaptive pipeline coordinator work).
+- `wr.review_verdict` `findings[].category` field needs to be defined if not already -- check `src/v2/durable-core/schemas/artifacts/review-verdict.ts`.
+- The audit chain runs sequentially (prod then arch), not in parallel -- each audit's output informs the next.
+- All audit session IDs should be linked to the same parent work unit so the console session tree shows the full chain.
+### Priority
+Design this alongside the adaptive pipeline coordinator (#3). The coordinator needs to know about this escalation policy before its routing logic is finalized -- the `IMPLEMENT` mode's post-review handling is incomplete without it.
+---
+## UX/UI impact detection and design workflow integration (Apr 19, 2026)
+**The idea:** When the adaptive pipeline coordinator classifies a task, it should detect whether the task touches user-facing surfaces (UI components, user flows, API contracts that clients consume) and automatically insert a `ui-ux-design-workflow` run before implementation.
+### Why this matters
+Coding tasks that touch UI get implemented without a design pass today. The agent writes functional code but often produces interfaces that are technically correct but experientially wrong -- wrong information hierarchy, wrong affordances, missing error states, missing loading states, wrong copy. A `ui-ux-design-workflow` run before coding forces the "multiple design directions before converging" discipline that prevents the single-solution trap.
+### Detection signals (what marks a task as UX-impactful)
+The coordinator should classify a task as `touchesUI: true` when any of:
+- Issue title or body mentions: component, screen, page, modal, dialog, button, form, flow, onboarding, dashboard, table, list, navigation, UX, UI, design, user-facing, frontend, console, web
+- Affected files (from git diff or knowledge graph) include: `console/src/`, `*.tsx`, `*.css`, `web/`, `views/`
+- The task has a `ui` or `frontend` label
+- The upstream spec (pitch/PRD) explicitly calls out visual or interaction design requirements
+False positives (running design workflow unnecessarily) are cheaper than false negatives (shipping bad UX). Default to `touchesUI: true` when signals are ambiguous and the task is `complexity: Medium` or larger.
+### Pipeline integration
+When `touchesUI: true`, the `IMPLEMENT` pipeline becomes:
+```
+coding-task-classify → ui-ux-design-workflow → coding-task-workflow-agentic → PR → review → merge
+```
+The `ui-ux-design-workflow` output (a design spec with chosen direction, information architecture, component breakdown, error states) feeds into Phase 0.5 of `coding-task-workflow-agentic` as the upstream spec. The coding agent then implements against a concrete design spec, not ad-hoc intuition.
+### Relationship to escalating review gates
+When a post-implementation MR review finds a UI/UX finding (wrong affordance, missing state, confusing flow), the escalation should include a targeted `ui-ux-design-workflow` audit pass, not just a code review. UX regressions need design eyes, not just code eyes.
+### Open design questions
+- **Who reviews the design spec before coding starts?** If the UX design workflow runs autonomously at 2am and coding starts immediately after, there is no human review of the design direction. This is fine for small UI tweaks; it's wrong for new user flows. The coordinator needs a complexity gate: `complexity: Large AND touchesUI: true` → require human ack on the design spec before coding.
+- **Design spec format:** `ui-ux-design-workflow` currently produces a markdown design document. Does the coding workflow reliably consume this as an upstream spec via Phase 0.5? Verify before relying on the automated handoff.
+- **Console-specific workflows:** WorkRail's console is a React/TypeScript SPA. Consider a `worktrain:console` label or file-path heuristic that routes to a console-specific design workflow variant.
+### Priority
+Design this as part of the adaptive coordinator (#3). The `touchesUI` flag belongs on the classification output alongside `taskComplexity` and `maturity`. The UI detection logic and the design workflow insertion are both coordinator-level concerns, not engine-level.
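
The category routing described in the "Escalating review gates" entry above can be sketched as a small pure function. This is an illustration only: `AuditStep`, `CriticalFinding`, and `planAuditChain` are hypothetical names, and the backlog itself notes that the `findings[].category` field may not be defined yet. The category values and the sequential order (re-review, then production audit, then architecture audit) are taken from that entry.

```typescript
// Hypothetical step names; the two audit names mirror the workflow names
// mentioned in the backlog entry.
type AuditStep =
  | 're-review'
  | 'production-readiness-audit'
  | 'architecture-scalability-audit';

// Hypothetical minimal view of a Critical finding; only `category` is used.
interface CriticalFinding {
  readonly category: string;
}

// Returns the ordered audit chain for a set of Critical findings.
function planAuditChain(
  criticals: readonly CriticalFinding[],
): readonly AuditStep[] {
  // No Critical findings: no escalation chain.
  if (criticals.length === 0) return [];

  const categories = new Set(criticals.map((f) => f.category));

  // Every Critical finding triggers a re-review first.
  const steps: AuditStep[] = ['re-review'];

  // correctness / security: always add the production readiness audit.
  if (categories.has('correctness') || categories.has('security')) {
    steps.push('production-readiness-audit');
  }
  // architecture / design: add the architecture audit, after the prod
  // audit so the chain runs sequentially (prod then arch).
  if (categories.has('architecture') || categories.has('design')) {
    steps.push('architecture-scalability-audit');
  }
  return steps;
}
```

Keeping the routing as a pure function of the findings would let the coordinator decide the chain up front. The hard rule that an escalated PR never auto-merges would be enforced separately, for example by treating any non-empty chain as requiring explicit human approval.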

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.42.0",
+  "version": "3.44.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {
@@ -54,8 +54,8 @@
     "preinstall": "node -e \"const v=parseInt(process.versions.node.split('.')[0],10); if(v<20){console.error('WorkRail requires Node.js >=20. Current: '+process.versions.node+'\\nPlease upgrade: https://nodejs.org/'); process.exit(1);}\"",
     "dev:mcp": "pkill -f \"$(pwd)/dist/mcp-server.js\" 2>/dev/null; sleep 0.5; WORKRAIL_TRANSPORT=http WORKRAIL_ENABLE_SESSION_TOOLS=true node dist/mcp-server.js",
     "dev:mcp:watch": "pkill -f \"$(pwd)/dist/mcp-server.js\" 2>/dev/null; sleep 0.5; WORKRAIL_TRANSPORT=http WORKRAIL_ENABLE_SESSION_TOOLS=true nodemon --watch dist --ext js --delay 2 --exec 'node dist/mcp-server.js'",
-    "web:dev": "npm run build && WORKRAIL_ENABLE_SESSION_TOOLS=true node dist/mcp-server.js",
-    "web:ci": "WORKRAIL_ENABLE_SESSION_TOOLS=true node dist/mcp-server.js",
+    "web:dev": "npm run build && node dist/cli-worktrain.js console",
+    "web:ci": "node dist/cli-worktrain.js console",
     "web:typecheck": "tsc -p tsconfig.web.json",
     "typecheck": "tsc --noEmit",
     "test": "vitest",