npm - @exaudeus/workrail - Versions diffs - 3.39.0 → 3.41.0 - Mend

@exaudeus/workrail 3.39.0 → 3.41.0

Files changed (97)

package/dist/cli/commands/init.js +0 -3
package/dist/cli-worktrain.js +58 -26
package/dist/cli.js +0 -18
package/dist/config/app-config.d.ts +0 -16
package/dist/config/app-config.js +0 -14
package/dist/config/config-file.js +0 -3
package/dist/console-ui/assets/index-CQt4UhPB.js +28 -0
package/dist/console-ui/assets/index-DGj8EsFR.css +1 -0
package/dist/console-ui/index.html +2 -2
package/dist/coordinators/pr-review.d.ts +23 -1
package/dist/coordinators/pr-review.js +224 -5
package/dist/daemon/daemon-events.d.ts +9 -1
package/dist/daemon/soul-template.d.ts +2 -2
package/dist/daemon/soul-template.js +11 -1
package/dist/daemon/workflow-runner.d.ts +17 -3
package/dist/daemon/workflow-runner.js +401 -28
package/dist/di/container.js +1 -25
package/dist/di/tokens.d.ts +0 -3
package/dist/di/tokens.js +0 -3
package/dist/engine/engine-factory.js +0 -1
package/dist/infrastructure/console-defaults.d.ts +1 -0
package/dist/infrastructure/console-defaults.js +4 -0
package/dist/infrastructure/session/index.d.ts +0 -1
package/dist/infrastructure/session/index.js +1 -3
package/dist/manifest.json +124 -124
package/dist/mcp/handlers/session.d.ts +1 -0
package/dist/mcp/handlers/session.js +61 -13
package/dist/mcp/output-schemas.d.ts +10 -10
package/dist/mcp/server.js +1 -18
package/dist/mcp/tools.d.ts +12 -12
package/dist/mcp/transports/http-entry.js +0 -2
package/dist/mcp/transports/stdio-entry.js +1 -2
package/dist/mcp/types.d.ts +0 -2
package/dist/trigger/daemon-console.d.ts +2 -0
package/dist/trigger/daemon-console.js +1 -1
package/dist/trigger/trigger-listener.d.ts +2 -0
package/dist/trigger/trigger-listener.js +3 -1
package/dist/trigger/trigger-router.d.ts +4 -3
package/dist/trigger/trigger-router.js +13 -5
package/dist/trigger/trigger-store.js +17 -4
package/dist/types/workflow-source.d.ts +0 -1
package/dist/types/workflow-source.js +3 -6
package/dist/types/workflow.d.ts +1 -1
package/dist/types/workflow.js +1 -2
package/dist/v2/durable-core/domain/artifact-contract-validator.js +66 -0
package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.d.ts +25 -0
package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.js +31 -0
package/dist/v2/durable-core/schemas/artifacts/index.d.ts +3 -1
package/dist/v2/durable-core/schemas/artifacts/index.js +14 -1
package/dist/v2/durable-core/schemas/artifacts/review-verdict.d.ts +41 -0
package/dist/v2/durable-core/schemas/artifacts/review-verdict.js +30 -0
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +236 -236
package/dist/v2/durable-core/schemas/session/events.d.ts +50 -50
package/dist/v2/durable-core/schemas/session/gaps.d.ts +2 -2
package/dist/v2/durable-core/schemas/session/manifest.d.ts +4 -4
package/dist/v2/durable-core/schemas/session/outputs.d.ts +8 -8
package/dist/v2/usecases/console-routes.d.ts +2 -1
package/dist/v2/usecases/console-routes.js +207 -5
package/dist/v2/usecases/console-service.js +14 -0
package/dist/v2/usecases/console-types.d.ts +1 -0
package/docs/authoring.md +16 -16
package/docs/design/coordinator-artifact-protocol-design-candidates.md +155 -0
package/docs/design/coordinator-artifact-protocol-design-review.md +103 -0
package/docs/design/coordinator-artifact-protocol-implementation-plan.md +259 -0
package/docs/design/coordinator-message-queue-drain-plan.md +241 -0
package/docs/design/coordinator-message-queue-drain-review.md +120 -0
package/docs/design/coordinator-message-queue-drain.md +289 -0
package/docs/design/shaping-workflow-external-research.md +119 -0
package/docs/discovery/late-bound-goals-impl-plan.md +147 -0
package/docs/discovery/late-bound-goals-review.md +82 -0
package/docs/discovery/late-bound-goals.md +118 -0
package/docs/discovery/steer-endpoint-design-candidates.md +288 -0
package/docs/discovery/steer-endpoint-design-review-findings.md +104 -0
package/docs/discovery/steer-endpoint-implementation-plan.md +284 -0
package/docs/ideas/backlog.md +447 -97
package/docs/ideas/design-candidates-console-session-tree-impl.md +64 -0
package/docs/ideas/design-candidates-session-tree-view.md +196 -0
package/docs/ideas/design-review-findings-console-session-tree-impl.md +75 -0
package/docs/ideas/design-review-findings-session-tree-view.md +88 -0
package/docs/ideas/implementation_plan_session_tree_view.md +238 -0
package/package.json +2 -1
package/spec/authoring-spec.json +16 -16
package/spec/shape.schema.json +178 -0
package/spec/workflow-tags.json +232 -47
package/workflows/coding-task-workflow-agentic.json +491 -480
package/workflows/mr-review-workflow.agentic.v2.json +5 -1
package/workflows/wr.shaping.json +182 -0
package/dist/console-ui/assets/index-3oXZ_A9m.js +0 -28
package/dist/console-ui/assets/index-8dh0Psu-.css +0 -1
package/dist/infrastructure/session/DashboardHeartbeat.d.ts +0 -8
package/dist/infrastructure/session/DashboardHeartbeat.js +0 -39
package/dist/infrastructure/session/DashboardLockRelease.d.ts +0 -2
package/dist/infrastructure/session/DashboardLockRelease.js +0 -29
package/dist/infrastructure/session/HttpServer.d.ts +0 -60
package/dist/infrastructure/session/HttpServer.js +0 -912
package/workflows/coding-task-workflow-agentic.lean.v2.json +0 -648
package/workflows/coding-task-workflow-agentic.v2.json +0 -324

package/docs/design/coordinator-message-queue-drain-review.md ADDED

@@ -0,0 +1,120 @@
+# Design Review Findings: Coordinator Message Queue Drain
+**Design reviewed:** Candidate B from `coordinator-message-queue-drain.md`
+(drainMessageQueue with cursor + text parsing)
+---
+## Tradeoff Review
+### T1: Stringly-typed dispatch (free-form text parsing)
+Accepted tradeoff. The `/^\s*stop\b/i` anchor pattern is narrower than bare `stop` matching
+and covers realistic CLI usage. The risk of false-positive halt is real but diagnosable -- the
+outbox notification includes the triggering message text. Condition for no longer acceptable:
+automated tooling writing to the queue. Explicitly documented as a pivot trigger for Candidate C.
+### T2: New cursor file on disk
+Fully acceptable. Same format as `InboxCursor`; desync guard handles truncation; write failure
+is non-fatal. No new schema maintenance burden.
+### T3: Outbox notifications for all actionable messages
+Fully acceptable. Outbox write failure is non-fatal; stderr provides a backup diagnostic.
+Including notifications for all actions (not just `stop`) is the right call -- users need the
+feedback loop.
+---
+## Failure Mode Review
+| FM | Description | Mitigation | Residual risk |
+|---|---|---|---|
+| FM1 | `stop` fires on note message | `/^\s*stop\b/i` anchor; outbox shows triggering text | Low -- diagnosable and recoverable |
+| FM2 | Cursor desync after queue wipe | Reset to 0 if cursor > totalLines | Low -- a past `stop` still in the queue fires again; outbox makes it visible |
+| FM3 | Duplicate add-pr | Set dedup before Stage 1 | None |
+| FM4 | Outbox write failure during stop | Non-fatal; stderr fallback | None -- stop still honored |
+| FM5 | ENOENT (no queue file) | Return empty DrainResult | None -- expected on fresh install |
+**Highest-risk failure mode:** FM1. Must include triggering message text and timestamp in the
+outbox notification and stderr log -- this is a required implementation detail, not optional.
+---
+## Runner-Up / Simpler Alternative Review
+**Candidate C strengths borrowed:** Structured parse result logged to stderr (`[INFO drain:kind=stop
+message=...]`) -- same diagnostic value as a `kind` field at zero schema cost.
+**Simpler variant (skip outbox notifications):** Rejected -- silent halt is a UX regression.
+**Simpler variant (skip `add-pr`):** Viable as a scope reduction. Included in this PR because the
+implementation cost is ~10 lines, and `skip-pr` without `add-pr` is asymmetric.
+---
+## Philosophy Alignment
+**Clearly satisfied:** Immutability, errors as values, DI, validate at boundaries, determinism,
+fakes over mocks, small pure functions, document WHY.
+**Under tension:**
+- "Explicit domain types over primitives" -- free-form text dispatch. Acceptable: pre-existing
+  schema constraint, documented as follow-up.
+- "Make illegal states unrepresentable" -- `DrainResult` can represent `stop: true` with
+  non-empty `skipPrNumbers`. Acceptable: `stop` check is first at call site; documented.
+---
+## Findings
+### YELLOW: `stop` regex false-positive on note messages
+The `/^\s*stop\b/i` pattern is significantly better than bare `stop` matching, but it will still
+fire on a message like "stop and think about this before merging." No additional regex constraint
+is practical without excluding valid stop forms. The mitigation (outbox + stderr with triggering
+message text) is the correct and sufficient response.
+**Recommended revision:** None to the pattern itself. Ensure the outbox notification reads:
+`WorkTrain coordinator stopped by queued message: "[full message text]" (queued at [timestamp])`
+rather than a generic "coordinator stopped" message.
+### YELLOW: `DrainResult` allows `stop: true` + non-empty `skipPrNumbers`
+The call site must check `stop` before anything else. If a future maintainer adds code between
+the drain call and the `stop` check, or moves the check, the skip/add arrays could be acted on
+before the stop is honored.
+**Recommended revision:** Add a JSDoc invariant on `DrainResult`: "When `stop` is true, all
+other fields are informational only. The coordinator MUST honor `stop` before inspecting
+`skipPrNumbers` or `addPrNumbers`." Also add a comment at the call site.
+### YELLOW (minor): No structured parse log to stderr
+Without logging which pattern matched and for which message, diagnosing unexpected behavior
+requires reading the outbox. A one-line stderr log per actionable message helps during
+development and debugging.
+**Recommended revision:** For each actionable message (stop, skip-pr, add-pr), emit:
+`[INFO coord:drain kind=stop handle=... message="..." ts=...]` to `deps.stderr`.
+---
+## Recommended Revisions (summary)
+1. Outbox notification for `stop` must include the full triggering message text and timestamp.
+2. Add JSDoc invariant on `DrainResult` documenting that `stop: true` takes absolute precedence.
+3. Add a `[INFO coord:drain]` stderr log line for each actionable message (diagnostics).
+None of these revisions change the architecture. All are implementation-level details.
+---
+## Residual Concerns
+1. **Schema follow-up not filed yet.** A GitHub issue or backlog entry for adding a `kind`
+   field to `QueuedMessage` (Candidate C path) should be created as part of this PR.
+2. **No integration test.** Unit tests with fake deps are sufficient for the drain logic, but
+   an end-to-end test (write to real queue file, run coordinator, verify outbox) is not planned.
+   This is acceptable for a developer CLI tool.
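The first finding and its recommended revision can be illustrated with a minimal TypeScript sketch. The pattern and the notification wording come from the review above; `formatStopNotification` and the sample messages are hypothetical names and data used only for illustration, not code from the package.

```ts
// Tightened stop pattern from the review: `stop` must be the first token.
const STOP_PATTERN = /^\s*stop\b/i;

// Hypothetical helper: builds the outbox text recommended in the findings,
// including the full triggering message and its timestamp.
function formatStopNotification(message: string, timestamp: string): string {
  return `WorkTrain coordinator stopped by queued message: "${message}" (queued at ${timestamp})`;
}

const samples: readonly string[] = [
  "stop",
  "  Stop reviewing for today",
  "stop and think about this before merging", // known false positive (FM1)
  "please don't stop",                        // not anchored at the start
  "stopping point reached",                   // no word boundary after `stop`
];

// Only the first three samples match.
const matches = samples.filter((m) => STOP_PATTERN.test(m));
console.log(matches);
console.log(formatStopNotification("stop and think about this before merging", "2026-04-18T10:00:00Z"));
```

The third sample shows why the notification has to carry the full message text: the halt is a false positive, and the outbox line is the only place the user can see what triggered it.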

package/docs/design/coordinator-message-queue-drain.md ADDED

@@ -0,0 +1,289 @@
+# Design Candidates: Coordinator Message Queue Drain
+**Task:** The PR review coordinator never reads `~/.workrail/message-queue.jsonl`, so
+messages queued via `worktrain tell` (from phone, terminal, or automation) are silently ignored.
+This document captures the design investigation for draining that queue inside the coordinator.
+---
+## Problem Understanding
+### Core tensions
+1. **Append-only invariant vs. consumed-message tracking.** The queue file must never be
+   truncated or rewritten -- the `worktrain-tell` command's documented invariant. But without
+   tracking which messages were processed, the coordinator re-processes the entire history on
+   every invocation. A cursor file (same pattern as `inbox-cursor.json`) resolves this cleanly
+   but adds a second file to manage.
+2. **Stringly-typed messages vs. explicit domain types.** `QueuedMessage.message` is free-form
+   text. The repo philosophy demands explicit domain types, but no `kind` field exists in the
+   current schema. Text parsing at the coordinator's read boundary is the only option within
+   the current schema -- it is not a patch, it is adapting to a pre-existing constraint.
+3. **Coordinator statefulness vs. single-pass design.** The coordinator is invoked once per run
+   today, not as a persistent loop. A cursor handles both cases correctly: repeat invocations
+   see only new messages; a one-time invocation drains everything queued since last run.
+4. **`stop` signal semantics vs. partial progress.** A `stop` in the queue must halt before any
+   spawn. But `stop` might appear alongside `skip-pr 42` in the same drain batch. `stop` takes
+   absolute precedence -- no partial processing, coordinator exits cleanly and writes an outbox
+   acknowledgment.
+### Likely seam
+The real seam is the top of `runPrReviewCoordinator()`, immediately before Stage 1 (PR discovery).
+This matches the backlog intent: "coordinator loop checks message-queue at the start of each cycle
+before spawning new agents." The coordinator is the right owner, not a shared utility, because
+message routing is coordinator-specific logic.
+### What makes this hard
+Not technically difficult. The risks are:
+- Forgetting to handle ENOENT (queue file doesn't exist yet = no messages, not a crash)
+- Cursor desync: if the queue is wiped, cursor > total lines; reset to 0 (same guard as `inbox-cursor.json`)
+- Text matching fragility: `stop` in "stop overthinking this" triggers coordinator halt
+---
+## Philosophy Constraints
+From `CLAUDE.md` and observed repo patterns:
+- **Errors are values, never thrown** -- `pr-review.ts` uses `Result<T, string>` throughout.
+  The drain result uses a plain `DrainResult` struct (stop is not an error, it is a valid outcome).
+- **All I/O injected via deps** -- new `drainMessageQueue()` must accept deps, not import `fs`.
+- **Immutability by default** -- all interface fields are `readonly`.
+- **Prefer fakes over mocks** -- tests use in-memory fake deps, no `vi.mock()`.
+- **Validate at boundaries, trust inside** -- malformed JSONL lines are skipped at the parse
+  boundary; core routing logic trusts parsed data.
+- **Document WHY, not WHAT** -- comments explain rationale, not mechanics.
+**Conflict:** "Explicit domain types over primitives" is under pressure from the free-form message
+text. The mitigation is narrow keyword patterns and clear documentation. This conflict is not
+resolved in this PR -- a `kind` field on `QueuedMessage` is the proper fix but changes the
+public CLI interface (out of scope here).
+---
+## Impact Surface
+Changes that must stay consistent if this design is implemented:
+- **`CoordinatorDeps` interface** in `src/coordinators/pr-review.ts`: gains `readFile` and
+  `appendFile`. These are additive -- no existing caller is broken.
+- **`cli-worktrain.ts` pr-review action**: must wire `readFile` and `appendFile` into the deps
+  object (two new lines in the composition root).
+- **`tests/unit/coordinator-pr-review.test.ts`**: every fake `CoordinatorDeps` object needs the
+  two new fields. Mechanical but must not be missed.
+- **`discoverConsolePort` deps** (mini-subset type): no change needed; it already has `readFile`.
+New files introduced on disk (runtime, not source):
+- `~/.workrail/message-queue-cursor.json` -- created on first coordinator run after this ships.
+---
+## Candidates
+### Candidate A -- Minimal: full-history drain, no cursor, timestamp filter
+**Summary:** On each coordinator run, read all messages in `message-queue.jsonl`, discard messages
+older than the coordinator's start time, act on the remainder.
+**Tensions resolved:** Simplest change; no new cursor file.
+**Tensions accepted:** Stale messages are re-processed under clock skew or same-second invocations.
+A `stop` message from two days ago can halt a coordinator run today if the clock check is ambiguous.
+**Boundary:** Inline in `runPrReviewCoordinator()`, no new function or file.
+**Why this boundary is wrong:** The timestamp filter is not reliable enough. Same-second writes,
+NTP jumps, or leap-second events can cause a current `stop` to be discarded or a stale `stop` to
+fire. The cursor is strictly more correct.
+**Failure mode:** Stale `stop` from a previous session kills today's coordinator run. No recovery
+path -- the coordinator just exits. Users have to manually inspect the queue to understand why.
+**Repo-pattern relationship:** Departs -- `worktrain-inbox.ts` uses a cursor precisely to avoid
+the re-processing problem. This candidate ignores the established pattern.
+**Gains:** Zero new files.
+**Gives up:** Correctness. Behavior depends on queue history, not just current inputs -- violates
+"determinism over cleverness."
+**Scope judgment:** Too narrow -- solves the immediate symptom but breaks on any real usage.
+**Philosophy fit:** Conflicts with "determinism over cleverness." Does not honor "validate at
+boundaries" (stale messages leak through).
+**Verdict: Rejected.** Stale message re-processing is a correctness bug, not a tradeoff.
+---
+### Candidate B -- Best-fit: `drainMessageQueue()` with cursor, narrow text parsing
+**Summary:** Add a function `drainMessageQueue(deps, opts)` (all I/O injected via deps) to `src/coordinators/pr-review.ts`.
+It reads new lines since `~/.workrail/message-queue-cursor.json`, parses message text for `stop` /
+`skip-pr N` / `add-pr N` using narrow regex patterns, writes outbox acknowledgments for actionable
+messages, advances the cursor. Called at the top of `runPrReviewCoordinator()` before Stage 1.
+**Tensions resolved:**
+- Append-only invariant respected (cursor tracks progress, queue file never modified)
+- Stale message re-processing eliminated by cursor
+- ENOENT handled (no queue = empty drain result = coordinator proceeds normally)
+- `stop` takes absolute precedence
+**Tensions accepted:**
+- Text parsing is not type-safe; fragile to natural language variation
+**Boundary solved at:** New exported function in `src/coordinators/pr-review.ts`.
+**Why this boundary is best-fit:** Message routing is coordinator-specific. The drain reads a
+coordinator-managed cursor file and writes outbox notifications -- both are coordinator
+responsibilities. Extracting to a shared utility would create coupling without benefit (no other
+coordinator exists today).
+**Key data structures:**
+```ts
+export interface DrainResult {
+  readonly stop: boolean;
+  readonly stopReason: string | null;
+  readonly skipPrNumbers: readonly number[];
+  readonly addPrNumbers: readonly number[];
+  readonly messagesProcessed: number;
+}
+```
+Cursor shape: `{ lastReadCount: number }` -- identical to `InboxCursor` in `worktrain-inbox.ts`.
+New `CoordinatorDeps` fields:
+```ts
+readonly readFile: (path: string) => Promise<string>;
+readonly appendFile: (path: string, content: string) => Promise<void>;
+```
+Parsing patterns:
+- stop: `/\bstop\b/i`
+- skip-pr: `/\bskip[- ]pr[\s#]+([0-9]+)/i`
+- add-pr: `/\badd[- ]pr[\s#]+([0-9]+)/i`
+**Failure mode:** A note message like "stop overthinking this" triggers coordinator halt. Mitigation:
+word-boundary requirement limits false positives; documented as known behavior with workaround
+("add-pr" or "note:" prefix for non-command messages).
+**Repo-pattern relationship:** Follows `worktrain-inbox.ts` cursor pattern exactly; follows
+`CoordinatorDeps` injection pattern exactly.
+**Gains:** Correct deduplication; clean separation; fully testable with fakes.
+**Gives up:** Type-safe dispatch. A `kind` field would be cleaner.
+**Impact surface:** `CoordinatorDeps` (additive), `cli-worktrain.ts` (2 new dep wires),
+`coordinator-pr-review.test.ts` (2 new fake dep fields).
+**Scope judgment:** Best-fit.
+**Philosophy fit:** Honors immutability (readonly result), DI for boundaries, errors as values,
+validate at boundaries. Partial conflict with "explicit domain types" (documented and accepted).
+**Verdict: Recommended.**
+---
+### Candidate C -- Broader: structured `kind` field on `QueuedMessage`
+**Summary:** Extend `QueuedMessage` with `readonly kind?: 'stop' | 'skip-pr' | 'add-pr' | 'note'`
+and `readonly payload?: Record<string, unknown>`. Update `worktrain-tell.ts` to accept `--kind`
+flag. Coordinator drains on `kind` field instead of text parsing.
+**Tensions resolved:** Eliminates the stringly-typed tension entirely. Discriminated union on
+`kind` makes routing exhaustive and type-safe.
+**Tensions accepted:** Schema change affects the public CLI interface. Existing `tell` invocations
+omitting `--kind` fall back to `kind: 'note'` (safe), but natural language commands no longer work
+(`worktrain tell "stop"` becomes a note, not a stop signal).
+**Boundary solved at:** `QueuedMessage` type in `worktrain-tell.ts` + coordinator drain in
+`pr-review.ts` + CLI parser in `cli-worktrain.ts`.
+**Why this boundary is too broad:** Adds `kind` to `QueuedMessage` -- a public interface change.
+The `tell` command is documented as accepting any free-form text. Adding a required semantic field
+is a separate design decision that should be preceded by discussion of the CLI UX.
+**Failure mode:** Users who currently type `worktrain tell "stop the agent"` find it ignored
+unless they learn to use `--kind stop`. The ergonomic regression is silent.
+**Repo-pattern relationship:** Honors "explicit domain types" and "make illegal states
+unrepresentable" from philosophy. Departs from current free-form-text CLI design.
+**Gains:** Type-safe dispatch, no regex fragility, forward-compatible for new action kinds.
+**Gives up:** Natural language ergonomics; requires more CLI plumbing.
+**Scope judgment:** Too broad for this task.
+**Philosophy fit:** Strongly honors explicit domain types, discriminated unions, exhaustiveness.
+Conflicts with YAGNI -- adds schema complexity before the feature is proven.
+**Verdict: Out of scope for this PR. File a follow-up issue.**
+---
+## Comparison and Recommendation
+| | A (timestamp) | B (cursor + text) | C (structured kind) |
+|---|---|---|---|
+| Stale message safety | Weak | Strong | Strong |
+| Schema change | No | No | Yes |
+| Scope fit | Too narrow | Best-fit | Too broad |
+| Testability | Full | Full | Full |
+| Text-parse fragility | Avoided (no parse) | Narrow regexes | Eliminated |
+| Repo-pattern alignment | Poor | Exact | Partial |
+| Philosophy fit | Weak | Good (with caveat) | Strong |
+**Recommendation: Candidate B.**
+Candidate A fails on correctness. Candidate C solves the right problem but changes the wrong
+boundary for this task. Candidate B is a direct adaptation of the existing `worktrain-inbox.ts`
+cursor pattern to the coordinator context -- it introduces no new architectural ideas, just
+applies the established approach.
+---
+## Self-Critique
+**Strongest argument against Candidate B:**
+The text-matching approach creates an implicit, undiscoverable API. Users sending messages from
+phones have no way to know that `stop` means stop but `halt` does not. There is no help text,
+no validation, no error message for unrecognized commands. This is a real UX problem.
+**What would tip the decision toward Candidate C:**
+Evidence that multiple clients (mobile app, automation scripts) need to send structured commands.
+At that point, the text-parsing approach becomes a reliability liability. The right test: if
+a second coordinator (e.g., a work-queue coordinator) also needs to consume the message queue,
+Candidate C's structured dispatch becomes clearly necessary.
+**Invalidating assumption:**
+Candidate B assumes the word-boundary `stop` regex is specific enough. If users commonly type
+messages like "stop worrying and trust the process" via phone, the stop regex will fire. Mitigation:
+require the stop keyword to appear as the first meaningful token in the message, or require a
+command prefix (e.g., `/stop`). This can be tightened without changing the architecture.
+---
+## Open Questions for the Main Agent
+1. Should the drain function write an outbox notification for every actionable message, or only
+   for `stop` (where the coordinator is halting and the user needs confirmation)? Suggested:
+   write for all actionable messages (stop, skip-pr, add-pr) to close the feedback loop.
+2. The `stop` signal exits cleanly -- should the coordinator report which messages caused the
+   stop in its final report? Suggested: yes, log the message text and timestamp in the run log.
+3. Should `add-pr` messages add new PRs to the list before or after deduplication? Suggested:
+   add them to `prs` before Stage 1 begins, guarding against duplicates with a Set.
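Candidate B's cursor and parsing logic can be sketched as a pure core that the injected-I/O wrapper would call. This is a minimal sketch under stated assumptions, not the package's implementation: `planDrain` and `DrainPlan` are hypothetical names, the cursor is assumed to count non-empty lines, `messagesProcessed` is assumed to count well-formed new messages, and the caller is assumed to map ENOENT to an empty string before calling. The patterns are the ones listed in the design doc (not the tightened anchor from the later review).

```ts
// DrainResult shape as given in the design doc.
interface DrainResult {
  readonly stop: boolean;
  readonly stopReason: string | null;
  readonly skipPrNumbers: readonly number[];
  readonly addPrNumbers: readonly number[];
  readonly messagesProcessed: number;
}

// Hypothetical wrapper type: the result plus the cursor value to persist.
interface DrainPlan {
  readonly result: DrainResult;
  readonly nextCursor: { readonly lastReadCount: number };
}

const STOP = /\bstop\b/i;
const SKIP_PR = /\bskip[- ]pr[\s#]+([0-9]+)/i;
const ADD_PR = /\badd[- ]pr[\s#]+([0-9]+)/i;

function planDrain(queueText: string, lastReadCount: number): DrainPlan {
  const lines = queueText.split("\n").filter((line) => line.trim() !== "");
  // Desync guard: a cursor past the end means the queue was wiped, so restart at 0.
  const start = lastReadCount > lines.length ? 0 : lastReadCount;

  let stopReason: string | null = null;
  const skip = new Set<number>();
  const add = new Set<number>();
  let messagesProcessed = 0;

  for (const line of lines.slice(start)) {
    let text: unknown;
    try {
      text = (JSON.parse(line) as { message?: unknown }).message;
    } catch {
      continue; // malformed JSONL is skipped at the parse boundary
    }
    if (typeof text !== "string") continue;
    messagesProcessed += 1;

    if (stopReason === null && STOP.test(text)) stopReason = text;
    const skipMatch = SKIP_PR.exec(text);
    if (skipMatch) skip.add(Number(skipMatch[1]));
    const addMatch = ADD_PR.exec(text);
    if (addMatch) add.add(Number(addMatch[1]));
  }

  return {
    result: {
      stop: stopReason !== null,
      stopReason,
      skipPrNumbers: Array.from(skip),
      addPrNumbers: Array.from(add),
      messagesProcessed,
    },
    nextCursor: { lastReadCount: lines.length },
  };
}

// Usage: three well-formed messages and one malformed line.
const queue = [
  JSON.stringify({ message: "skip-pr 42" }),
  "not json",
  JSON.stringify({ message: "add pr #7" }),
  JSON.stringify({ message: "stop" }),
].join("\n");

const firstRun = planDrain(queue, 0);
const secondRun = planDrain(queue, firstRun.nextCursor.lastReadCount);
console.log(firstRun.result, secondRun.result);
```

The second call sees no new messages because the cursor already covers all four lines, which is the re-processing guarantee that Candidate A lacks. The call site would still need to check `result.stop` before touching the skip or add lists.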

package/docs/design/shaping-workflow-external-research.md ADDED

@@ -0,0 +1,119 @@
+# Shaping Workflow: External Research Synthesis
+# Date: Apr 18, 2026
+# Source: Deep research prompt answered by frontier model
+## TL;DR
+An 11-step prompt chain with two mandatory human gates, a self-refine loop with evaluator-optimizer split, sectioned solution divergence, and a hybrid JSON+markdown artifact. The single highest-leverage design decision: **generation and critique run on structurally different prompts (ideally different model families)** -- anchoring and self-preference bias are not mitigated by CoT or self-reflection alone (Lou & Sun 2025; Panickssery et al. 2024).
+## The 11-Step Skeleton
+| # | Step | Pattern | Output | Tokens |
+|---|---|---|---|---|
+| 1 | `ingest_and_extract` | Chain | Frame candidates, forces, open questions | 2–5k |
+| 2 | `frame_gate` | Interrupt (**MANDATORY HUMAN GATE**) | Confirmed problem + appetite | small |
+| 3 | `diverge_solution_shapes` | Parallel ×4 | 4 candidate rough shapes | med ×4 |
+| 4 | `converge_pick` | Separate judge | Chosen shape + rationale | small-med |
+| 5 | `breadboard_and_elements` | Chain + 1 refine | Breadboard + fat-marker elements | 8–15k |
+| 6 | `rabbit_holes_nogos` | Adversarial | Risks, mitigations, no-gos, assumptions | 3–6k |
+| 7 | `context_pack_build` | Tool-augmented | File globs, utilities, conventions, related PRs | med-large |
+| 8 | `example_map_and_gherkin` | Chain | Rules, examples, Gherkin scenarios | 3–6k |
+| 9 | `draft_pitch` | Self-refine ×2, critic=separate prompt | Full pitch (markdown + JSON) | 8–15k ×critique |
+| 10 | `approval_gate` | Interrupt (**MANDATORY HUMAN GATE**) | Approved pitch | small |
+| 11 | `finalize_and_handoff` | Deterministic + schema validate | Canonical artifact + pitch.md | <1k |
+Total budget: 50–200k tokens depending on divergence fan-out.
+## Key Empirical Findings
+### What actually mitigates LLM failure modes in shaping (ranked):
+1. **Generator ≠ Evaluator with authorship obfuscation** -- use different model families for generation vs critique. Beats anchoring, self-preference, and mode collapse simultaneously. CoT and self-reflection alone do NOT work (Lou & Sun 2025).
+2. **Verbalized Sampling + N-alternatives-before-selection** -- prompt for a distribution, not a single answer. 1.6–2.1× diversity gain (Zhang et al. arXiv 2510.01171).
+3. **Schema-constrained structured output** -- kills verbosity compensation, forces right abstraction level by construction.
+4. **ClarifyGPT-style consistency check** -- generate two independent interpretations; divergence triggers clarification.
+5. **Self-Refine with specific rubric**, bounded at 2–3 iterations (~20% absolute gain, Madaan et al. arXiv 2303.17651).
+6. **Red-team pass** with explicit "what's hallucinated / what's missing" prompts against a separate instance.
+### The right level of abstraction (encodable heuristic)
+**Interfaces and Invariants, Not Function Bodies.**
+Classify every sentence in the pitch as:
+- **(a) Interface** -- user-visible surfaces, data objects, integration points, touched modules
+- **(b) Invariant** -- declarative constraints (idempotency, auth model, consistency requirements, latency budgets)
+- **(c) Exclusion** -- explicitly excluded functionality
+- **(d) Implementation detail** -- over-specification, demote or cut
+- **(e) Vague** -- under-specification, replace with concrete interface/invariant or ask clarifying question
+A well-shaped pitch contains only (a), (b), (c).
+### Shaping for AI implementers vs humans (the key asymmetry)
+LLM implementers need:
+- **MORE explicit** than any human spec on: interfaces, invariants, conventions, no-gos, exact API versions, file boundaries (LLMs fabricate APIs, lack tacit codebase knowledge, lack scope-shame)
+- **LESS explicit** than junior-human spec on: standard implementation patterns (CRUD, routing, idiomatic error handling -- LLMs know these better)
+The dominant failure mode to design against: **confident architectural divergence** -- agent produces working, tested, reviewable PR that reinvents an existing utility or lands logic in the wrong layer. Looks plausible in review. Neither tests nor LLM sensors reliably catch it. Only a better spec prevents it.
+### Context Pack (Step 7) is the highest-leverage AI-specific addition
+But: **LLM-generated Context Packs are measurably inferior to human-curated ones** (ETH Zurich AGENTS.md study -- LLM-generated context reduced task success in 5 of 8 settings). Treat Step 7 output as a draft requiring spot-check.
+## The Artifact Schema
+```jsonc
+{
+  "shaping_run_id": "uuid",
+  "frame": {
+    "problem_story_md": "...",
+    "appetite": {
+      "calendar_weeks": 6,
+      "token_budget_est": 120000,
+      "agent_turns_est": 60,
+      "files_touched_est": 8,
+      "sizing_bucket": "small|medium|large"
+    },
+    "forces": { "push": [...], "pull": [...], "anxiety": [...], "habit": [...] }
+  },
+  "solution": {
+    "breadboard_md": "...",
+    "elements": [{ "name": "...", "description_md": "...", "classification": "interface|invariant|exclusion" }],
+    "alternatives_considered": [{ "sketch": "...", "rejected_because": "..." }]
+  },
+  "context_pack": {
+    "touch_globs": ["src/billing/**"],
+    "do_not_touch_globs": ["src/auth/**", "migrations/**"],
+    "reuse_utilities": [{ "path": "...", "symbol": "...", "signature": "...", "reason_to_reuse": "..." }],
+    "conventions_md": "...",
+    "related_prior_art": [{ "path_or_pr": "...", "relevance": "..." }]
+  },
+  "acceptance_criteria": {
+    "gherkin": "Feature: ...\n  Scenario: ...",
+    "verification_commands": ["pnpm test src/billing", "tsc --noEmit"],
+    "example_map": { "rules": [...], "examples": [...], "open_questions": [...] }
+  },
+  "rabbit_holes": [{ "risk": "...", "severity": "low|med|critical", "mitigation": "...", "patch_applied": true }],
+  "no_gos": ["..."],
+  "assumptions_log": [{ "step": "...", "assumption": "...", "confidence": 0.7, "rationale": "..." }],
+  "decomposition": {
+    "walking_skeleton": { "description": "thin end-to-end slice", "files": [...] },
+    "atomic_subtasks": [{ "id": "s1", "title": "...", "depends_on": [], "est_context_window": "single", "acceptance_scenario_refs": ["scenario-1"] }]
+  },
+  "pitch_md": "# Pitch: ...\n\n## Problem\n...",
+  "build_readiness_score": { "rubric_pass_count": 5, "critical_blockers": 0 }
+}
+```
+## What NOT to Build
+- Do NOT make this a dynamic autonomous agent -- shaping has a known skeleton (workflow, not agent)
+- Do NOT use tree-of-thoughts -- no cheap partial-goal verification signal in shaping
+- Do NOT build multi-agent role-plays -- single-voice judge with sectioning strictly dominates
+- Do NOT skip the frame gate on "small" tasks -- wrong frame on a small task still wastes the run
+## Failure Modes and Mitigations
+| Failure mode | Mitigation |
+|---|---|
+| Mode collapse on diverge step | Verbalized Sampling framing, explicit framing diversity, auto-retry at higher temperature if >70% overlap |
+| Self-preference on judge | Obfuscate authorship by rewriting all candidates into uniform voice; ideally different model family |
+| Verbosity compensation on pitch | Hard max-length on JSON fields; critic checks for vague modifiers without concrete nouns |
+| Hallucinated Context Pack entries | Tool-augment Step 7 with repo grep/AST scan; schema-validate all paths before Step 10 |
+| Over-decomposition | Minimum subtask size = single context window; maximum 8 subtasks per pitch; if more, appetite was wrong |
+| Silent architectural divergence | Include a consistency-check subtask: the implementer lists every new file/symbol and justifies why it is not a duplicate |
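
The plan does not say how the ">70% overlap" trigger for the diverge step is measured. A minimal sketch, assuming overlap means pairwise Jaccard similarity over lowercase word tokens; the metric itself and the names `tokenSet`, `jaccard`, `shouldRetryDiverge`, and `OVERLAP_RETRY_THRESHOLD` are hypothetical and chosen only for illustration:

```typescript
// ASSUMPTION: "overlap" is Jaccard similarity of word-token sets. The plan does not define the metric.
const OVERLAP_RETRY_THRESHOLD = 0.7;

// Split a candidate into a set of lowercase word tokens.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// True when any pair of candidates overlaps by more than the threshold,
// which the table treats as a signal to retry at a higher temperature.
function shouldRetryDiverge(candidates: string[]): boolean {
  const sets = candidates.map(tokenSet);
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      if (jaccard(sets[i], sets[j]) > OVERLAP_RETRY_THRESHOLD) return true;
    }
  }
  return false;
}
```

A caller would run this on the diverge step's candidate sketches and re-sample only when it returns `true`. An embedding-based similarity could replace `jaccard` without changing the surrounding control flow.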

package/docs/discovery/late-bound-goals-impl-plan.md ADDED

@@ -0,0 +1,147 @@
+# Implementation Plan: Late-Bound Goals
+**Feature**: Default to `goalTemplate: "{{$.goal}}"` when neither `goal` nor `goalTemplate` is configured.
+---
+## Problem Statement
+Triggers require a static `goal` in `triggers.yml`. Dynamic-goal use cases (PR review, incident response) therefore need either a static placeholder goal or a custom `goalTemplate`. This feature makes the following work without any goal pre-configuration: `curl -X POST /webhook/my-trigger -d '{"goal": "review PR #42"}'`.
+---
+## Acceptance Criteria
+1. A trigger with neither `goal` nor `goalTemplate` in `triggers.yml` loads successfully (no `missing_field` error).
+2. When the webhook payload contains a `goal` field, the session is started with that value as the goal.
+3. When the webhook payload has no `goal` field, the session is started with `'Autonomous task'` as the goal, AND a warning is logged to daemon stderr.
+4. Existing triggers with a static `goal` field behave identically to before.
+5. TypeScript compiles without errors (no new `any` or `!` assertions on the new code path).
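
Criteria 2 and 3 can be read as a small resolution function. The sketch below is an illustration only: the real interpolation lives in `trigger-router.ts` and is not shown in this plan, so `interpolateGoal`, `resolveSessionGoal`, the placeholder regex, and the injected `warn` callback are all assumptions.

```typescript
// Value from the plan; the real constant is LATE_BOUND_GOAL_SENTINEL in trigger-store.ts.
const FALLBACK_GOAL = 'Autonomous task';

// ASSUMPTION: placeholders look like {{$.a.b}} and must resolve to a non-empty string.
// Returns undefined when any placeholder cannot be resolved from the payload.
function interpolateGoal(template: string, payload: unknown): string | undefined {
  const unresolved: string[] = [];
  const result = template.replace(
    /\{\{\$\.([A-Za-z0-9_.]+)\}\}/g,
    (_match: string, path: string): string => {
      let current: unknown = payload;
      for (const key of path.split('.')) {
        if (typeof current !== 'object' || current === null) {
          unresolved.push(path);
          return '';
        }
        current = (current as Record<string, unknown>)[key];
      }
      if (typeof current !== 'string' || current.trim() === '') {
        unresolved.push(path);
        return '';
      }
      return current;
    },
  );
  return unresolved.length > 0 ? undefined : result;
}

// Criterion 2: use the payload's goal. Criterion 3: fall back and warn.
function resolveSessionGoal(
  template: string,
  fallbackGoal: string,
  payload: unknown,
  warn: (message: string) => void,
): string {
  const interpolated = interpolateGoal(template, payload);
  if (interpolated !== undefined) return interpolated;
  warn(`goalTemplate "${template}" did not resolve from the payload; using "${fallbackGoal}"`);
  return fallbackGoal;
}
```

Passing `warn` as a parameter keeps the sketch testable; in the daemon it would presumably write to stderr, as criterion 3 requires.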
+---
+## Non-Goals
+- No change to `TriggerDefinition` type (`goal: string` stays required).
+- No change to `trigger-router.ts` interpolation logic.
+- No change to `console-routes.ts` dispatch endpoint.
+- No `goalSource` discriminant on `TriggerDefinition` (future enhancement).
+- No configurable fallback string other than `'Autonomous task'`.
+---
+## Philosophy-Driven Constraints
+- Validate at boundaries: injection must happen in `trigger-store.ts`, not in `trigger-router.ts`.
+- Make illegal states unrepresentable: `TriggerDefinition.goal` must always be a valid `string`.
+- YAGNI: minimal change -- named constant + ~8 lines + tests.
+- Document why: WHY comment required above the injection block.
+---
+## Invariants
+1. `TriggerDefinition.goal: string` -- never `undefined` or empty string after parse.
+2. `trigger.goal` is only used as a display/fallback value, never as a routing key.
+3. The goal that reaches the session is the interpolated one (from payload or fallback), not the stored sentinel.
+4. Injection only fires when both `raw.goal` and `raw.goalTemplate` are absent.
+---
+## Selected Approach
+**Candidate A: Parse-time sentinel injection in `validateAndResolveTrigger()`**
+1. Remove `'goal'` from `requiredStringFields`.
+2. After the required-fields loop, add a block:
+   - If `raw.goal` absent AND `raw.goalTemplate` absent: set `resolvedGoal = LATE_BOUND_GOAL_SENTINEL` and `resolvedGoalTemplate = '{{$.goal}}'`. Log an INFO message.
+   - If `raw.goal` absent AND `raw.goalTemplate` present: set `resolvedGoal = LATE_BOUND_GOAL_SENTINEL`.
+   - Otherwise: `resolvedGoal = raw.goal!.trim()`.
+3. Replace `goal: raw.goal!.trim()` on line 974 with `goal: resolvedGoal`.
+4. If `resolvedGoalTemplate` was injected, pass it via the `goalTemplate` spread at line 980.
+**Runner-up**: Candidate B (optional type) -- rejected because it cascades null checks to 3+ files.
+**Rationale**: Exact match to the `concurrencyMode` default pattern at line 756. Zero downstream impact.
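
The three branches of step 2 can be sketched as a standalone function. This is a hedged illustration, not the actual body of `validateAndResolveTrigger()`: the `RawTrigger` and `ResolvedGoalFields` shapes, the function name `resolveGoalFields`, and the `logInfo` callback are hypothetical, and the log wording is invented.

```typescript
const LATE_BOUND_GOAL_SENTINEL = 'Autonomous task';

// HYPOTHETICAL minimal view of a raw triggers.yml entry; the real raw type is not shown in the plan.
interface RawTrigger {
  goal?: string;
  goalTemplate?: string;
}

// HYPOTHETICAL: only the goal-related fields of TriggerDefinition.
interface ResolvedGoalFields {
  goal: string;
  goalTemplate?: string;
}

function resolveGoalFields(raw: RawTrigger, logInfo: (message: string) => void): ResolvedGoalFields {
  let resolvedGoal: string;
  let resolvedGoalTemplate: string | undefined = raw.goalTemplate;

  // WHY: TriggerDefinition.goal must always be a string, so a trigger with no
  // static goal gets a sentinel here at the parse boundary instead of an optional type.
  if (raw.goal === undefined) {
    resolvedGoal = LATE_BOUND_GOAL_SENTINEL;
    if (raw.goalTemplate === undefined) {
      // Neither field configured: bind the goal from the webhook payload at dispatch time.
      resolvedGoalTemplate = '{{$.goal}}';
      logInfo('no goal or goalTemplate configured; defaulting goalTemplate to {{$.goal}}');
    }
  } else {
    // Narrowing on `raw.goal === undefined` makes a non-null assertion unnecessary here.
    resolvedGoal = raw.goal.trim();
  }

  return {
    goal: resolvedGoal,
    ...(resolvedGoalTemplate !== undefined ? { goalTemplate: resolvedGoalTemplate } : {}),
  };
}
```

Branching on `raw.goal === undefined` lets the compiler narrow the type in the `else` branch, which fits the acceptance criterion about avoiding new `!` assertions.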
+---
+## Vertical Slices
+### Slice 1: Core implementation (~8 lines in trigger-store.ts)
+**File**: `src/trigger/trigger-store.ts`
+**Changes**:
+- Add `const LATE_BOUND_GOAL_SENTINEL = 'Autonomous task';` near the top of the validation section.
+- Remove `'goal'` from `requiredStringFields` array.
+- Add injection block with WHY comment + INFO log.
+- Declare `let resolvedGoal: string;` and `let resolvedGoalTemplate: string | undefined;` before the injection block.
+- Replace `goal: raw.goal!.trim()` with `goal: resolvedGoal` in the TriggerDefinition literal.
+- Pass `resolvedGoalTemplate` to the `goalTemplate` spread.
+**Done when**: TypeScript compiles. `loadTriggerConfig` returns a valid `TriggerDefinition` for a YAML with no `goal` and no `goalTemplate`.
+### Slice 2: Tests for trigger-store.ts
+**File**: `tests/unit/trigger-store.test.ts`
+**New test cases**:
+1. Trigger with no `goal` and no `goalTemplate` loads successfully, has `goal = 'Autonomous task'` and `goalTemplate = '{{$.goal}}'`.
+2. Trigger with no `goal` but an explicit `goalTemplate` loads successfully, has `goal = 'Autonomous task'` and the specified `goalTemplate`.
+3. Trigger with a static `goal` behaves identically to before (regression).
+**Done when**: All 3 tests pass.
+### Slice 3: Test for trigger-router.ts integration
+**File**: `tests/unit/trigger-router.test.ts`
+**New test cases**:
+1. Route a webhook event with payload `{ goal: 'review PR #42' }` to a late-bound trigger. Verify `runWorkflow` is called with `goal = 'review PR #42'`.
+2. Route a webhook event with no `goal` field to a late-bound trigger. Verify `runWorkflow` is called with `goal = 'Autonomous task'`.
+**Done when**: Both tests pass.
+---
+## Test Design
+All tests use the existing `loadTriggerConfig` (pure function, no I/O) and `TriggerRouter` patterns from the test files. No mocks are needed for Slice 2. Slice 3 uses the `makeFakeRunWorkflow` pattern already in the test file.
+Run: `npx vitest run tests/unit/trigger-store.test.ts tests/unit/trigger-router.test.ts`
+---
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| Non-null assertion on line 974 missed | Low | Compile error | TypeScript will catch |
+| Slice 3 test fails due to goal not threading through | Low | Test failure | Trace through route() logic |
+| Existing trigger-store tests broken by goal no longer being required | Low | Test failure | Regression test in Slice 2 covers this |
+---
+## PR Packaging
+Single PR: `feat/late-bound-goals`. All 3 slices in one commit. ~20-30 lines total (source + tests).
+---
+## Philosophy Alignment
+- validate-at-boundaries -> satisfied (injection in trigger-store.ts)
+- make-illegal-states-unrepresentable -> satisfied (goal: string always holds)
+- YAGNI -> satisfied (~8 source lines)
+- document-why -> satisfied (WHY comment in injection block)
+- prefer-explicit-domain-types -> tension (sentinel is stringly-typed; acceptable -- future `goalSource` discriminant tracked)
+---
+## Plan Confidence
+- unresolvedUnknownCount: 0
+- planConfidenceBand: High
+- estimatedPRCount: 1
+- followUpTickets: "Add goalSource discriminant to TriggerDefinition for console UI enhancement"