npm - @event4u/agent-config - Versions diffs - 1.14.0 → 1.15.0 - Mend

@event4u/agent-config 1.14.0 → 1.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (106)

package/.agent-src/commands/agent-handoff.md +1 -1
package/.agent-src/commands/bug-fix.md +2 -2
package/.agent-src/commands/chat-history-checkpoint.md +2 -2
package/.agent-src/commands/chat-history-clear.md +1 -1
package/.agent-src/commands/chat-history-resume.md +2 -2
package/.agent-src/commands/chat-history.md +2 -2
package/.agent-src/commands/check-current-md.md +43 -32
package/.agent-src/commands/commit-in-chunks.md +43 -23
package/.agent-src/commands/compress.md +34 -2
package/.agent-src/commands/feature-roadmap.md +2 -2
package/.agent-src/commands/fix-portability.md +2 -2
package/.agent-src/commands/onboard.md +14 -5
package/.agent-src/commands/optimize-augmentignore.md +9 -0
package/.agent-src/commands/refine-ticket.md +9 -7
package/.agent-src/commands/review-changes.md +35 -8
package/.agent-src/commands/roadmap-create.md +13 -2
package/.agent-src/commands/roadmap-execute.md +9 -7
package/.agent-src/commands/set-cost-profile.md +8 -0
package/.agent-src/commands/sync-agent-settings.md +9 -0
package/.agent-src/commands/tests-execute.md +2 -3
package/.agent-src/rules/artifact-engagement-recording.md +1 -1
package/.agent-src/rules/augment-portability.md +56 -37
package/.agent-src/rules/chat-history-cadence.md +109 -0
package/.agent-src/rules/chat-history-ownership.md +123 -0
package/.agent-src/rules/chat-history-visibility.md +96 -0
package/.agent-src/rules/cli-output-handling.md +1 -1
package/.agent-src/rules/command-suggestion.md +3 -2
package/.agent-src/rules/commit-policy.md +44 -34
package/.agent-src/rules/direct-answers.md +1 -1
package/.agent-src/rules/language-and-tone.md +19 -15
package/.agent-src/rules/non-destructive-by-default.md +18 -18
package/.agent-src/rules/roadmap-progress-sync.md +133 -74
package/.agent-src/rules/role-mode-adherence.md +1 -1
package/.agent-src/rules/size-enforcement.md +2 -1
package/.agent-src/rules/user-interaction.md +28 -4
package/.agent-src/scripts/update_roadmap_progress.py +56 -4
package/.agent-src/skills/blade-ui/SKILL.md +29 -10
package/.agent-src/skills/command-writing/SKILL.md +15 -4
package/.agent-src/skills/existing-ui-audit/SKILL.md +24 -9
package/.agent-src/skills/fe-design/SKILL.md +20 -15
package/.agent-src/skills/file-editor/SKILL.md +9 -0
package/.agent-src/skills/livewire/SKILL.md +26 -7
package/.agent-src/skills/refine-ticket/SKILL.md +30 -24
package/.agent-src/skills/roadmap-management/SKILL.md +22 -16
package/.agent-src/skills/skill-writing/SKILL.md +3 -3
package/.agent-src/skills/upstream-contribute/SKILL.md +2 -2
package/.agent-src/templates/agent-settings.md +1 -1
package/.agent-src/templates/roadmaps.md +9 -8
package/.agent-src/templates/scripts/memory_lookup.py +1 -1
package/.agent-src/templates/scripts/work_engine/__init__.py +2 -2
package/.agent-src/templates/scripts/work_engine/cli.py +64 -461
package/.agent-src/templates/scripts/work_engine/cli_args.py +116 -0
package/.agent-src/templates/scripts/work_engine/delivery_state.py +3 -3
package/.agent-src/templates/scripts/work_engine/directives/backend/__init__.py +1 -1
package/.agent-src/templates/scripts/work_engine/directives/backend/implement.py +1 -1
package/.agent-src/templates/scripts/work_engine/directives/backend/memory.py +1 -1
package/.agent-src/templates/scripts/work_engine/directives/backend/plan.py +1 -1
package/.agent-src/templates/scripts/work_engine/directives/backend/report.py +1 -1
package/.agent-src/templates/scripts/work_engine/dispatcher.py +1 -1
package/.agent-src/templates/scripts/work_engine/emitters.py +43 -0
package/.agent-src/templates/scripts/work_engine/errors.py +19 -0
package/.agent-src/templates/scripts/work_engine/hook_bootstrap.py +76 -0
package/.agent-src/templates/scripts/work_engine/input_builders.py +163 -0
package/.agent-src/templates/scripts/work_engine/migration/v0_to_v1.py +34 -2
package/.agent-src/templates/scripts/work_engine/persona_policy.py +1 -1
package/.agent-src/templates/scripts/work_engine/resolvers/prompt.py +1 -1
package/.agent-src/templates/scripts/work_engine/state_io.py +202 -0
package/.claude-plugin/marketplace.json +1 -1
package/AGENTS.md +6 -4
package/CHANGELOG.md +83 -8
package/README.md +24 -23
package/docs/MIGRATION.md +122 -0
package/docs/architecture.md +83 -34
package/docs/contracts/STABILITY.md +95 -0
package/docs/contracts/adr-chat-history-split.md +132 -0
package/docs/contracts/adr-command-suggestion.md +146 -0
package/docs/contracts/adr-implement-ticket-runtime.md +122 -0
package/docs/contracts/adr-product-ui-track.md +384 -0
package/docs/contracts/adr-prompt-driven-execution.md +187 -0
package/docs/contracts/agent-memory-contract.md +149 -0
package/docs/contracts/artifact-engagement-flow.md +262 -0
package/docs/contracts/command-clusters.md +126 -0
package/docs/contracts/command-suggestion-flow.md +148 -0
package/docs/contracts/implement-ticket-flow.md +628 -0
package/docs/contracts/linear-ai-rules-inclusion.md +143 -0
package/docs/contracts/linear-ai-three-layers.md +131 -0
package/docs/contracts/rule-interactions.md +107 -0
package/docs/contracts/rule-interactions.yml +142 -0
package/docs/contracts/ui-stack-extension.md +236 -0
package/docs/contracts/ui-track-flow.md +338 -0
package/docs/getting-started.md +2 -2
package/docs/installation.md +42 -6
package/docs/migrations/commands-1.15.0.md +112 -0
package/docs/ui-track-mental-model.md +121 -0
package/package.json +1 -1
package/scripts/build_linear_digest.py +4 -4
package/scripts/check_portability.py +2 -0
package/scripts/check_public_links.py +185 -0
package/scripts/check_references.py +1 -0
package/scripts/lint_no_new_atomic_commands.py +179 -0
package/scripts/lint_rule_interactions.py +149 -0
package/scripts/memory_lookup.py +1 -1
package/scripts/release.py +297 -64
package/scripts/skill_linter.py +14 -0
package/scripts/update_counts.py +10 -0
package/.agent-src/rules/chat-history.md +0 -200

package/docs/contracts/STABILITY.md ADDED

@@ -0,0 +1,95 @@
+---
+stability: stable
+---
+# Stability policy for `docs/contracts/`
+This directory ships the **public contract surface** of `agent-config`.
+Every file here declares a stability level in YAML frontmatter:
+```yaml
+---
+stability: stable | beta | experimental
+---
+```
+The level dictates how the contract may be linked from the public
+surface (README, AGENTS.md, `docs/architecture.md`) and what kind of
+change to it requires what kind of release.
+## Levels
+### `stable`
+- **Breaking change** requires a **SemVer-major** bump (`X.0.0`).
+- README, AGENTS.md, and `docs/architecture.md` MAY link to it
+  without a marker.
+- Typical content: settled ADRs (decisions don't reopen without a
+  successor ADR); fully released contracts that have shipped through
+  one major release without breaking.
+### `beta`
+- **Breaking change** is allowed in a **minor-version** release
+  (`1.X.0`), provided the change appears in `CHANGELOG.md` under a
+  `### Breaking` heading.
+- README, AGENTS.md, and `docs/architecture.md` MAY link to it,
+  **provided** the link text or the surrounding sentence carries a
+  visible `(beta)` marker.
+- Typical content: flow contracts and recipes that are shipped and
+  load-bearing, but whose surface is expected to evolve before a
+  SemVer-major lock.
+### `experimental`
+- **Breaking change** is allowed in **any release** (including
+  patches), with a CHANGELOG note.
+- README, AGENTS.md, and `docs/architecture.md` MUST NOT link to
+  experimental contracts. Only the index inside `docs/contracts/`
+  may reference them.
+- Typical content: spike artefacts, runtime modules in pilot status,
+  early API drafts not yet wired into a roadmap-locked phase.
+## Frontmatter requirement
+Every `*.md` under `docs/contracts/` (except this `STABILITY.md` file
+itself) MUST start with a YAML frontmatter block declaring `stability:`.
+The link checker (`scripts/check_public_links.py`, P0.1b) reads this
+frontmatter and:
+- **fails CI** when README / AGENTS.md / `docs/architecture.md` links to
+  a contract marked `experimental`, to a missing target, or into
+  `agents/contexts/` (internal surface).
+- **warns** (non-fatal in default mode; fatal under `--strict`) when a
+  public-surface link to a `beta` contract has no `(beta)` marker in
+  the surrounding text.
+- ignores `stable` links.
+Run `task check-public-links` locally; `task ci` invokes the same
+checker in default mode.
+## Promotion path
+`experimental → beta → stable`. Demotion is allowed (e.g.
+`stable → beta` to permit a refactor) but appears in `CHANGELOG.md` under
+`### Breaking` and gets a SemVer-major bump.
+Promotion criteria:
+- `experimental → beta` — at least one shipped roadmap phase has
+  consumed the contract end-to-end without a breaking change.
+- `beta → stable` — at least one SemVer-minor release has shipped
+  with the contract unchanged, or the contract has been explicitly
+  frozen as part of a roadmap step.
+## Current contracts
+See each file's frontmatter for its current level. The frontmatter is
+the authoritative source; any listing of levels, such as the output of
+`scripts/check_public_links.py --list`, is illustrative, not load-bearing.
+## See also
+- [`agents/roadmaps/road-to-post-pr29-optimize.md`](../../agents/roadmaps/road-to-post-pr29-optimize.md) — P0.1, P0.1a, P0.1b
+- [`docs/architecture.md`](../architecture.md) — package architecture overview
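The link policy in `STABILITY.md` can be sketched as a small decision function. This is a minimal illustration, not the shipped `scripts/check_public_links.py`. The names `read_stability` and `check_public_link` are hypothetical, as are three further choices: frontmatter is parsed with plain string handling instead of a YAML library, the target is assumed to point into `docs/contracts/` or `agents/contexts/`, and a contract with missing frontmatter is treated as a failure.

```python
"""Hypothetical sketch of the stability-aware link policy in STABILITY.md.

NOT the real scripts/check_public_links.py; names and signatures are
illustrative assumptions.
"""
from typing import Optional

VALID_LEVELS = {"stable", "beta", "experimental"}


def read_stability(text: str) -> Optional[str]:
    """Return the `stability:` value from a leading `---` frontmatter block.

    Plain string parsing (no pyyaml); only the simple `key: value` form
    shown in STABILITY.md is understood. Returns None if absent or invalid.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # frontmatter closed without a stability key
        key, sep, value = line.partition(":")
        if sep and key.strip() == "stability":
            level = value.strip()
            return level if level in VALID_LEVELS else None
    return None


def check_public_link(target: str, target_text: Optional[str],
                      surrounding_text: str, strict: bool = False) -> str:
    """Classify one README / AGENTS.md / architecture.md link.

    Assumes `target` points into docs/contracts/ or agents/contexts/.
    `target_text` is the linked file's content, or None if it is missing.
    Returns "ok", "warn", or "fail".
    """
    if "agents/contexts/" in target:
        return "fail"  # internal surface
    if target_text is None:
        return "fail"  # missing target
    level = read_stability(target_text)
    if level is None:
        return "fail"  # assumption: missing frontmatter counts as an error
    if level == "experimental":
        return "fail"
    if level == "beta" and "(beta)" not in surrounding_text:
        return "fail" if strict else "warn"  # --strict promotes the warning
    return "ok"  # stable, or beta with a visible (beta) marker
```

Keeping warn and fail in one function mirrors the default-versus-`--strict` behaviour described above: only the `beta`-without-marker case changes severity between the two modes.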

package/docs/contracts/adr-chat-history-split.md ADDED

@@ -0,0 +1,132 @@
+---
+stability: beta
+---
+# ADR — Chat-history rule split
+> **Status:** Decided · 2026-05-02 · scoped under
+> [`road-to-post-pr29-optimize.md`](../../agents/roadmaps/archive/road-to-post-pr29-optimize.md)
+> Phase 2 item P2.1.
+> **Context:** AI #1, AI #3, AI #5 review of PR #29 flagged the
+> 200-line monolithic `rules/chat-history.md` as the rule the agent
+> revisited 12+ times during the 1.14.0 cycle — three independent
+> concerns coupled into one Iron Law block.
+> **Builds on:** [`adr-implement-ticket-runtime.md`](adr-implement-ticket-runtime.md)
+> — engine path was added to the existing rule mid-flight, which
+> exposed the coupling.
+> **Defers to:** Phase 2 P2.9 (memory-visibility surface) for the
+> reuse of the heartbeat plumbing.
+## Decision
+`rules/chat-history.md` is split into **three sibling `always` rules**,
+each owning one orthogonal concern and a clear single Iron Law:
+| Rule | Iron Law owned | What it answers |
+|---|---|---|
+| `chat-history-ownership.md` | Foreign/Returning/Missing handshake — whose file is this? | Activation, two-paths table (HOOK/ENGINE/CHECKPOINT/MANUAL), Foreign/Returning prompts |
+| `chat-history-cadence.md` | turn-check + append gates — when to write? | Turn-start gate, append cadence (`per_turn`/`per_phase`/`per_tool`), `OWNERSHIP_REFUSED` handling |
+| `chat-history-visibility.md` | Heartbeat marker — how does the user see drift? | Heartbeat invocation, `on`/`off`/`hybrid` modes, memory-typing-the-marker hard-rule |
+The original `rules/chat-history.md` is deleted. Cross-references
+across the three splits are explicit one-line backlinks at the top of
+each file. The CHECKPOINT-path Iron Law (the three-gate block) is
+re-anchored: ownership owns gate 1 (turn-check), cadence owns gate 2
+(append), visibility owns gate 3 (heartbeat).
+## Why this was a real question
+Three options were on the table:
+1. **Leave the monolith, add sub-section index.** Rejected: the
+   12-iteration churn was on individual sub-sections (heartbeat
+   visibility modes, foreign/returning prompts, cadence frequencies)
+   touching unrelated parts of the file. Index doesn't separate
+   concerns; it only renames the problem.
+2. **Move the Iron Law block to a context doc, leave the rule as a
+   thin pointer.** Rejected: contexts are reference, rules are
+   triggers — the three gates ARE the trigger surface. Demoting them
+   to context bleeds enforcement.
+3. **Three sibling rules, one concern each (chosen).** Adopted
+   because each concern has a distinct lifecycle: ownership is a
+   first-turn handshake (rare event, high stakes), cadence is
+   per-phase (high frequency, low stakes per write), visibility is
+   per-reply (every turn, observability-only). Coupling them produced the
+   "memory-typing the marker" failure mode we hardened against in
+   1.14.0 — separating them makes future hardening surgical.
+## Sizing & always-rule budget
+Pre-split: `chat-history.md` = 201 lines. Always-rule budget pre-split = 2207 lines across 18 rules.
+Post-split target (frontmatter + cloud-behavior + interactions
+sections triplicated):
+| File | Target lines |
+|---|---:|
+| `chat-history-ownership.md` | ≤ 95 |
+| `chat-history-cadence.md` | ≤ 80 |
+| `chat-history-visibility.md` | ≤ 65 |
+| **Total** | **≤ 240** |
+Net always-rule budget delta: **+39 lines** (≈ 1.8 % of the current
+2207-line total). Within the ~49k-token target ceiling tracked in
+`road-to-governance-cleanup.md`.
+## Consequences
+**Wins**
+- Each rule has one Iron Law instead of three. Future edits to
+  cadence don't risk touching ownership prompt mechanics.
+- The `chat-history-visibility.md` rule becomes a clean reuse target
+  for Phase 2 P2.9 (memory-visibility surface).
+- Refactoring the engine-side hooks (`chat_history_turn_check.py`,
+  `chat_history_append.py`, `chat_history_heartbeat.py`) is now
+  1:1 with rule files — one rule per hook.
+**Costs / migration surface**
+- Every cross-reference to `chat-history` across the package must be
+  updated to point at the right split. Inventory: ~30 files (rules,
+  templates, scripts, commands, contexts, README, AGENTS.md).
+- `docs-sync.md` table needs the new triple instead of the singleton.
+- `scripts/check_references.py` catches broken links, and CI gates the
+  split.
+- Any consumer project carrying a project-level override of
+  `chat-history` must also split (or alias) — handled by an explicit
+  migration note in `docs/migrations/commands-1.15.0.md` (extended
+  for this rule split).
+**Reversibility**
+Split is a structural refactor, not a behavior change. Reverting
+would mean re-concatenating the three files — mechanical but loud
+in CI. The CHECKPOINT-path Iron Law numbering is preserved, so
+agents that learned "gates 1-2-3" still find the same numbered
+sequence, just across three files.
+## Out of scope
+- Behavior changes to `scripts/chat_history.py` — the script's
+  CLI surface is unchanged.
+- Adding new cadence frequencies or visibility modes — separate
+  decision, not part of this split.
+- Removing the CHECKPOINT-path entirely — Cline-on-Windows still
+  needs cooperative gates; that's a Phase 3+ conversation.
+## Implementation plan (this PR)
+1. Create the three new rules under `.agent-src.uncompressed/rules/`.
+2. Delete `.agent-src.uncompressed/rules/chat-history.md`.
+3. Run `bash scripts/compress.sh --sync` to project the change into
+   `.agent-src/`, `.augment/`, `.claude/`, `.cursor/`, `.clinerules/`,
+   `.windsurfrules`.
+4. Update every cross-reference (~30 files) — `check-refs` is the
+   gate.
+5. Append a migration note to `docs/migrations/commands-1.15.0.md`
+   under a new "Rule splits" section.
+6. Regenerate `agents/roadmaps-progress.md`.
+7. Verify: `task ci` (lint-skills, check-refs, check-public-links,
+   check-compression, counts-check, lint-readme, runtime-e2e,
+   roadmap-progress-check).
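The sizing figures in `adr-chat-history-split.md` can be rechecked with a few lines of arithmetic. The numbers are copied from its "Sizing & always-rule budget" section; the variable names are illustrative only.

```python
# Figures copied from the ADR's "Sizing & always-rule budget" section.
PRE_SPLIT_LINES = 201        # chat-history.md before the split
ALWAYS_BUDGET_LINES = 2207   # all always-rules before the split (18 rules)
TARGETS = {
    "chat-history-ownership.md": 95,
    "chat-history-cadence.md": 80,
    "chat-history-visibility.md": 65,
}

post_split_max = sum(TARGETS.values())          # upper bound after the split
delta = post_split_max - PRE_SPLIT_LINES        # net change to the budget
delta_pct = 100 * delta / ALWAYS_BUDGET_LINES   # share of the old total

print(f"post-split ceiling: {post_split_max} lines")  # → post-split ceiling: 240 lines
print(f"net delta: +{delta} lines ({delta_pct:.1f} % of {ALWAYS_BUDGET_LINES})")
# → net delta: +39 lines (1.8 % of 2207)
```

The delta is measured against the 201-line pre-split count, which is how the ADR arrives at +39 lines.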

package/docs/contracts/adr-command-suggestion.md ADDED

@@ -0,0 +1,146 @@
+---
+stability: stable
+---
+# ADR — Context-Aware Command Suggestion
+> **Status:** Decided · Phases 1–7 shipped · 2026-04-30
+> **Roadmap:** [`road-to-context-aware-command-suggestion.md`](../../agents/roadmaps/road-to-context-aware-command-suggestion.md)
+> **Rule:** [`command-suggestion`](../../.agent-src.uncompressed/rules/command-suggestion.md)
+> **Eligibility table:** [`command-suggestion-eligibility.md`](command-suggestion-eligibility.md)
+> **Engine:** `scripts/command_suggester/`
+> **Orthogonal to:** R1 (`adr-work-engine-rename.md`) — the suggester is a
+> read-only shortcut *finder*; it never invokes a command. Engine boundaries
+> stay intact.
+## Decision
+Add a **deterministic, read-only** suggestion layer that scans every
+non-`/`-prefixed user turn against eligible command frontmatter
+(`suggestion.trigger_description` + `trigger_context`) and surfaces matches
+as a **numbered-options block** with the user-interaction Iron Law shape:
+- N command options, neutral labels (no inline `(recommended)` tag).
+- Always-last "Just run the prompt as-is, no command" escape hatch.
+- Exactly one `**Recommendation: K — /command**` line, omitted when the
+  top two scores are within `0.05` of each other.
+The suggester **never executes** anything; the user's pick of an option triggers the
+standard slash-command flow on the next turn (with all per-command halts
+intact). The user picks every time.
+## Why this was a real question
+A maintainer who typed "Setze Ticket ABC-123 um" (German for "implement ticket ABC-123") got a generic
+implementation walk-through, not the `/implement-ticket` engine. The
+package owns 75 commands; the agent has the list in context but no
+contract telling it "when the prompt looks like *this*, offer *that*
+command". Three forces converged:
+1. **Discoverability gap.** Users learn `/command` only when documented;
+   most never type it.
+2. **Safety anchor.** Auto-execution is non-negotiably out (`scope-control`,
+   per-command halts). Anything we ship cannot weaken that floor.
+3. **Determinism.** ML-based intent detection is non-replayable; goldens
+   would drift. Pattern + keyword matching keeps GT-CS replays byte-stable.
+Three alternatives were rejected:
+- **Auto-route to commands** (bypass the user). Rejected: violates the
+  "user always picks" anchor; one wrong route inside `/implement-ticket`
+  is a multi-hour blast radius.
+- **Trigger-based command auto-prefix** (silently rewrite the prompt
+  to `/command <prompt>`). Rejected: same anchor breach, plus the user
+  loses the "skip" option.
+- **LLM intent classifier**. Rejected: non-deterministic, untestable in
+  the GT-CS harness, and the eligibility table is small enough that
+  pattern matching reaches the same precision at zero cost.
+## Eligibility rubric
+A command is **eligible** (default `true`) unless:
+- Invocation is **intentional-only** (`/onboard`, `/package-reset`,
+  `/mode`, `/agent-handoff`, `/chat-history-clear`).
+- Triggers would overlap so heavily with normal prose that the floor +
+  cooldown can't suppress the noise.
+- The command has no natural-language signal distinct from neighbours.
+The locked table is
+[`command-suggestion-eligibility.md`](command-suggestion-eligibility.md).
+Eligibility changes are roadmap follow-ups, not drive-by edits.
+## Anti-noise heuristics
+`rank.py` drops a match when any of the following holds:
+- Score below `confidence_floor` (default `0.6`; per-command override
+  via frontmatter).
+- A single match scoring below `floor + 0.1` (within 0.1 of the floor) with no
+  structural bonus (high uncertainty isn't worth interrupting).
+- Short prompt (< 6 words) hitting > 2 commands without a structural
+  bonus (too vague to disambiguate).
+- Pure continuation phrase (`ok`, `weiter`, `mach weiter`, `go on`,
+  `continue`) with no structural bonus.
+- Same `(command, evidence)` pair shown within `cooldown_seconds`
+  (default `600`, per-command override).
+Structural bonuses (ticket key, file path, branch shape) override
+suppression when they signal real intent.
+## Hardening — what suggestion must never do
+The rule (`.agent-src.uncompressed/rules/command-suggestion.md`)
+binds the engine to five non-negotiables, mirrored as goldens:
+1. **No execution without user pick.** `SUGGEST. NEVER INVOKE.` is the
+   first Iron Law in the rule.
+2. **No multi-question stacks.** Clarification (`ask-when-uncertain`)
+   wins; suggestion stays silent that turn.
+3. **No conversation hijack.** `enabled: false`, no matches above floor,
+   or a senior rule active → zero output.
+4. **No echo-trigger.** `sanitize.py` strips fenced/inline code and the
+   engine's own previous block shape (`> N. /command — …`,
+   `**Recommendation: N — …**`) before scoring.
+5. **Hard subordination.** `scope-control`, `ask-when-uncertain`,
+   `verify-before-complete`, and any active role-mode contract outrank
+   suggestion.
+GT-CS1 through GT-CS9 (`tests/test_command_suggester_goldens.py`) lock
+the contract end-to-end.
+## Three opt-out paths
+| Path | Mechanism | Scope |
+|---|---|---|
+| Global | `commands.suggestion.enabled: false` in `.agent-settings.yml` | Whole project |
+| Per-command | `commands.suggestion.blocklist: [/cmd]` | Specific command stays as-is |
+| Per-conversation | `/command-suggestion-off` directive | Until user re-enables or chat ends |
+All three are deterministic and tested. `/command-suggestion-on`
+re-enables mid-conversation.
+## Consequences
+**Positive.** The maintainer's intent surfaces commands they didn't
+remember to type. Anti-noise heuristics keep the layer near-silent on
+prose. Goldens replay byte-stable; GT-CS9 covers adversarial echo.
+**Negative.** A new always-on rule adds context budget — kept under
+size budget by the size-enforcement rule. Per-command frontmatter now
+carries `suggestion.*` keys; the linter enforces them so drift is
+caught early.
+**Open.** Trigger drift over time — a command's
+`trigger_description` gets stale as conventions move. Mitigation: the
+artefact-engagement telemetry pipeline (default-off) measures
+suggestion pick-rate per command, surfacing weak triggers as
+retirement candidates without a hard SLA.
+## See also
+- [`command-suggestion`](../../.agent-src.uncompressed/rules/command-suggestion.md) — runtime rule
+- [`command-suggestion-eligibility.md`](command-suggestion-eligibility.md) — locked eligibility table
+- [`road-to-context-aware-command-suggestion.md`](../../agents/roadmaps/road-to-context-aware-command-suggestion.md) — phased delivery
+- [`adr-prompt-driven-execution.md`](adr-prompt-driven-execution.md) — `/work` entrypoint that explicit slash invocations route to
+- [`agent-settings`](../../.agent-src.uncompressed/templates/agent-settings.md) — `commands.suggestion.*` reference
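The suggestion contract in `adr-command-suggestion.md` can be sketched as a small pipeline with three stages: sanitize, drop noisy matches, and render the numbered-options block. This is a hypothetical illustration, not the shipped `scripts/command_suggester/`. Its assumptions are as follows: scores arrive precomputed as `Match` objects, the regexes only approximate the block shapes the ADR quotes, and cooldowns, per-command floor overrides, and the opt-out settings are left out.

```python
"""Hypothetical sketch of the command-suggestion pipeline:
sanitize -> drop noisy matches -> render the numbered-options block.
Not the real scripts/command_suggester/; names here are assumptions.
"""
import re
from dataclasses import dataclass
from typing import List

# Continuation phrases listed in the ADR's anti-noise heuristics.
CONTINUATIONS = {"ok", "weiter", "mach weiter", "go on", "continue"}
EM_DASH = "\u2014"  # separator used in the ADR's block shape


@dataclass
class Match:
    command: str
    score: float
    structural_bonus: bool = False  # ticket key, file path, branch shape
    label: str = ""


def sanitize(prompt: str) -> str:
    """Remove code and the engine's own block shape before scoring."""
    prompt = re.sub(r"```.*?```", " ", prompt, flags=re.DOTALL)  # fenced code
    prompt = re.sub(r"`[^`\n]*`", " ", prompt)  # inline code
    # Previously rendered option lines: "> N. /command <dash> ..."
    prompt = re.sub(r"^> \d+\. /\S+ " + EM_DASH + r".*$", " ", prompt,
                    flags=re.MULTILINE)
    # Previously rendered recommendation: "**Recommendation: N <dash> ...**"
    prompt = re.sub(r"\*\*Recommendation: \d+ " + EM_DASH + r".*?\*\*", " ", prompt)
    return prompt


def drop_noise(matches: List[Match], prompt: str, floor: float = 0.6) -> List[Match]:
    """Apply the anti-noise heuristics (cooldown omitted); return survivors."""
    kept = [m for m in matches if m.score >= floor]
    if any(m.structural_bonus for m in kept):
        return kept  # structural bonuses override suppression
    normalized = prompt.strip().lower().rstrip(".!")
    if normalized in CONTINUATIONS:
        return []  # pure continuation phrase
    if len(prompt.split()) < 6 and len(kept) > 2:
        return []  # short prompt hitting too many commands
    if len(kept) == 1 and kept[0].score < floor + 0.1:
        return []  # lone match too close to the floor
    return kept


def render(matches: List[Match]) -> str:
    """Render the numbered-options block; an empty string means stay silent."""
    if not matches:
        return ""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    lines = [f"> {i}. {m.command} {EM_DASH} {m.label}"
             for i, m in enumerate(ranked, 1)]
    lines.append(f"> {len(ranked) + 1}. Just run the prompt as-is, no command")
    close = len(ranked) > 1 and round(ranked[0].score - ranked[1].score, 6) <= 0.05
    if not close:  # omit the recommendation when the top two are within 0.05
        lines.append(f"**Recommendation: 1 {EM_DASH} {ranked[0].command}**")
    return "\n".join(lines)
```

The scorer should receive the sanitized prompt rather than the raw one. Doing so keeps a previously rendered block from re-triggering itself, which is the echo case GT-CS9 covers.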

package/docs/contracts/adr-implement-ticket-runtime.md ADDED

@@ -0,0 +1,122 @@
+---
+stability: stable
+---
+# ADR — `/implement-ticket` runtime: Python
+> **Status:** Decided · Phase 0 spike closed · 2026-04-22
+> **Context:** [`implement-ticket-flow.md`](implement-ticket-flow.md) ·
+> [`road-to-implement-ticket.md`](../../agents/roadmaps/road-to-implement-ticket.md)
+> **Supersedes:** the `Runtime: TBD` placeholder in `implement-ticket-flow.md`.
+## Decision
+The `/implement-ticket` orchestrator ships on **Python 3.10+** (stdlib + `pyyaml`).
+Bash is rejected as the runtime. It stays only where it already lives —
+the install driver, compression helper, and test runner — not the
+delivery flow.
+## Why this was a real question
+The repo already ships a Bash install + compression toolchain. A
+shell-native dispatcher would have reused the existing muscle memory and
+avoided adding Python as a delivery-runtime dependency. The spike
+existed to verify whether Bash could still carry the 8-step linear flow
+once real orchestration needs showed up (structured state, blocked/partial
+outcomes, metrics, tests). Result: it could, but poorly.
+## Spike scope
+Both prototypes implement the same shape from
+`implement-ticket-flow.md`:
+- 8 linear steps (refine → memory → analyze → plan → implement → test →
+  verify → report)
+- `DeliveryState` as the only thing shared between steps
+- Three terminal outcomes: `success`, `blocked`, `partial`
+- Metrics JSON line per run (Q38 decision)
+- Identical fixtures: one clean ticket, one ambiguous ticket that must
+  block at `refine` with three numbered questions
+Sources: Phase-0 spike (`spike/implement-ticket/{bash,python}/`) — deleted after Phase 1 shipped; the evidence inlined below is the only surviving record.
+## Evidence (measured, not asserted)
+### Wall-clock mean (20 runs, 3 warmup, macOS ARM)
+| Scenario | Bash | Python | Winner |
+|---|---|---|---|
+| Clean, all 8 steps | 104 ms | 36 ms | **Python 2.9×** |
+| Blocked at step 1  |  60 ms | 35 ms | **Python 1.7×** |
+Harness was `spike/implement-ticket/bench.sh` (20 runs, 3 warmup, macOS ARM);
+raw numbers preserved in this ADR after the spike directory was deleted.
+Bash cost is dominated by per-step fork/exec (`yq`, `jq`, `perl`, step
+script). Every step added extends the Bash gap. Python keeps the whole
+dispatch in-process; step count is effectively free.
+### Test-writability
+| | pytest (Python) | shell asserts (Bash) |
+|---|---|---|
+| Lines to cover 3 scenarios | 59 | 72 |
+| Assert target | typed `state.outcomes`, `state.questions` | stdout string-contains |
+| Suite runtime | 0.01 s | ~2.5 s (15 subprocess launches) |
+| Test isolation | in-process `dispatch(state)` | fork per assertion |
+| IDE affordances | autocomplete, types, stack traces | grep only |
+### Error-propagation ergonomics
+- **Python:** outcomes flow through typed `StepResult`/`Outcome`
+  enums. One `dispatch()` return tuple covers success/blocked/partial.
+  Zero string parsing.
+- **Bash:** outcomes are encoded in exit codes (0/10/20), propagated
+  through `set +e` dances per step, mirrored into a JSON file with
+  `jq --arg` round-trips. Every step handler re-reads/rewrites the
+  state file. Brittle.
+### Source size
+Roughly equal (Bash 203 / Python 207 lines), but the Bash total is spread
+across 9 files with per-file shebang + `set -euo pipefail` boilerplate;
+Python is 3 files with shared types.
+## Tradeoffs we accept
+- **New hard dependency on Python 3.10+ and `pyyaml`.** Mitigated: Python 3
+  is already a build/test dependency of this repo (linters, compression,
+  `update_counts.py`). `pyyaml` is already pinned in `pyproject.toml` /
+  `requirements-*.txt`. Zero new install surface for contributors.
+- **We lose the "just-shell" story.** The install script stays Bash. The
+  delivery runtime being Python is a deliberate split: install is
+  side-effecting system work, delivery is structured data transformation.
+- **Python per-invocation boot is ~35 ms.** Accepted — it's flat, not
+  proportional to step count, and well below the user-perception floor.
+## Non-goals of this decision
+- Does not bless Python for every future spike — each flow decides
+  on its own evidence.
+- Does not commit to a specific framework (click, typer, bare argparse);
+  that is chosen during Phase 1 and kept minimal.
+- Does not move the compression/install scripts off Bash.
+## Consequences — unblocks
+- Phase 1 of `road-to-implement-ticket.md` can start: wire real step
+  handlers onto the dispatcher shape spiked here.
+- `implement-ticket-flow.md` will have its `Runtime: TBD` line replaced
+  with `Runtime: Python 3.10+` when Phase 1 lands.
+- Metrics contract (Q38, JSON lines under `agents/logs/implement-ticket/`)
+  is already demonstrated by both prototypes.
+## Follow-ups (not part of this ADR)
+- If later optimisation is needed, port the numbers above into a
+  `task bench-implement-ticket` target against the real handlers — the
+  original `bench.sh` lives only in this repo's git history now (see
+  commit `79f30e7`).
+- Decide CLI framework during Phase 1 (defer — argparse is enough for
+  the orchestrator skeleton).
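The dispatcher shape measured in `adr-implement-ticket-runtime.md` can be sketched in-process. That shape consists of eight linear steps, a shared `DeliveryState`, typed outcomes, and one metrics JSON line per run. The step names and the three outcomes come from the ADR. Everything else is an assumption and not the shipped `work_engine` code: the field names, the handler signature, the metrics keys, and the rule that any non-success outcome halts the flow. The metrics line is also returned rather than written under `agents/logs/implement-ticket/`.

```python
"""Hypothetical sketch of the ADR's dispatcher shape: 8 linear steps, a
shared DeliveryState, typed outcomes, one metrics JSON line per run.
Field names and handler signatures are illustrative assumptions.
"""
import json
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Outcome(Enum):
    SUCCESS = "success"
    BLOCKED = "blocked"
    PARTIAL = "partial"


@dataclass
class DeliveryState:
    ticket: Dict[str, str]
    outcomes: Dict[str, Outcome] = field(default_factory=dict)
    questions: List[str] = field(default_factory=list)


STEPS = ["refine", "memory", "analyze", "plan",
         "implement", "test", "verify", "report"]

Handler = Callable[[DeliveryState], Outcome]


def dispatch(state: DeliveryState,
             handlers: Dict[str, Handler]) -> Tuple[Outcome, str]:
    """Run the steps in order; return the final outcome and a metrics line."""
    start = time.perf_counter()
    final, last = Outcome.SUCCESS, STEPS[0]
    for name in STEPS:
        last = name
        outcome = handlers[name](state)  # handlers mutate the shared state
        state.outcomes[name] = outcome
        if outcome is not Outcome.SUCCESS:
            final = outcome  # assumption: any non-success halts the flow
            break
    metrics = {
        "outcome": final.value,
        "last_step": last,
        "duration_ms": round((time.perf_counter() - start) * 1000, 2),
    }
    return final, json.dumps(metrics)
```

A `refine` handler that appends three numbered questions and returns `Outcome.BLOCKED` reproduces the ambiguous-ticket fixture. Dispatch then stops after step 1, and `state.questions` carries the questions, which is the kind of typed assert target the test-writability table credits to pytest.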