npm - @event4u/agent-config - Versions diffs - 1.18.0 → 1.20.0 - Mend

@event4u/agent-config 1.18.0 → 1.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (181) hide show

package/.agent-src/commands/agent-handoff.md +14 -10
package/.agent-src/commands/chat-history/import.md +170 -0
package/.agent-src/commands/chat-history/learn.md +178 -0
package/.agent-src/commands/chat-history/show.md +17 -18
package/.agent-src/commands/chat-history.md +26 -25
package/.agent-src/commands/council/default.md +77 -82
package/.agent-src/commands/create-pr.md +28 -8
package/.agent-src/commands/feature/roadmap.md +22 -0
package/.agent-src/commands/roadmap/create.md +38 -6
package/.agent-src/commands/roadmap/execute.md +36 -9
package/.agent-src/commands/sync-gitignore.md +1 -1
package/.agent-src/contexts/communication/rules-auto/skill-quality-mechanics.md +76 -0
package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +3 -3
package/.agent-src/contexts/communication/rules-auto/user-interaction-mechanics.md +5 -12
package/.agent-src/rules/agent-authority.md +1 -0
package/.agent-src/rules/agent-docs.md +1 -0
package/.agent-src/rules/analysis-skill-routing.md +1 -0
package/.agent-src/rules/architecture.md +1 -0
package/.agent-src/rules/artifact-drafting-protocol.md +1 -0
package/.agent-src/rules/artifact-engagement-recording.md +1 -0
package/.agent-src/rules/ask-when-uncertain.md +1 -0
package/.agent-src/rules/augment-portability.md +1 -0
package/.agent-src/rules/augment-source-of-truth.md +1 -0
package/.agent-src/rules/autonomous-execution.md +1 -0
package/.agent-src/rules/capture-learnings.md +1 -0
package/.agent-src/rules/cli-output-handling.md +2 -2
package/.agent-src/rules/command-suggestion-policy.md +1 -0
package/.agent-src/rules/commit-conventions.md +1 -0
package/.agent-src/rules/commit-policy.md +1 -0
package/.agent-src/rules/context-hygiene.md +22 -0
package/.agent-src/rules/direct-answers.md +11 -2
package/.agent-src/rules/docker-commands.md +1 -0
package/.agent-src/rules/docs-sync.md +1 -0
package/.agent-src/rules/downstream-changes.md +1 -0
package/.agent-src/rules/e2e-testing.md +1 -0
package/.agent-src/rules/guidelines.md +1 -0
package/.agent-src/rules/improve-before-implement.md +1 -0
package/.agent-src/rules/language-and-tone.md +38 -6
package/.agent-src/rules/laravel-translations.md +1 -0
package/.agent-src/rules/markdown-safe-codeblocks.md +1 -0
package/.agent-src/rules/minimal-safe-diff.md +1 -0
package/.agent-src/rules/missing-tool-handling.md +1 -0
package/.agent-src/rules/model-recommendation.md +1 -0
package/.agent-src/rules/no-attribution-footers.md +48 -0
package/.agent-src/rules/no-cheap-questions.md +1 -0
package/.agent-src/rules/no-roadmap-references.md +2 -1
package/.agent-src/rules/non-destructive-by-default.md +1 -0
package/.agent-src/rules/onboarding-gate.md +26 -0
package/.agent-src/rules/package-ci-checks.md +1 -0
package/.agent-src/rules/php-coding.md +1 -0
package/.agent-src/rules/preservation-guard.md +1 -0
package/.agent-src/rules/review-routing-awareness.md +1 -0
package/.agent-src/rules/reviewer-awareness.md +1 -0
package/.agent-src/rules/roadmap-progress-sync.md +22 -0
package/.agent-src/rules/role-mode-adherence.md +2 -2
package/.agent-src/rules/rule-type-governance.md +1 -0
package/.agent-src/rules/runtime-safety.md +1 -0
package/.agent-src/rules/scope-control.md +1 -0
package/.agent-src/rules/security-sensitive-stop.md +1 -0
package/.agent-src/rules/size-enforcement.md +1 -0
package/.agent-src/rules/skill-improvement-trigger.md +1 -0
package/.agent-src/rules/skill-quality.md +50 -0
package/.agent-src/rules/slash-command-routing-policy.md +39 -0
package/.agent-src/rules/think-before-action.md +1 -0
package/.agent-src/rules/token-efficiency.md +1 -0
package/.agent-src/rules/tool-safety.md +1 -0
package/.agent-src/rules/ui-audit-gate.md +1 -0
package/.agent-src/rules/upstream-proposal.md +1 -0
package/.agent-src/rules/user-interaction.md +22 -5
package/.agent-src/rules/verify-before-complete.md +1 -0
package/.agent-src/skills/ai-council/SKILL.md +4 -5
package/.agent-src/skills/dcf-modeling/SKILL.md +89 -0
package/.agent-src/skills/funnel-analysis/SKILL.md +100 -0
package/.agent-src/skills/md-language-check/SKILL.md +1 -1
package/.agent-src/skills/okr-tree-modeling/SKILL.md +93 -0
package/.agent-src/skills/rice-prioritization/SKILL.md +100 -0
package/.agent-src/skills/roadmap-management/SKILL.md +29 -4
package/.agent-src/skills/subagent-orchestration/SKILL.md +34 -2
package/.agent-src/skills/unit-economics-modeling/SKILL.md +104 -0
package/.agent-src/skills/using-git-worktrees/SKILL.md +1 -0
package/.agent-src/skills/verify-completion-evidence/SKILL.md +8 -1
package/.agent-src/templates/agent-settings.md +21 -26
package/.agent-src/templates/roadmaps.md +8 -3
package/.agent-src/templates/scripts/work_engine/hook_bootstrap.py +16 -5
package/.agent-src/templates/scripts/work_engine/hooks/__init__.py +4 -4
package/.agent-src/templates/scripts/work_engine/hooks/builtin/__init__.py +4 -4
package/.agent-src/templates/scripts/work_engine/hooks/builtin/_chat_history_base.py +7 -51
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_append.py +1 -2
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_halt_append.py +1 -2
package/.agent-src/templates/scripts/work_engine/hooks/builtin/decision_trace.py +163 -0
package/.agent-src/templates/scripts/work_engine/hooks/builtin/memory_visibility.py +110 -0
package/.agent-src/templates/scripts/work_engine/hooks/settings.py +36 -0
package/.agent-src/templates/scripts/work_engine/scoring/decision_trace.py +141 -0
package/.agent-src/templates/scripts/work_engine/scoring/memory_visibility.py +125 -0
package/.agent-src/templates/skill.md +30 -1
package/.claude-plugin/marketplace.json +8 -4
package/AGENTS.md +44 -3
package/CHANGELOG.md +173 -0
package/README.md +22 -22
package/config/agent-settings.template.yml +42 -13
package/config/gitignore-block.txt +4 -4
package/docs/architecture.md +3 -3
package/docs/catalog.md +18 -13
package/docs/contracts/adr-chat-history-split.md +10 -1
package/docs/contracts/adr-settings-sync-engine.md +127 -0
package/docs/contracts/command-clusters.md +1 -1
package/docs/contracts/cross-wing-handoff.md +133 -0
package/docs/contracts/decision-trace-v1.md +146 -0
package/docs/contracts/file-ownership-matrix.json +348 -126
package/docs/contracts/hook-architecture-v1.md +220 -0
package/docs/contracts/memory-visibility-v1.md +122 -0
package/docs/contracts/one-off-script-lifecycle.md +109 -0
package/docs/contracts/rule-interactions.yml +22 -0
package/docs/customization.md +2 -1
package/docs/development.md +4 -1
package/docs/getting-started.md +21 -29
package/docs/guidelines/agent-infra/ask-when-uncertain-demos.md +1 -1
package/docs/guidelines/agent-infra/layered-settings.md +32 -13
package/docs/hook-payload-capture.md +221 -0
package/docs/migrations/commands-1.15.0.md +17 -12
package/docs/skills-catalog.md +5 -4
package/llms.txt +4 -3
package/package.json +1 -1
package/scripts/agent-config +45 -1
package/scripts/ai_council/_default_prices.py +4 -4
package/scripts/ai_council/bundler.py +3 -3
package/scripts/ai_council/clients.py +25 -9
package/scripts/ai_council/modes.py +3 -4
package/scripts/ai_council/one_off_archive/2026-05/README.md +22 -0
package/scripts/ai_council/one_off_archive/2026-05/_one_off_roundtrip.py +13 -8
package/scripts/ai_council/one_off_archive/2026-05/_one_off_tier_retrofit.py +180 -0
package/scripts/ai_council/pricing.py +10 -9
package/scripts/ai_council/session.py +92 -0
package/scripts/build_rule_trigger_matrix.py +1 -9
package/scripts/capture_showcase_session.py +361 -0
package/scripts/chat_history.py +963 -597
package/scripts/check_always_budget.py +7 -2
package/scripts/check_references.py +12 -2
package/scripts/context_hygiene_hook.py +14 -6
package/scripts/council_cli.py +407 -0
package/scripts/hook_manifest.yaml +217 -0
package/scripts/hooks/__init__.py +1 -0
package/scripts/hooks/augment-chat-history.sh +10 -0
package/scripts/hooks/augment-dispatcher.sh +72 -0
package/scripts/hooks/cline-dispatcher.sh +86 -0
package/scripts/hooks/cowork-dispatcher.sh +98 -0
package/scripts/hooks/cursor-dispatcher.sh +76 -0
package/scripts/hooks/dispatch_hook.py +383 -0
package/scripts/hooks/envelope.py +98 -0
package/scripts/hooks/gemini-dispatcher.sh +117 -0
package/scripts/hooks/state_io.py +122 -0
package/scripts/hooks/windsurf-dispatcher.sh +123 -0
package/scripts/hooks_status.py +157 -0
package/scripts/install-hooks.sh +2 -2
package/scripts/install.py +725 -87
package/scripts/install.sh +38 -1
package/scripts/lint_handoffs.py +214 -0
package/scripts/lint_hook_manifest.py +217 -0
package/scripts/lint_one_off_age.py +184 -0
package/scripts/lint_rule_tiers.py +78 -0
package/scripts/lint_showcase_sessions.py +148 -0
package/scripts/minimal_safe_diff_hook.py +245 -0
package/scripts/onboarding_gate_hook.py +13 -8
package/scripts/readme_linter.py +12 -3
package/scripts/redact_hook_capture.py +148 -0
package/scripts/roadmap_progress_hook.py +5 -0
package/scripts/schemas/skill.schema.json +5 -0
package/scripts/skill_linter.py +163 -1
package/scripts/sync_agent_settings.py +32 -129
package/scripts/sync_yaml_rt.py +734 -0
package/scripts/update_prices.py +3 -3
package/scripts/verify_before_complete_hook.py +216 -0
package/.agent-src/commands/chat-history/checkpoint.md +0 -126
package/.agent-src/commands/chat-history/clear.md +0 -103
package/.agent-src/commands/chat-history/resume.md +0 -183
package/.agent-src/rules/chat-history-cadence.md +0 -109
package/.agent-src/rules/chat-history-ownership.md +0 -123
package/.agent-src/rules/chat-history-visibility.md +0 -96
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_heartbeat.py +0 -50
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_turn_check.py +0 -49
package/scripts/check_phase_coupling.py +0 -148

package/.agent-src/rules/slash-command-routing-policy.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "1"
 description: "When user types a slash command like /create-pr, /commit, or pastes command file content"
 alwaysApply: false
 source: package
@@ -31,3 +32,41 @@ This is **irrelevant** for command detection.
 - If command file content appears in the context alongside an open file, the **command invocation takes priority**.
 - Do NOT confuse "file is open" with "user wants to discuss this file".
 - The user's typed message determines intent — not editor state.
+## Read the whole prompt — command is the operator, prose is the target
+```
+/<command> IS THE OPERATOR.
+THE REST OF THE USER MESSAGE NAMES THE TARGET.
+NEVER ASSUME THE COMMAND NAME IS THE TARGET.
+```
+Slash token = **what to do**; surrounding prose = **what to do it on**.
+- `/council and analyse chat-history` → target is `chat-history`,
+  not `council`. Council is the *tool*, prose names the *artefact*.
+- `/work the memory bug from PROJ-123` → target is "the memory bug
+  from PROJ-123".
+- `/fix ci and then open a PR` → target is "CI failure"; trailing
+  "open a PR" is a follow-up needing separate permission (per
+  `scope-control`).
+### Pre-flight before expensive operations
+Before any operation costing real time or money — external API call,
+large codebase analysis, multi-file refactor, council run, generated
+test suite — run silently:
+1. Re-read the **whole** user message, not just slash + first token.
+2. Identify the target the prose actually names.
+3. Target unambiguous → execute, no question.
+4. Target **genuinely** ambiguous after re-reading (prose names *two*
+   artefacts, can't tell which is the operand) → ask ONE
+   disambiguating numbered-options question per
+   [`ask-when-uncertain`](ask-when-uncertain.md), then proceed.
+**Not** a license to re-introduce cheap questions (`no-cheap-questions`
+still binds). Threshold: *"would this guess waste the user's tokens,
+money, or trust?"* — not *"I'd feel safer asking"*. Single failure
+mode to avoid: spending API spend on the wrong artefact because the
+agent fixated on the command name.

package/.agent-src/rules/think-before-action.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "2b"
 description: "Before coding, modifying, or debugging — analyze first, verify with real tools, never guess or trial-and-error"
 alwaysApply: false
 source: package

package/.agent-src/rules/token-efficiency.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "2a"
 description: "When running CLI tools, fetching logs, or producing replies — redirect verbose output, minimize tool calls, keep replies concise"
 alwaysApply: false
 source: package

package/.agent-src/rules/tool-safety.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: auto
+tier: "2b"
 source: package
 description: "When a skill uses external tools — enforce allowlist, deny-by-default, and no hidden credential patterns"
 ---

package/.agent-src/rules/ui-audit-gate.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "2b"
 description: "Writing or editing UI — components, screens, partials, layouts, design tokens — require existing-ui-audit findings in state.ui_audit before non-trivial UI change; gate, not suggestion"
 alwaysApply: false
 source: package

package/.agent-src/rules/upstream-proposal.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "2a"
 description: "After creating or significantly improving a skill, rule, guideline, or command — ask if it should be contributed upstream to the shared package"
 alwaysApply: false
 source: package

package/.agent-src/rules/user-interaction.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "auto"
+tier: "3"
 description: "Asking the user a question, presenting options, or summarizing progress — numbered-options Iron Law, single-recommendation rule, progress indicators"
 alwaysApply: false
 source: package
@@ -21,6 +22,9 @@ THE OPTION BLOCK STAYS NEUTRAL. THE RECOMMENDATION LINE IS THE ONLY SOURCE OF TR
 DRIFT BETWEEN OPTION-BLOCK AND PROSE IS STRUCTURALLY IMPOSSIBLE WHEN THE TAG DOES NOT EXIST.
 MISSING RECOMMENDATION = RULE VIOLATION, NOT A SLIP.
 POSITION-AGNOSTIC. END-OF-TURN MENUS COUNT. NEXT-STEP LISTS COUNT. NO EXCEPTIONS.
+THE RECOMMENDATION LINE LIVES DIRECTLY UNDER THE OPTIONS BLOCK. NOWHERE ELSE.
+PROSE NAMING A "RECOMMENDED" PATH ABOVE OR BEFORE THE OPTIONS BLOCK = NO RECOMMENDATION.
+WRONG-LANGUAGE LABEL (`Recommendation:` WHEN USER IS GERMAN, OR VICE VERSA) = NO RECOMMENDATION.
 ```
 The agent has read the code, the contracts, the trade-offs. Refusing
@@ -49,6 +53,17 @@ recommendation line is mandatory.
 - If the agent genuinely cannot pick (rare — true 50/50 with missing data),
   say what data would break the tie and ask for that instead.
+**No trailing open-ended question after numbered options:**
+If the reply contains numbered options, the recommendation line IS
+the closer. No `Welcher Pfad?` / `What's it gonna be?` / `Was meinst
+Du?` / `Was sagst Du?` / `Welche willst Du?` / `What do you think?`
+after the recommendation — that reframes the vote as an opinion poll
+and is hedging in disguise. The user picks a number; the agent does
+not re-ask. Permitted: a clarifying caveat sentence on the
+recommendation line itself (`Caveat: flip to 2 if …`). Forbidden:
+any standalone trailing question that re-opens the choice.
 **What does NOT count as a recommendation:**
 - "Both work" / "either is fine" / "depends on what you prefer"
@@ -86,11 +101,13 @@ Mechanical backstop: `python3 scripts/check_reply_consistency.py --stdin < draft
 (non-zero exit on any rule above). Self-scan is the primary gate; the
 script is the deterministic safety net for ambiguous cases.
-Common failure modes (end-of-turn menu skipped, "no preference"
-hedges, multi-block reply with one recommendation) and the named
-slip catalog live in
-[`contexts/communication/rules-auto/user-interaction-mechanics.md`](../contexts/communication/rules-auto/user-interaction-mechanics.md)
-§ Common failure modes.
+### Common failure modes — known, named, no excuses
+- **End-of-turn menu skipped.** Reply answers the question fine, then ends with `1. … 2. … 3. …` and no `Empfehlung:`. Iron Law 1 was violated — these are numbered options, position is irrelevant.
+- **Trailing-question hedge.** Reply has options + recommendation, but ends with `Welcher Pfad?` / `What's it gonna be?` / `Was meinst Du?` — the open question reframes the vote as opinion-poll. Banned by Iron Law 1; the recommendation line is the closer.
+- **"Genuinely no preference" hedge.** Pick anyway. The agent has more context than the user on the trade-off; refusing to pick dumps the work back. Pick the safest option, name the flip-condition.
+- **"User knows the project better" hedge.** Same failure mode, different costume. The user asked for an opinion by virtue of accepting the options block; deliver it.
+- **Multi-block reply with one recommendation.** Two options blocks but only one `Empfehlung:` line — the second block is unguarded. Rule 5 of Iron Law 2 closes this.
 ## Numbered Options — Always

package/.agent-src/rules/verify-before-complete.md CHANGED Viewed

@@ -1,5 +1,6 @@
 ---
 type: "always"
+tier: "2a"
 description: "Verify before completion — run tests and quality tools before claiming done"
 alwaysApply: true
 source: package

package/.agent-src/skills/ai-council/SKILL.md CHANGED Viewed

@@ -82,12 +82,11 @@ travel changes.
 |---|---|---|---|---|
 | `api` | `AnthropicClient` / `OpenAIClient` | yes | provider SDK + key from `~/.config/agent-config/<provider>.key` | shipped |
 | `manual` | `ManualClient` | no | `stdout` (prompt block) + `stdin` (user pastes the web-UI reply, terminated by a line containing only `END`) | shipped (Phase 2b) |
-| `playwright` | `PlaywrightClient` | no | persistent-profile browser at the provider's chat URL via DOM adapter | reserved (Phase 2c — capture-only) |
 Resolution lives in `scripts/ai_council/modes.py`:
 `resolve_mode(name, invocation_mode, member_settings, global_mode)`
 with precedence **invocation flag > per-member setting > global
-setting > default (`api`)**. Whitespace-and-case insensitive; empty
+setting > default (`manual`)**. Whitespace-and-case insensitive; empty
 strings fall through; unknown values raise `InvalidModeError` with
 the offending settings path (`ai_council.mode`,
 `ai_council.members.<name>.mode`, or `/council mode=`).
@@ -113,8 +112,8 @@ that member and the orchestrator stops the fan-out.
 ### Cost-gate bypass for non-billable members
 `ExternalAIClient.billable` is the contract. Clients with
-`billable=False` (today: `ManualClient`; future: `PlaywrightClient`)
-bypass the cost gate entirely — the orchestrator skips the
+`billable=False` (`ManualClient`) bypass the cost gate entirely —
+the orchestrator skips the
 projection check, the `on_overrun` callback, and the USD-budget
 short-circuit for that member, but still records the response's
 token counts (from the manual-paste length heuristic or the
@@ -225,7 +224,7 @@ per-invocation caps from `ai_council.cost_budget`:
   if the callback returns False or is absent, tags the member
   `daily_budget_exceeded` instead of `cost_budget_exceeded`.
-Prices come from `.agent-prices.md` (gitignored, refreshed weekly).
+Prices come from `agents/.agent-prices.md` (gitignored, refreshed weekly).
 The pricing module bootstraps it from `_default_prices.py` on first
 use and flags it stale when older than the most recent Monday 00:00
 UTC.

package/.agent-src/skills/dcf-modeling/SKILL.md ADDED Viewed

@@ -0,0 +1,89 @@
+---
+name: dcf-modeling
+description: "Wing-4 valuation cognition for a CFO / finance-partner. Use when a deal, internal investment, or board ask names DCF, intrinsic value, WACC, terminal value, or 'what's it worth on a 5-year hold'."
+status: active
+tier: senior
+source: package
+---
+# dcf-modeling
+## When to use
+- A buy-build-or-partner decision needs an intrinsic-value anchor, not just a multiple.
+- A board pack asks for sensitivity to discount rate or terminal-growth assumptions.
+- An acquisition target's seller-deck IRR claims need a counter-model.
+Do NOT use for revenue forecasting alone, market-sizing, or comp-multiple-only screens — those route elsewhere (see Related Skills).
+## Procedure
+### Step 0: Inspect
+1. Confirm the target has ≥3 years of audited or reviewed financials, or a clearly-labelled forecast that names every assumption.
+2. Note the cognition cluster: this is **intrinsic-value cognition**, not multiple-arbitrage.
+### Step 1: Lock the assumption table
+1. Pull or estimate the five drivers — revenue growth (per year, declining to terminal), EBIT margin path, tax rate, capex/sales, change in net working capital/sales.
+2. Decompose WACC: cost of equity (CAPM — risk-free + β × ERP), cost of debt (after-tax), capital structure target weights.
+3. Pick a terminal-value method **once** — either Gordon-growth (`FCFF_t+1 / (WACC − g)`) or exit-multiple. Naming both inflates spurious precision.
+### Step 2: Project free cash flow
+1. Build a 5-year FCFF row: `EBIT × (1 − t) + D&A − Capex − ΔNWC`.
+2. Discount each year by `1 / (1 + WACC)^t`.
+3. Compute terminal value at year 5, discount back.
+4. Sum PV(FCFF) + PV(TV) = enterprise value. Subtract net debt → equity value.
+### Step 3: Sensitivity grid
+1. Build a 5×5 grid: WACC ±200 bps × terminal growth ±100 bps (or exit multiple ±2 turns).
+2. Flag the corner cells where equity value flips sign or moves >25% from base — those are the load-bearing assumptions.
+### Step 4: Validate
+1. Cross-check implied EV/EBITDA against trading comps. If your DCF prints 22× and the sector trades at 11×, **the assumptions are wrong**, not the market.
+2. State the two assumptions that drive >50% of the valuation. If you can't name them, the model is undisciplined.
+## Gotcha
+- Terminal value usually carries 60–80% of total PV. Treating TV as a footnote is the most common DCF malpractice.
+- WACC sensitivity is non-linear near `WACC ≈ g`; the Gordon formula explodes. Cap displayed cells; don't pretend the corner is a real number.
+- Forecasted FCFF that grows faster than revenue forever implies infinite margin expansion — the model will silently smuggle it in unless you bound EBIT margin at a stated ceiling.
+- Synergies in an M&A DCF belong in a separate column. Comingling them with standalone FCFF is how acquirers overpay.
+## Do NOT
+- Do NOT use a DCF as the sole valuation when the business is < 3 years old or has negative operating cash flow — uncertainty bands swamp the signal.
+- Do NOT discount levered cash flow by WACC. Use FCFF↔WACC or FCFE↔Ke; never cross.
+- Do NOT report a point estimate without the sensitivity grid. A single number is a prediction, not a valuation.
+## Related Skills
+**WHEN to use this**
+- The decision needs intrinsic value, not relative value.
+- The asset has a multi-year cash-flow profile worth modelling explicitly.
+- The board wants an answer to "what assumption breaks the deal?"
+**WHEN NOT to use this**
+- Pure unit-economics question (CAC/LTV/payback) — route to [`unit-economics-modeling`](../unit-economics-modeling/SKILL.md).
+- Prioritization of competing internal bets, not valuation — route to [`rice-prioritization`](../rice-prioritization/SKILL.md).
+- Strategic-objective decomposition, not value — route to [`okr-tree-modeling`](../okr-tree-modeling/SKILL.md).
+## When the agent should load this
+- "What's this acquisition worth on a 5-year hold?"
+- "Build me a DCF on these financials."
+- "How sensitive is the valuation to WACC?"
+- "What discount rate does the seller's IRR imply?"
+- "Counter-model this seller deck."
+## Output
+1. **`assumptions.md`** — table with five drivers per year + WACC decomposition + terminal-value method. One row per assumption, one column per year.
+2. **`fcff-projection.md`** — 5-year FCFF + discount factors + PV column + terminal-value PV + bridge to equity value.
+3. **`sensitivity-grid.md`** — 5×5 markdown table (WACC × terminal growth or exit multiple). Bold the cells where equity value flips sign or moves >25% from base.
+4. **`valuation-narrative.md`** — three paragraphs: (a) point estimate + range, (b) the two load-bearing assumptions, (c) cross-check against trading comps with named delta.

package/.agent-src/skills/funnel-analysis/SKILL.md ADDED Viewed

@@ -0,0 +1,100 @@
+---
+name: funnel-analysis
+description: "Use when diagnosing where a SaaS or product funnel leaks — visitor → signup → activation → paid → retained — channel-agnostic, conversion-rate-driven."
+status: active
+tier: senior
+source: package
+---
+# funnel-analysis
+## When to use
+- Conversion to paid dropped and nobody knows which step broke.
+- A new signup channel went live and you need to compare its funnel shape to the baseline.
+- A board ask: "where does the money leak between landing page and paying customer?"
+Do NOT use for ranking features, valuation, or OKR decomposition (see Related Skills). Funnel analysis is a **diagnostic**, not a roadmap.
+## Procedure
+### Step 0: Inspect
+1. Confirm the cognition cluster: this is **conversion diagnosis**, channel-agnostic. Paid social, organic, partner, and self-serve all share the same shape; only the inputs differ.
+2. Confirm event tracking exists for all 5 stages. If even one stage is inferred, the analysis is unreliable — flag and proceed under that caveat.
+### Step 1: Lock the 5 stages
+1. The canonical SaaS funnel: **Visitor → Signup → Activation → Paid → Retained-D30**.
+2. Activation is the load-bearing definition. Pick the **single event** that historically correlates with paid conversion — not "logged in", not "viewed dashboard". For most SaaS this is "completed first meaningful action" (sent first invoice, ran first query, invited first teammate).
+3. Retained-D30 = still active 30 days after first paid charge. Earlier than D30 is noise; later requires more data.
+### Step 2: Pull stage-to-stage conversion
+1. Compute conversion rate at each step: `stage_n / stage_n-1`. Always use cohorts (signup-week or signup-month), never aggregate snapshots — aggregates lie when traffic mix changes.
+2. For each rate, attach a 95% confidence interval. Tiny denominators give big bands; the band is half the story.
+3. Plot a 12-week trend per rate. A single point is gossip; a trend is evidence.
+### Step 3: Benchmark vs internal baseline
+1. The right benchmark is **your own funnel one quarter ago**, not industry averages. Industry averages mix verticals so coarsely they're useless for action.
+2. For each stage: is current rate within ±2 percentage points of trailing-quarter median? If not, that stage is the primary suspect.
+3. If multiple stages drift simultaneously, the cause is upstream (acquisition mix change, broken instrumentation), not the stage itself.
+### Step 4: Segment the broken stage
+1. Take the suspect stage and segment by: channel · device · plan · geo · cohort week.
+2. The drop is almost always concentrated in one segment, not uniform. Uniform drops point to instrumentation.
+3. Anti-pattern: averaging across segments and treating the average as actionable. The average user does not exist.
+### Step 5: Hypothesise causes
+1. For the broken segment-stage, write 3 candidate causes. Rank by testability, not plausibility.
+2. The cheapest experiment to falsify the top candidate is the next step — usually a UX change, a copy test, or an onboarding tweak.
+3. If no cause is testable in under 2 weeks, the analysis is not yet sharp enough.
+### Step 6: Validate
+1. Recompute the broken rate after the experiment ships. Same cohort definition. Same window.
+2. If the rate moves but the downstream rates don't follow, you fixed a vanity step. Keep going.
+## Gotcha
+- "Activation" defined as a low-friction event (signup confirmation, first login) gives you a flatter funnel that is useless for prediction. Activation must correlate with paid.
+- Aggregate funnel rates that look stable can hide a 30-point drop in one channel masked by a 30-point lift in another. Always segment.
+- D7 retention looks great compared to D30. Pick the metric that matches the contract length, not the one that flatters.
+- Holiday weeks, deploys, marketing pushes, and refund days distort cohorts. Annotate the timeline; don't pretend a 5pp drop is real on a known holiday.
+## Do NOT
+- Do NOT use industry-average benchmarks as a target. They mix B2B with B2C, freemium with high-touch — the average is meaningless.
+- Do NOT compare a 1-week cohort to a 12-week trailing median; sample size is too small to draw conclusions.
+- Do NOT diagnose retention on a funnel without separating new-user retention from re-engaged-user retention.
+## Related Skills
+**WHEN to use this**
+- Where in the funnel did conversion drop?
+- Compare the funnel shape between two channels.
+**WHEN NOT to use this**
+- Pricing tier or unit-economics question — route to [`unit-economics-modeling`](../unit-economics-modeling/SKILL.md).
+- Roadmap ranking from funnel findings — route to [`rice-prioritization`](../rice-prioritization/SKILL.md).
+- Setting team OKRs around the diagnosed metric — route to [`okr-tree-modeling`](../okr-tree-modeling/SKILL.md).
+- Valuing the business that owns the funnel — route to [`dcf-modeling`](../dcf-modeling/SKILL.md).
+## When the agent should load this
+- "Where is our funnel leaking?"
+- "Why did paid conversion drop last month?"
+- "Compare the funnel for paid social vs organic."
+- "Diagnose this dropoff between signup and activation."
+- "Is this drop real or instrumentation?"
+## Output
+1. **`funnel-table.md`** — 5-stage funnel with cohort rates, 95% CI, and 12-week trend (sparkline or compact ASCII). One row per cohort week or month.
+2. **`segment-breakdown.md`** — table of the broken stage segmented by channel · device · plan · geo. Rates with CIs. Suspect segments highlighted.
+3. **`hypothesis-list.md`** — top 3 causes for the broken segment-stage with cheapest-falsification experiment per cause and an explicit prediction for the next measurement.

package/.agent-src/skills/md-language-check/SKILL.md CHANGED Viewed

@@ -32,7 +32,7 @@ Do NOT use when:
 - Editing project content outside the trees listed above (READMEs of
   consumer projects, application docs that follow a different policy)
-- Reviewing chat history files (`.agent-chat-history` is JSONL, not `.md`)
+- Reviewing chat history files (`agents/.agent-chat-history` is JSONL, not `.md`)
 - Inspecting non-`.md` files — the checker rejects them with a warning
 ## Procedure

package/.agent-src/skills/okr-tree-modeling/SKILL.md ADDED Viewed

@@ -0,0 +1,93 @@
+---
+name: okr-tree-modeling
+description: "Use when decomposing a company objective into team OKRs, auditing a draft OKR tree, or stress-testing an existing one for measurability and laddering."
+status: active
+tier: senior
+source: package
+---
+# okr-tree-modeling
+## When to use
+- A leadership team wrote a quarterly objective and needs three measurable KRs that actually move it.
+- A draft OKR tree needs review for orphan KRs, vanity metrics, or KRs that ladder to two parents.
+- A team-level OKR set needs to be checked against the company objective it claims to serve.
+Do NOT use for ranking competing initiatives within one OKR — that's prioritization, not decomposition (route elsewhere — see Related Skills).
+## Procedure
+### Step 0: Inspect
+1. Identify the cognition cluster. OKR-as-strategy lives near the CEO/COO; OKR-as-PM-tool lives near the head of product. The decomposition mechanic is the same; the ladder-up target differs.
+2. Confirm timeframe (quarter / half / year) — KR cadence depends on it.
+### Step 1: Lock the parent objective
+1. Restate the parent in one sentence with **a verb of change** (grow, reduce, ship, win) and an outcome — not an output.
+2. Bad: "Improve onboarding." Good: "Reduce time-to-first-value for new accounts to under 7 days."
+3. If the parent has no verb of change or no outcome, the tree is built on sand. Stop and rewrite the parent.
+### Step 2: Decompose into 3 KRs
+1. Each KR must satisfy four tests: (a) measurable end-state, (b) the team owns the lever, (c) achievable iff the parent is achieved, (d) failing it should be visibly bad.
+2. Three is the floor and the ceiling. One KR makes the objective brittle; five dilutes ownership.
+3. Anti-pattern: "ship X feature" as a KR. Shipping is an output. The KR is what changes when the feature lands.
+### Step 3: Cascade to team-level
+1. For each company KR, define 2–3 team-level KRs that, taken together, achieve the parent KR — not duplicate it.
+2. Anti-pattern: copy-paste the parent KR with a smaller number ("60% of company KR is 30% for our team"). Real cascades change shape — input metrics for some teams, leading indicators for others.
+3. Mark each team KR as **trailing** (lagging outcome — revenue, retention) or **leading** (input the team controls — activation rate, response time). A tree that is all-trailing is unactionable.
+### Step 4: Cadence + check-ins
+1. Weekly check-in: leading KRs only. Trailing KRs are reviewed monthly.
+2. End of period: confidence score (0.0–1.0) per KR, written 3× across the period to surface drift.
+### Step 5: Validate
+1. Walk the tree top-down: does every leaf KR ladder cleanly to exactly one parent KR?
+2. Walk it bottom-up: if every team hits 0.7 confidence, does the company objective land?
+3. Count vanity metrics — KRs that move but don't matter. If more than zero, rewrite.
+## Gotcha
+- "100% of teams adopt X" is a participation metric, not an outcome KR. The model loves to write these.
+- A KR owned by two teams is owned by no team. Single accountability is non-negotiable.
+- Stretch goals that nobody believes are achievable produce sandbagging on the OKR after, not stretch.
+- "Customer happiness" + NPS is a sentiment proxy, not an outcome metric. Tie KRs to behavior (renewal, expansion, usage), not feeling.
+## Do NOT
+- Do NOT write more than 3 KRs per objective — focus is the entire point.
+- Do NOT use "achieve $X revenue" as a KR for a team that doesn't own pricing or pipeline; that's setting them up to fail blameably.
+- Do NOT cascade by simple percentage allocation — different teams contribute different mechanisms, not different fractions.
+## Related Skills
+**WHEN to use this**
+- The ask is decomposition of an objective into measurable KRs.
+- A draft OKR tree needs structural review (cascade integrity, leading/trailing balance).
+**WHEN NOT to use this**
+- Prioritization of competing features inside one KR — route to [`rice-prioritization`](../rice-prioritization/SKILL.md).
+- Conversion-rate diagnosis on a funnel KR — route to [`funnel-analysis`](../funnel-analysis/SKILL.md).
+- Valuation-impact modeling of strategic objectives — route to [`dcf-modeling`](../dcf-modeling/SKILL.md).
+## When the agent should load this
+- "Help me write OKRs for next quarter."
+- "Review this OKR tree — does it ladder up?"
+- "What KRs would actually move our retention objective?"
+- "Are these team OKRs measurable?"
+- "We have 8 KRs per team — too many?"
+## Output
+1. **`okr-tree.md`** — markdown tree: company objective → 3 company KRs → 2–3 team KRs each. Each leaf carries owner, measure, target, trailing/leading tag, check-in cadence.
+2. **`cascade-audit.md`** — orphans (KRs with no parent), oversubscribed parents (KR with >3 children), vanity metrics flagged with a one-line replacement suggestion.
+3. **`confidence-template.md`** — empty 3-column table (start / mid / end) per KR, ready for the period's confidence updates.

package/.agent-src/skills/rice-prioritization/SKILL.md ADDED Viewed

@@ -0,0 +1,100 @@
+---
+name: rice-prioritization
+description: "Use when ranking competing initiatives for a roadmap, breaking a tie between two features, or auditing a backlog for hidden low-value work via Reach × Impact × Confidence ÷ Effort."
+status: active
+tier: senior
+source: package
+---
+# rice-prioritization
+## When to use
+- A backlog has more candidates than capacity for the next quarter and someone has to pick.
+- A PM and an engineering lead disagree on what ships first and need a shared framework.
+- A draft roadmap reads like a wish list — no transparency on **why** these and not those.
+Do NOT use for valuation, OKR decomposition, or funnel-stage diagnosis (see Related Skills).
+## Procedure
+### Step 0: Inspect
+1. Confirm there are at least 5 candidates. RICE on 2 items is theatre; just argue the merits.
+2. Confirm there is a shared definition of the **target user** for "Reach" — RICE breaks if two scorers count different populations.
+### Step 1: Score Reach
+1. Reach = number of users / events / requests **per fixed time window** (per quarter is the default).
+2. Use absolute counts pulled from analytics or product DB, not percentages — percentages hide tiny denominators.
+3. If the data isn't there, write the query you'd run and say so. Do not invent numbers.
+### Step 2: Score Impact
+1. Use the canonical 5-point scale: 0.25 (minimal) · 0.5 (low) · 1 (medium) · 2 (high) · 3 (massive).
+2. Anchor each level with a concrete past shipped feature ("medium = like the search filter we shipped Q2"). Without anchors, scorers drift.
+3. Impact is **per affected user**, not aggregate. Aggregate is what RICE produces, not what you input.
+### Step 3: Score Confidence
+1. Confidence is a **percentage** — 100 / 80 / 50 / "low and we should not score this yet".
+2. Anything below 50 means: stop, do a spike or a research week, then re-score. RICE does not rescue ignorance.
+3. Confidence multiplies — it is the model's discount for unknown unknowns.
+### Step 4: Score Effort
+1. Effort = person-months for the smallest viable shippable slice. Not the fantasy version.
+2. Engineering owns this number. PMs scoring effort is the most common process failure.
+3. Effort < 0.5 person-months almost always means scope is underestimated — surface and ask.
+### Step 5: Compute and rank
+1. RICE = `(Reach × Impact × Confidence) / Effort`.
+2. Rank descending. The score is the artefact, not the answer — read the top 5 with a critical eye.
+3. Anti-pattern: treating RICE rank as a contract. It is a structured argument, not a verdict.
+### Step 6: Audit the bottom
+1. Look at the bottom quartile. If a strategic must-have lives there, the model has a calibration error — usually Reach or Impact.
+2. Look at the top item. If it is obviously absurd (e.g. one ad-hoc admin tool above a strategic platform play), the input scoring is uncalibrated.
+## Gotcha
+- Reach in percentages hides "this feature affects 100% of … 12 users."
+- Impact inflation: every PM thinks every feature is a 2 or 3. Force at least 30% of items to score 0.5 or below.
+- Confidence is the only multiplier that punishes uncertainty — do not let it default to 80 for everything.
+- Effort discrepancy between PM and engineering on the same row is itself the signal — investigate, do not average.
+## Do NOT
+- Do NOT rank fewer than 5 candidates with RICE — overhead exceeds value.
+- Do NOT mix strategic bets and BAU tickets in the same RICE table; their effort scales differ by 10×.
+- Do NOT ship a roadmap that is exactly the RICE-sorted top-N — you need at least one strategic outlier with a written rationale.
+## Related Skills
+**WHEN to use this**
+- Ranking is the actual question.
+- The team needs a shared, auditable scoring frame.
+**WHEN NOT to use this**
+- Decomposing an objective into KRs — route to [`okr-tree-modeling`](../okr-tree-modeling/SKILL.md).
+- Diagnosing why a funnel stage drops — route to [`funnel-analysis`](../funnel-analysis/SKILL.md).
+- Modelling whether an investment is worth its capital cost — route to [`dcf-modeling`](../dcf-modeling/SKILL.md).
+- CAC / LTV / payback questions — route to [`unit-economics-modeling`](../unit-economics-modeling/SKILL.md).
+## When the agent should load this
+- "Help me prioritize the backlog for Q3."
+- "RICE-score these features."
+- "Why is X above Y on the roadmap?"
+- "We have 30 ideas and 6 engineers — what ships?"
+- "Audit our roadmap for low-value work."
+## Output
+1. **`rice-table.md`** — markdown table: Item · Reach · Impact · Confidence · Effort · RICE · Owner · Notes. Sorted descending by RICE.
+2. **`calibration-notes.md`** — one paragraph per anchor (what "Impact = 2" means with a named past feature) plus a list of items with confidence < 50 marked for spike-first.
+3. **`top-5-critique.md`** — one paragraph per top-5 item: is the rank defensible, and what would change it.

package/.agent-src/skills/roadmap-management/SKILL.md CHANGED Viewed

@@ -117,11 +117,28 @@ Every roadmap follows this structure:
 ### Quality gates
-Every roadmap implicitly includes these gates (run after each step that changes code):
+Every roadmap implicitly includes the project's quality pipeline
+(static analysis, autofixes, tests). What's configurable is **when**
+the pipeline runs during `/roadmap execute`, controlled by
+`roadmap.quality_cadence` in `.agent-settings.yml`:
-- PHPStan must pass (detect command: artisan vs composer, see `rules/docker-commands.md`)
-- Rector: run with fix flag, verify no new PHPStan errors
-- Tests: run affected tests
+| Cadence | Pipeline runs | Trade-off |
+|---|---|---|
+| `end_of_roadmap` (default) | Once before archiving | Fastest, fewest tokens; errors compound across phases |
+| `per_phase` | After every completed phase + final | Balanced; catches drift at phase boundaries |
+| `per_step` | After every completed step + final | Legacy verbose; highest token cost |
+The default is `end_of_roadmap` because most steps are checkbox-only
+content edits and a final pipeline run is the cheapest way to satisfy
+`verify-before-complete`. Switch to `per_phase` for risky migrations or
+unfamiliar codebases.
+**Always-on, regardless of cadence:**
+- Step checkboxes flip `[ ] → [x]` and the dashboard regenerates **same
+  response** (enforced by `roadmap-progress-sync`).
+- Before any "roadmap complete" claim or archival, the pipeline runs
+  fresh (enforced by `verify-before-complete`).
 ### Step granularity
@@ -149,6 +166,14 @@ Every roadmap implicitly includes these gates (run after each step that changes
    the roadmap text. If the user declines, do **not** re-propose during
    `roadmap-execute`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
 5. Save with a kebab-case filename (e.g. `optimize-webhook-jobs.md`).
+   **Before writing**, scan the entire roadmap namespace for a
+   collision — active, `archive/`, `skipped/`, and nested subdirs —
+   with `find agents/roadmaps -type f -iname "<name>.md"`. If any
+   hit comes back, stop and ask the user to rename, open the
+   existing file, or abort. Never silently overwrite an archived
+   or skipped roadmap. Detailed prompt in
+   [`commands/roadmap/create.md`](../../commands/roadmap/create.md)
+   step 6.
 6. Regenerate the dashboard so the new roadmap is included.
 ### Executing a roadmap