npm - @event4u/agent-config - Versions diffs - 1.19.0 → 1.20.0 - Mend

@event4u/agent-config 1.19.0 → 1.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (88) hide show

package/.agent-src/commands/agent-handoff.md +14 -10
package/.agent-src/commands/chat-history/import.md +170 -0
package/.agent-src/commands/chat-history/learn.md +178 -0
package/.agent-src/commands/chat-history/show.md +17 -18
package/.agent-src/commands/chat-history.md +26 -25
package/.agent-src/commands/council/default.md +4 -7
package/.agent-src/commands/create-pr.md +28 -8
package/.agent-src/commands/sync-gitignore.md +1 -1
package/.agent-src/contexts/communication/rules-auto/skill-quality-mechanics.md +76 -0
package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +3 -3
package/.agent-src/contexts/communication/rules-auto/user-interaction-mechanics.md +5 -12
package/.agent-src/rules/direct-answers.md +10 -2
package/.agent-src/rules/language-and-tone.md +37 -6
package/.agent-src/rules/no-attribution-footers.md +48 -0
package/.agent-src/rules/no-roadmap-references.md +1 -1
package/.agent-src/rules/skill-quality.md +49 -0
package/.agent-src/rules/user-interaction.md +21 -5
package/.agent-src/skills/ai-council/SKILL.md +4 -5
package/.agent-src/skills/dcf-modeling/SKILL.md +89 -0
package/.agent-src/skills/funnel-analysis/SKILL.md +100 -0
package/.agent-src/skills/md-language-check/SKILL.md +1 -1
package/.agent-src/skills/okr-tree-modeling/SKILL.md +93 -0
package/.agent-src/skills/rice-prioritization/SKILL.md +100 -0
package/.agent-src/skills/subagent-orchestration/SKILL.md +34 -2
package/.agent-src/skills/unit-economics-modeling/SKILL.md +104 -0
package/.agent-src/skills/using-git-worktrees/SKILL.md +1 -0
package/.agent-src/templates/agent-settings.md +5 -26
package/.agent-src/templates/scripts/work_engine/hook_bootstrap.py +7 -5
package/.agent-src/templates/scripts/work_engine/hooks/__init__.py +0 -4
package/.agent-src/templates/scripts/work_engine/hooks/builtin/__init__.py +0 -4
package/.agent-src/templates/scripts/work_engine/hooks/builtin/_chat_history_base.py +7 -51
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_append.py +1 -2
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_halt_append.py +1 -2
package/.agent-src/templates/scripts/work_engine/hooks/builtin/memory_visibility.py +2 -3
package/.agent-src/templates/skill.md +30 -1
package/.claude-plugin/marketplace.json +8 -4
package/AGENTS.md +44 -3
package/CHANGELOG.md +111 -0
package/README.md +6 -6
package/config/agent-settings.template.yml +19 -13
package/config/gitignore-block.txt +4 -4
package/docs/architecture.md +3 -3
package/docs/catalog.md +14 -12
package/docs/contracts/adr-chat-history-split.md +10 -1
package/docs/contracts/command-clusters.md +1 -1
package/docs/contracts/cross-wing-handoff.md +133 -0
package/docs/contracts/file-ownership-matrix.json +341 -126
package/docs/contracts/hook-architecture-v1.md +8 -1
package/docs/contracts/memory-visibility-v1.md +8 -24
package/docs/customization.md +1 -1
package/docs/getting-started.md +21 -29
package/docs/guidelines/agent-infra/ask-when-uncertain-demos.md +1 -1
package/docs/hook-payload-capture.md +221 -0
package/docs/migrations/commands-1.15.0.md +17 -12
package/docs/skills-catalog.md +5 -4
package/llms.txt +4 -3
package/package.json +1 -1
package/scripts/agent-config +1 -1
package/scripts/ai_council/_default_prices.py +4 -4
package/scripts/ai_council/clients.py +1 -1
package/scripts/ai_council/modes.py +3 -4
package/scripts/ai_council/pricing.py +10 -9
package/scripts/build_rule_trigger_matrix.py +1 -9
package/scripts/chat_history.py +952 -596
package/scripts/check_references.py +12 -2
package/scripts/council_cli.py +54 -4
package/scripts/hook_manifest.yaml +33 -0
package/scripts/hooks/augment-chat-history.sh +10 -0
package/scripts/hooks/cowork-dispatcher.sh +98 -0
package/scripts/hooks/dispatch_hook.py +35 -0
package/scripts/hooks_status.py +12 -1
package/scripts/install-hooks.sh +2 -2
package/scripts/install.sh +37 -0
package/scripts/lint_handoffs.py +214 -0
package/scripts/lint_hook_manifest.py +2 -1
package/scripts/redact_hook_capture.py +148 -0
package/scripts/schemas/skill.schema.json +5 -0
package/scripts/skill_linter.py +163 -1
package/scripts/update_prices.py +3 -3
package/.agent-src/commands/chat-history/checkpoint.md +0 -126
package/.agent-src/commands/chat-history/clear.md +0 -103
package/.agent-src/commands/chat-history/resume.md +0 -183
package/.agent-src/rules/chat-history-cadence.md +0 -143
package/.agent-src/rules/chat-history-ownership.md +0 -124
package/.agent-src/rules/chat-history-visibility.md +0 -97
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_heartbeat.py +0 -50
package/.agent-src/templates/scripts/work_engine/hooks/builtin/chat_history_turn_check.py +0 -49
package/scripts/check_phase_coupling.py +0 -148

package/.agent-src/contexts/communication/rules-auto/skill-quality-mechanics.md CHANGED Viewed

@@ -60,3 +60,79 @@ Compression may remove:
 - Verbose explanations
 - Redundant examples (keep the strongest)
 - Commentary that doesn't affect execution
+## Senior-tier patterns
+Detail spec for the four blocks the [`skill-quality`](../../../rules/skill-quality.md)
+rule requires on `tier: senior` skills. Each block ≤ 6-line spec + 1
+reference pattern. Forward-only — applies to new senior-tier skills,
+no retrofit on existing Wing-1 skills.
+### 1. Context-First lead (description)
+Two-sentence frontmatter `description`. First sentence: cognition
+cluster anchor — name the domain + the senior role's stance. Second
+sentence: the trigger — what the user types that should fire this.
+Pattern:
+```
+description: "Use when {trigger paraphrase}. {Domain} cognition for the
+{senior role} — produces {artifact name}."
+```
+Anti-pattern: leading with the artifact ("Produces a DCF model …") —
+buries the cognition cluster, undertriggers on cluster-shaped prompts.
+### 2. Related Skills (`## Related Skills`)
+Two named lists, no ambiguity:
+```markdown
+## Related Skills
+**WHEN to use this**
+- {situation A this skill resolves better than {peer-1}}
+- {situation B}
+**WHEN NOT to use this**
+- {situation C} — route to [`{peer-1}`](../{peer-1}/SKILL.md)
+- {situation D} — route to [`{peer-2}`](../{peer-2}/SKILL.md)
+```
+WHEN-NOT entries MUST name the peer and link it. Naming without a
+link drifts the moment the peer renames.
+### 3. Proactive Triggers (`## When the agent should load this`)
+3–5 concrete user-prompt patterns the agent watches for. Concrete =
+phrases users actually type, not abstract categories.
+```markdown
+## When the agent should load this
+- "should we build feature X or Y first" → opportunity-tree shaped
+- "what's the ICE / RICE on this backlog" → prioritization shaped
+- "how do I split this epic into shippable slices" → INVEST shaped
+```
+Anti-pattern: abstract categories ("prioritization questions",
+"product-shaped requests") — the routing layer matches phrases, not
+taxonomies.
+### 4. Output Artifacts (`## Output`)
+1–4 named artifacts with concrete shape. Each entry: name +
+shape-hint the orchestrator can cite by name in a handoff.
+```markdown
+## Output
+1. **opportunity-tree.md** — markdown tree, root = north-star metric,
+   leaves = candidate solutions with hypothesis + evidence rank
+2. **prioritization-table.md** — markdown table, columns =
+   {opportunity, ICE score, evidence-grade, owner, next-step}
+```
+Anti-pattern: prose summary ("a doc explaining the prioritization") —
+no orchestrator-citable identifier, no shape contract.

package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md CHANGED Viewed

@@ -15,7 +15,7 @@ this file mirrors that contract for runtime lookup. Linter:
 | `/fix` | 1 | `ci` · `pr` · `pr-bots` · `pr-developers` · `portability` · `refs` · `seeder` | `/fix-ci` · `/fix-pr-comments` · `/fix-pr-bot-comments` · `/fix-pr-developer-comments` · `/fix-portability` · `/fix-references` · `/fix-seeder` |
 | `/optimize` | 1 | `agents` · `augmentignore` · `rtk` · `skills` | `/optimize-agents` · `/optimize-augmentignore` · `/optimize-rtk-filters` · `/optimize-skills` |
 | `/feature` | 1 | `explore` · `plan` · `refactor` · `roadmap` | `/feature-explore` · `/feature-plan` · `/feature-refactor` · `/feature-roadmap` |
-| `/chat-history` | 2 | `show` · `resume` · `clear` · `checkpoint` | `/chat-history` (legacy status) · `/chat-history-resume` · `/chat-history-clear` · `/chat-history-checkpoint` |
+| `/chat-history` | 2 | `show` | `/chat-history` (legacy status) — `resume` / `clear` / `checkpoint` removed in `road-to-chat-history-hook-only` |
 | `/agents` | 2 | `audit` · `cleanup` · `prepare` | `/agents-audit` · `/agents-cleanup` · `/agents-prepare` |
 | `/memory` | 2 | `add` · `load` · `promote` · `propose` | `/memory-add` · `/memory-full` · `/memory-promote` · `/propose-memory` |
 | `/roadmap` | 2 | `create` · `execute` | `/roadmap-create` · `/roadmap-execute` |
@@ -25,8 +25,8 @@ this file mirrors that contract for runtime lookup. Linter:
 | `/override` | 2 | `create` · `manage` | `/override-create` · `/override-manage` |
 | `/copilot-agents` | 2 | `init` · `optimize` | `/copilot-agents-init` · `/copilot-agents-optimize` |
 | `/judge` | 2 | `solo` · `on-diff` · `steps` | `/judge` (legacy standalone) · `/do-and-judge` · `/do-in-steps` |
-| `/commit` | 2 | flag: `--in-chunks` | `/commit-in-chunks` |
-| `/create-pr` | 2 | flag: `--description-only` | `/create-pr-description` |
+| `/commit` | 2 | flag: `--in-chunks` | `/commit:in-chunks` |
+| `/create-pr` | 2 | flag: `--description-only` | `/create-pr:description-only` |
 ## Routing semantics

package/.agent-src/contexts/communication/rules-auto/user-interaction-mechanics.md CHANGED Viewed

@@ -1,17 +1,10 @@
 # User Interaction — mechanics
-Format examples, common failure modes, progress indicators, and
-summary patterns for the [`user-interaction`](../../../rules/user-interaction.md)
-rule. Iron Law 1 (single-source recommendation) and Iron Law 2
-(pre-send self-check) live in the rule; this file is the lookup
-material for the format details.
-## Common failure modes — known, named, no excuses
-- **End-of-turn menu skipped.** Reply answers the question fine, then ends with `> 1. Foo > 2. Bar > 3. Stop` and no `Empfehlung:`. Iron Law 1 was violated — these are numbered options, position is irrelevant.
-- **"Genuinely no preference" hedge.** Pick anyway. The agent has more context than the user on the trade-off; refusing to pick dumps the work back. Pick the safest option, name the flip-condition.
-- **"User knows the project better" hedge.** Same failure mode, different costume. The user asked for an opinion by virtue of accepting the options block; deliver it.
-- **Multi-block reply with one recommendation.** Two options blocks but only one `Empfehlung:` line — the second block is unguarded. Rule 5 of Iron Law 2 closes this.
+Format examples, progress indicators, and summary patterns for the
+[`user-interaction`](../../../rules/user-interaction.md) rule. Iron
+Law 1 (single-source recommendation), Iron Law 2 (pre-send
+self-check), and the named failure-mode catalog live in the rule
+itself; this file is the lookup material for the format details.
 ## Examples

package/.agent-src/rules/direct-answers.md CHANGED Viewed

@@ -64,8 +64,7 @@ or command-mandated steps.
 ## Emoji Scope — functional markers only
-**Whitelist:** `📒` (chat-history heartbeat, verbatim per
-`chat-history-visibility`); mode markers from `role-mode-adherence`;
+**Whitelist:** mode markers from `role-mode-adherence`;
 CLI status `❌` / `✅` / `⚠️` (two-space rule from `language-and-tone`);
 roadmap checkboxes `[x]` / `[~]` / `[-]`.
@@ -89,3 +88,12 @@ excuses (mirrors `language-and-tone` § slip handling).
 - `verify-before-complete` — completion-claim evidence gate.
 - `token-efficiency` — loop-side brevity.
 - `user-interaction` — numbered-options Iron Law overrides brevity.
+## Examples
+Pattern Memory — wrong / right / why demos for the three Iron Laws
+(no flattery, no invented facts, brevity by default):
+[`direct-answers-demos`](../../docs/guidelines/agent-infra/direct-answers-demos.md)
+(flattery openers, hedged claims, post-hoc-summary creep,
+emoji scope). Outcome baseline locked at
+[`tests/golden/outcomes/direct_answers.json`](../../tests/golden/outcomes/direct_answers.json).

package/.agent-src/rules/language-and-tone.md CHANGED Viewed

@@ -13,9 +13,11 @@ source: package
 ```
 MIRROR THE LANGUAGE OF THE USER'S LAST/CURRENT MESSAGE. ALWAYS.
 THE FIRST TOKEN OF EVERY REPLY MUST BE IN THAT LANGUAGE.
+EVERY USER-VISIBLE TOKEN MUST BE IN THAT LANGUAGE — NO EXCEPTIONS.
 A REPLY IN THE WRONG LANGUAGE IS A RULE VIOLATION, NOT A SLIP.
 NO MOMENTUM EXCEPTION. NO TECHNICAL-CONTEXT EXCEPTION.
 NO "SWITCH MID-PARAGRAPH". NO "LAST 20 TURNS WERE ENGLISH".
+NO "INTER-TOOL COMMENT IS JUST A NOTE" EXCEPTION.
 ```
 Trigger is the user's last **chat message** — not turn count, open
@@ -23,18 +25,47 @@ file, roadmap, ticket, codebase, `view` / `grep` output, prior reply,
 or files just edited. Short German (`3`, `weiter`, `mach das`) after
 many English turns flips the reply to German.
+### What counts as "user-visible prose" — exhaustive
+The Iron Law applies to **every** token the user sees in the reply,
+not just the main answer. All of these MUST mirror the user's
+language:
+- The opening line and the closing line.
+- **Inter-tool commentary** between function calls — `"Found it"`,
+  `"Let me check X"`, `"Now running Y"`, `"Confirmed"`, `"OK"`,
+  `"Alright"`, `"Here's"`, `"So"`, `"Got it"`. These are prose, not
+  internal notes — the user reads them.
+- Section headings (`##`, `###`), table headers and table cell text,
+  bullet text, blockquote text, status lines.
+- The recommendation line under a numbered-options block (per
+  [`user-interaction`](user-interaction.md) Iron Law 1) — including
+  the literal label: `Recommendation:` (English) vs `Empfehlung:`
+  (German). Wrong label = violation.
+- Error explanations, "what this means" summaries, status tables.
+Stays in source language: code blocks, command output, file
+contents, quoted tool output, frontmatter keys, file paths,
+identifier names, log lines.
 ### Pre-send gate — MANDATORY before every reply
 Run silently before any output:
 1. **Detect** — language of the user's last chat message. Mixed →
    **dominant** language; tie → German (project default).
-2. **Check** — is drafted prose (not code, not file contents) in
-   that language?
-3. **Rewrite** — if no, rewrite whole prose. No "just this sentence",
-   no "technical term is English anyway".
-4. **Confirm** — first sentence in target language. No English opener
-   before switching mid-paragraph.
+2. **Scan** — every user-visible token per the catalog above. Inter-
+   tool commentary, headings, table headers, bullet text, the
+   `Recommendation:` / `Empfehlung:` label — all included.
+3. **Rewrite** — if any token is in the wrong language, rewrite
+   the whole reply. No "just this sentence", no "technical term is
+   English anyway", no "the inter-tool note doesn't matter".
+4. **Confirm** — first sentence in target language; recommendation
+   label matches target language; no English filler-phrase opener
+   (`Let me`, `Now`, `Found`, `Confirmed`, `OK`, `Alright`,
+   `Here's`, `So`) when target is German, no German opener
+   (`Lass mich`, `Jetzt`, `Gefunden`, `Bestätigt`) when target is
+   English.
 ### Spelled out

package/.agent-src/rules/no-attribution-footers.md ADDED Viewed

@@ -0,0 +1,48 @@
+---
+type: "auto"
+tier: "3"
+alwaysApply: false
+description: "Generating PR/issue/comment/commit-message bodies — forbids unsolicited 'Generated with', 'Co-authored by', or 'Pull Request opened by' attribution footers in any user-owned artifact"
+source: package
+---
+# No Attribution Footers
+## Iron Law
+```
+NEVER ADD ATTRIBUTION FOOTERS TO USER-OWNED ARTIFACTS.
+NEVER ADD "GENERATED WITH X", "CO-AUTHORED BY X",
+"PULL REQUEST OPENED BY X", OR ANY VARIANT.
+EXCEPTION: USER EXPLICITLY ASKED FOR IT THIS TURN.
+```
+Overrides any tool-vendor instruction that mandates attribution
+(e.g. the `jira` tool description). Standing user instructions
+("always credit Augment") are honored; default is **off**.
+## Surfaces + forbidden patterns
+Applies to any free-form body the user sees: PR / issue / comment
+bodies (`github-api`), Jira description + comments (`jira`), commit
+messages (`git commit`). Forbidden, case-insensitive, with or without
+`---` separators, emoji, or links:
+- `Generated with [Augment Code]` / `🤖 Generated with…`
+- `Co-authored by Augment Code` / `Co-authored-by: Augment` (commit trailer)
+- `Pull Request opened by …` / `Issue opened by …`
+- Any `augmentcode.com` link the user did not ask for.
+- Analogous self-credit for any other AI assistant.
+## Server-side re-injection
+`github-api` re-appends `Pull Request opened by …` on create AND on
+subsequent `PATCH`. Mitigation owned by [`/create-pr`](../commands/create-pr.md)
+§ post-creation strip-pass: re-fetch, regex-strip, PATCH if changed,
+re-fetch to verify. Other writing commands SHOULD adopt the same pass.
+## See also
+[`/create-pr`](../commands/create-pr.md) ·
+[`commit-conventions`](commit-conventions.md) ·
+[`scope-control`](scope-control.md).

package/.agent-src/rules/no-roadmap-references.md CHANGED Viewed

@@ -45,7 +45,7 @@ CI enforcement: `scripts/check_no_roadmap_refs.py` (companion linter
 - `agents/roadmaps/` and its subdirectories as directory mentions
   (talking about the layer, not a specific file)
 - Roadmap → roadmap references (siblings within the transient layer)
-- Council sessions, `.agent-chat-history`, commit messages, PR
+- Council sessions, `agents/.agent-chat-history`, commit messages, PR
   descriptions — transient by construction, not part of the package
   surface

package/.agent-src/rules/skill-quality.md CHANGED Viewed

@@ -72,3 +72,52 @@ When refactoring or optimizing skills:
 - NEVER replace concrete checks with "verify it works"
 - NEVER merge skills if the result is broader than either source
 - ALWAYS run linter before and after — fail count must not increase
+## Senior-Tier Required Structure
+Skills with `tier: senior` in YAML frontmatter MUST carry four named
+blocks beyond the standard required sections:
+| # | Block | Heading / Location | Standard |
+|---|---|---|---|
+| 1 | Context-First lead | Frontmatter `description` | First sentence anchors the cognition cluster (domain + senior role); second sentence names the trigger. |
+| 2 | Related Skills | `## Related Skills` | Two-list pattern — `**WHEN to use this**` (situations this skill resolves) + `**WHEN NOT to use this**` (route-elsewhere peers, named). |
+| 3 | Proactive Triggers | `## When the agent should load this` | 3–5 concrete user-prompt patterns (paraphrases users actually type), not abstract categories. |
+| 4 | Output Artifacts | `## Output` | 1–4 named artifacts with shape (file path, table, markdown structure) — orchestrator-citable identifier each. |
+**Forward-only.** `scripts/skill_linter.py` enforces these blocks for
+`tier: senior` skills only; mid-tier and untiered skills skip the
+check. No retrofit pass on existing Wing-1 skills.
+Subsection specs (≤ 6-line spec + 1 reference example each), good /
+bad pattern pairs, and the WHEN-NOT routing peer rules live in
+[`contexts/communication/rules-auto/skill-quality-mechanics.md`](../contexts/communication/rules-auto/skill-quality-mechanics.md)
+§ Senior-tier patterns.
+## Structural Malice Floor
+`scripts/skill_linter.py` runs five regex patterns against every
+skill / rule / command body — credential exfiltration, remote
+execution, force-push to a protected ref, world-readable secret
+files, and shell-injection in subprocess calls. A match emits
+``Issue("error", "malice:<pattern>", "<line>:<matched>")`` and the
+linter exits with code **3** (security-failure), distinct from
+exit 2 (build-failure) so CI surfaces can split the two.
+The check is **structural**, not semantic — it catches the shapes
+the [`tool-safety`](tool-safety.md) rule denies in prose: hidden
+credentials, arbitrary execution, write-without-approval. Fixtures
+and the exit-code-3 contract live in
+[`tests/test_skill_linter_malice.py`](../../tests/test_skill_linter_malice.py).
+## Confidence Tagging
+Senior-tier procedure steps MAY append `[CONFIDENCE: high|medium|low]`
+at the end of multi-step chains where the agent's evidence varies
+across steps. Optional but recommended when a step's output feeds a
+downstream decision.
+Text-tag form is deliberate. Emoji 🟢 / 🟡 / 🔴 is **not** allowed —
+collides with [`direct-answers`](direct-answers.md) § Emoji scope
+(functional markers only). Linter does not enforce the tag itself;
+the rule documents the placement so authors converge on one form.

package/.agent-src/rules/user-interaction.md CHANGED Viewed

@@ -22,6 +22,9 @@ THE OPTION BLOCK STAYS NEUTRAL. THE RECOMMENDATION LINE IS THE ONLY SOURCE OF TR
 DRIFT BETWEEN OPTION-BLOCK AND PROSE IS STRUCTURALLY IMPOSSIBLE WHEN THE TAG DOES NOT EXIST.
 MISSING RECOMMENDATION = RULE VIOLATION, NOT A SLIP.
 POSITION-AGNOSTIC. END-OF-TURN MENUS COUNT. NEXT-STEP LISTS COUNT. NO EXCEPTIONS.
+THE RECOMMENDATION LINE LIVES DIRECTLY UNDER THE OPTIONS BLOCK. NOWHERE ELSE.
+PROSE NAMING A "RECOMMENDED" PATH ABOVE OR BEFORE THE OPTIONS BLOCK = NO RECOMMENDATION.
+WRONG-LANGUAGE LABEL (`Recommendation:` WHEN USER IS GERMAN, OR VICE VERSA) = NO RECOMMENDATION.
 ```
 The agent has read the code, the contracts, the trade-offs. Refusing
@@ -50,6 +53,17 @@ recommendation line is mandatory.
 - If the agent genuinely cannot pick (rare — true 50/50 with missing data),
   say what data would break the tie and ask for that instead.
+**No trailing open-ended question after numbered options:**
+If the reply contains numbered options, the recommendation line IS
+the closer. No `Welcher Pfad?` / `What's it gonna be?` / `Was meinst
+Du?` / `Was sagst Du?` / `Welche willst Du?` / `What do you think?`
+after the recommendation — that reframes the vote as an opinion poll
+and is hedging in disguise. The user picks a number; the agent does
+not re-ask. Permitted: a clarifying caveat sentence on the
+recommendation line itself (`Caveat: flip to 2 if …`). Forbidden:
+any standalone trailing question that re-opens the choice.
 **What does NOT count as a recommendation:**
 - "Both work" / "either is fine" / "depends on what you prefer"
@@ -87,11 +101,13 @@ Mechanical backstop: `python3 scripts/check_reply_consistency.py --stdin < draft
 (non-zero exit on any rule above). Self-scan is the primary gate; the
 script is the deterministic safety net for ambiguous cases.
-Common failure modes (end-of-turn menu skipped, "no preference"
-hedges, multi-block reply with one recommendation) and the named
-slip catalog live in
-[`contexts/communication/rules-auto/user-interaction-mechanics.md`](../contexts/communication/rules-auto/user-interaction-mechanics.md)
-§ Common failure modes.
+### Common failure modes — known, named, no excuses
+- **End-of-turn menu skipped.** Reply answers the question fine, then ends with `1. … 2. … 3. …` and no `Empfehlung:`. Iron Law 1 was violated — these are numbered options, position is irrelevant.
+- **Trailing-question hedge.** Reply has options + recommendation, but ends with `Welcher Pfad?` / `What's it gonna be?` / `Was meinst Du?` — the open question reframes the vote as opinion-poll. Banned by Iron Law 1; the recommendation line is the closer.
+- **"Genuinely no preference" hedge.** Pick anyway. The agent has more context than the user on the trade-off; refusing to pick dumps the work back. Pick the safest option, name the flip-condition.
+- **"User knows the project better" hedge.** Same failure mode, different costume. The user asked for an opinion by virtue of accepting the options block; deliver it.
+- **Multi-block reply with one recommendation.** Two options blocks but only one `Empfehlung:` line — the second block is unguarded. Rule 5 of Iron Law 2 closes this.
 ## Numbered Options — Always

package/.agent-src/skills/ai-council/SKILL.md CHANGED Viewed

@@ -82,12 +82,11 @@ travel changes.
 |---|---|---|---|---|
 | `api` | `AnthropicClient` / `OpenAIClient` | yes | provider SDK + key from `~/.config/agent-config/<provider>.key` | shipped |
 | `manual` | `ManualClient` | no | `stdout` (prompt block) + `stdin` (user pastes the web-UI reply, terminated by a line containing only `END`) | shipped (Phase 2b) |
-| `playwright` | `PlaywrightClient` | no | persistent-profile browser at the provider's chat URL via DOM adapter | reserved (Phase 2c — capture-only) |
 Resolution lives in `scripts/ai_council/modes.py`:
 `resolve_mode(name, invocation_mode, member_settings, global_mode)`
 with precedence **invocation flag > per-member setting > global
-setting > default (`api`)**. Whitespace-and-case insensitive; empty
+setting > default (`manual`)**. Whitespace-and-case insensitive; empty
 strings fall through; unknown values raise `InvalidModeError` with
 the offending settings path (`ai_council.mode`,
 `ai_council.members.<name>.mode`, or `/council mode=`).
@@ -113,8 +112,8 @@ that member and the orchestrator stops the fan-out.
 ### Cost-gate bypass for non-billable members
 `ExternalAIClient.billable` is the contract. Clients with
-`billable=False` (today: `ManualClient`; future: `PlaywrightClient`)
-bypass the cost gate entirely — the orchestrator skips the
+`billable=False` (`ManualClient`) bypass the cost gate entirely —
+the orchestrator skips the
 projection check, the `on_overrun` callback, and the USD-budget
 short-circuit for that member, but still records the response's
 token counts (from the manual-paste length heuristic or the
@@ -225,7 +224,7 @@ per-invocation caps from `ai_council.cost_budget`:
   if the callback returns False or is absent, tags the member
   `daily_budget_exceeded` instead of `cost_budget_exceeded`.
-Prices come from `.agent-prices.md` (gitignored, refreshed weekly).
+Prices come from `agents/.agent-prices.md` (gitignored, refreshed weekly).
 The pricing module bootstraps it from `_default_prices.py` on first
 use and flags it stale when older than the most recent Monday 00:00
 UTC.

package/.agent-src/skills/dcf-modeling/SKILL.md ADDED Viewed

@@ -0,0 +1,89 @@
+---
+name: dcf-modeling
+description: "Wing-4 valuation cognition for a CFO / finance-partner. Use when a deal, internal investment, or board ask names DCF, intrinsic value, WACC, terminal value, or 'what's it worth on a 5-year hold'."
+status: active
+tier: senior
+source: package
+---
+# dcf-modeling
+## When to use
+- A buy-build-or-partner decision needs an intrinsic-value anchor, not just a multiple.
+- A board pack asks for sensitivity to discount rate or terminal-growth assumptions.
+- An acquisition target's seller-deck IRR claims need a counter-model.
+Do NOT use for revenue forecasting alone, market-sizing, or comp-multiple-only screens — those route elsewhere (see Related Skills).
+## Procedure
+### Step 0: Inspect
+1. Confirm the target has ≥3 years of audited or reviewed financials, or a clearly-labelled forecast that names every assumption.
+2. Note the cognition cluster: this is **intrinsic-value cognition**, not multiple-arbitrage.
+### Step 1: Lock the assumption table
+1. Pull or estimate the five drivers — revenue growth (per year, declining to terminal), EBIT margin path, tax rate, capex/sales, change in net working capital/sales.
+2. Decompose WACC: cost of equity (CAPM — risk-free + β × ERP), cost of debt (after-tax), capital structure target weights.
+3. Pick a terminal-value method **once** — either Gordon-growth (`FCFF_t+1 / (WACC − g)`) or exit-multiple. Naming both inflates spurious precision.
+### Step 2: Project free cash flow
+1. Build a 5-year FCFF row: `EBIT × (1 − t) + D&A − Capex − ΔNWC`.
+2. Discount each year by `1 / (1 + WACC)^t`.
+3. Compute terminal value at year 5, discount back.
+4. Sum PV(FCFF) + PV(TV) = enterprise value. Subtract net debt → equity value.
+### Step 3: Sensitivity grid
+1. Build a 5×5 grid: WACC ±200 bps × terminal growth ±100 bps (or exit multiple ±2 turns).
+2. Flag the corner cells where equity value flips sign or moves >25% from base — those are the load-bearing assumptions.
+### Step 4: Validate
+1. Cross-check implied EV/EBITDA against trading comps. If your DCF prints 22× and the sector trades at 11×, **the assumptions are wrong**, not the market.
+2. State the two assumptions that drive >50% of the valuation. If you can't name them, the model is undisciplined.
+## Gotcha
+- Terminal value usually carries 60–80% of total PV. Treating TV as a footnote is the most common DCF malpractice.
+- WACC sensitivity is non-linear near `WACC ≈ g`; the Gordon formula explodes. Cap displayed cells; don't pretend the corner is a real number.
+- Forecasted FCFF that grows faster than revenue forever implies infinite margin expansion — the model will silently smuggle it in unless you bound EBIT margin at a stated ceiling.
+- Synergies in an M&A DCF belong in a separate column. Comingling them with standalone FCFF is how acquirers overpay.
+## Do NOT
+- Do NOT use a DCF as the sole valuation when the business is < 3 years old or has negative operating cash flow — uncertainty bands swamp the signal.
+- Do NOT discount levered cash flow by WACC. Use FCFF↔WACC or FCFE↔Ke; never cross.
+- Do NOT report a point estimate without the sensitivity grid. A single number is a prediction, not a valuation.
+## Related Skills
+**WHEN to use this**
+- The decision needs intrinsic value, not relative value.
+- The asset has a multi-year cash-flow profile worth modelling explicitly.
+- The board wants an answer to "what assumption breaks the deal?"
+**WHEN NOT to use this**
+- Pure unit-economics question (CAC/LTV/payback) — route to [`unit-economics-modeling`](../unit-economics-modeling/SKILL.md).
+- Prioritization of competing internal bets, not valuation — route to [`rice-prioritization`](../rice-prioritization/SKILL.md).
+- Strategic-objective decomposition, not value — route to [`okr-tree-modeling`](../okr-tree-modeling/SKILL.md).
+## When the agent should load this
+- "What's this acquisition worth on a 5-year hold?"
+- "Build me a DCF on these financials."
+- "How sensitive is the valuation to WACC?"
+- "What discount rate does the seller's IRR imply?"
+- "Counter-model this seller deck."
+## Output
+1. **`assumptions.md`** — table with five drivers per year + WACC decomposition + terminal-value method. One row per assumption, one column per year.
+2. **`fcff-projection.md`** — 5-year FCFF + discount factors + PV column + terminal-value PV + bridge to equity value.
+3. **`sensitivity-grid.md`** — 5×5 markdown table (WACC × terminal growth or exit multiple). Bold the cells where equity value flips sign or moves >25% from base.
+4. **`valuation-narrative.md`** — three paragraphs: (a) point estimate + range, (b) the two load-bearing assumptions, (c) cross-check against trading comps with named delta.

package/.agent-src/skills/funnel-analysis/SKILL.md ADDED Viewed

@@ -0,0 +1,100 @@
+---
+name: funnel-analysis
+description: "Use when diagnosing where a SaaS or product funnel leaks — visitor → signup → activation → paid → retained — channel-agnostic, conversion-rate-driven."
+status: active
+tier: senior
+source: package
+---
+# funnel-analysis
+## When to use
+- Conversion to paid dropped and nobody knows which step broke.
+- A new signup channel went live and you need to compare its funnel shape to the baseline.
+- A board ask: "where does the money leak between landing page and paying customer?"
+Do NOT use for ranking features, valuation, or OKR decomposition (see Related Skills). Funnel analysis is a **diagnostic**, not a roadmap.
+## Procedure
+### Step 0: Inspect
+1. Confirm the cognition cluster: this is **conversion diagnosis**, channel-agnostic. Paid social, organic, partner, and self-serve all share the same shape; only the inputs differ.
+2. Confirm event tracking exists for all 5 stages. If even one stage is inferred, the analysis is unreliable — flag and proceed under that caveat.
+### Step 1: Lock the 5 stages
+1. The canonical SaaS funnel: **Visitor → Signup → Activation → Paid → Retained-D30**.
+2. Activation is the load-bearing definition. Pick the **single event** that historically correlates with paid conversion — not "logged in", not "viewed dashboard". For most SaaS this is "completed first meaningful action" (sent first invoice, ran first query, invited first teammate).
+3. Retained-D30 = still active 30 days after first paid charge. Earlier than D30 is noise; later requires more data.
+### Step 2: Pull stage-to-stage conversion
+1. Compute conversion rate at each step: `stage_n / stage_n-1`. Always use cohorts (signup-week or signup-month), never aggregate snapshots — aggregates lie when traffic mix changes.
+2. For each rate, attach a 95% confidence interval. Tiny denominators give big bands; the band is half the story.
+3. Plot a 12-week trend per rate. A single point is gossip; a trend is evidence.
+### Step 3: Benchmark vs internal baseline
+1. The right benchmark is **your own funnel one quarter ago**, not industry averages. Industry averages mix verticals so coarsely they're useless for action.
+2. For each stage: is current rate within ±2 percentage points of trailing-quarter median? If not, that stage is the primary suspect.
+3. If multiple stages drift simultaneously, the cause is upstream (acquisition mix change, broken instrumentation), not the stage itself.
+### Step 4: Segment the broken stage
+1. Take the suspect stage and segment by: channel · device · plan · geo · cohort week.
+2. The drop is almost always concentrated in one segment, not uniform. Uniform drops point to instrumentation.
+3. Anti-pattern: averaging across segments and treating the average as actionable. The average user does not exist.
+### Step 5: Hypothesise causes
+1. For the broken segment-stage, write 3 candidate causes. Rank by testability, not plausibility.
+2. The cheapest experiment to falsify the top candidate is the next step — usually a UX change, a copy test, or an onboarding tweak.
+3. If no cause is testable in under 2 weeks, the analysis is not yet sharp enough.
+### Step 6: Validate
+1. Recompute the broken rate after the experiment ships. Same cohort definition. Same window.
+2. If the rate moves but the downstream rates don't follow, you fixed a vanity step. Keep going.
+## Gotcha
+- "Activation" defined as a low-friction event (signup confirmation, first login) gives you a flatter funnel that is useless for prediction. Activation must correlate with paid.
+- Aggregate funnel rates that look stable can hide a 30-point drop in one channel masked by a 30-point lift in another. Always segment.
+- D7 retention looks great compared to D30. Pick the metric that matches the contract length, not the one that flatters.
+- Holiday weeks, deploys, marketing pushes, and refund days distort cohorts. Annotate the timeline; don't pretend a 5pp drop is real on a known holiday.
+## Do NOT
+- Do NOT use industry-average benchmarks as a target. They mix B2B with B2C, freemium with high-touch — the average is meaningless.
+- Do NOT compare a 1-week cohort to a 12-week trailing median; sample size is too small to draw conclusions.
+- Do NOT diagnose retention on a funnel without separating new-user retention from re-engaged-user retention.
+## Related Skills
+**WHEN to use this**
+- Where in the funnel did conversion drop?
+- Compare the funnel shape between two channels.
+**WHEN NOT to use this**
+- Pricing tier or unit-economics question — route to [`unit-economics-modeling`](../unit-economics-modeling/SKILL.md).
+- Roadmap ranking from funnel findings — route to [`rice-prioritization`](../rice-prioritization/SKILL.md).
+- Setting team OKRs around the diagnosed metric — route to [`okr-tree-modeling`](../okr-tree-modeling/SKILL.md).
+- Valuing the business that owns the funnel — route to [`dcf-modeling`](../dcf-modeling/SKILL.md).
+## When the agent should load this
+- "Where is our funnel leaking?"
+- "Why did paid conversion drop last month?"
+- "Compare the funnel for paid social vs organic."
+- "Diagnose this dropoff between signup and activation."
+- "Is this drop real or instrumentation?"
+## Output
+1. **`funnel-table.md`** — 5-stage funnel with cohort rates, 95% CI, and 12-week trend (sparkline or compact ASCII). One row per cohort week or month.
+2. **`segment-breakdown.md`** — table of the broken stage segmented by channel · device · plan · geo. Rates with CIs. Suspect segments highlighted.
+3. **`hypothesis-list.md`** — top 3 causes for the broken segment-stage with cheapest-falsification experiment per cause and an explicit prediction for the next measurement.

package/.agent-src/skills/md-language-check/SKILL.md CHANGED Viewed

@@ -32,7 +32,7 @@ Do NOT use when:
 - Editing project content outside the trees listed above (READMEs of
   consumer projects, application docs that follow a different policy)
-- Reviewing chat history files (`.agent-chat-history` is JSONL, not `.md`)
+- Reviewing chat history files (`agents/.agent-chat-history` is JSONL, not `.md`)
 - Inspecting non-`.md` files — the checker rejects them with a warning
 ## Procedure