npm - @event4u/agent-config - Versions diffs - 5.7.0 → 5.9.0 - Mend

@event4u/agent-config 5.7.0 → 5.9.0


Files changed (164)

package/.agent-src/commands/agent-handoff.md +1 -1
package/.agent-src/commands/agent-status.md +1 -1
package/.agent-src/commands/agents/audit.md +1 -1
package/.agent-src/commands/agents/init.md +1 -1
package/.agent-src/commands/agents/user/accept.md +3 -3
package/.agent-src/commands/agents/user/init.md +4 -4
package/.agent-src/commands/agents/user/show.md +3 -3
package/.agent-src/commands/agents/user/update.md +3 -3
package/.agent-src/commands/agents/user.md +1 -1
package/.agent-src/commands/agents.md +1 -1
package/.agent-src/commands/analytics/prune.md +1 -1
package/.agent-src/commands/analytics/show.md +1 -1
package/.agent-src/commands/analytics.md +1 -1
package/.agent-src/commands/bug-fix.md +1 -1
package/.agent-src/commands/challenge-me.md +1 -1
package/.agent-src/commands/chat-history/import.md +1 -1
package/.agent-src/commands/chat-history/learn.md +1 -1
package/.agent-src/commands/chat-history/show.md +1 -1
package/.agent-src/commands/chat-history.md +1 -1
package/.agent-src/commands/check-current-md.md +1 -1
package/.agent-src/commands/condense.md +1 -1
package/.agent-src/commands/context.md +1 -1
package/.agent-src/commands/cost-report.md +1 -1
package/.agent-src/commands/council.md +3 -3
package/.agent-src/commands/create-pr/description-only.md +1 -1
package/.agent-src/commands/create-pr.md +1 -1
package/.agent-src/commands/e2e-heal.md +1 -1
package/.agent-src/commands/e2e-plan.md +1 -1
package/.agent-src/commands/feature.md +1 -1
package/.agent-src/commands/fix/ci.md +1 -1
package/.agent-src/commands/fix/portability.md +1 -1
package/.agent-src/commands/fix/pr-bot-comments.md +1 -1
package/.agent-src/commands/fix/pr-comments.md +1 -1
package/.agent-src/commands/fix/pr-developer-comments.md +1 -1
package/.agent-src/commands/fix/refs.md +1 -1
package/.agent-src/commands/fix/seeder.md +1 -1
package/.agent-src/commands/fix.md +1 -1
package/.agent-src/commands/judge.md +1 -1
package/.agent-src/commands/knowledge/cross-repo.md +1 -1
package/.agent-src/commands/knowledge/forget.md +1 -1
package/.agent-src/commands/knowledge/ingest.md +1 -1
package/.agent-src/commands/knowledge/list.md +1 -1
package/.agent-src/commands/knowledge.md +1 -1
package/.agent-src/commands/memory/add.md +1 -1
package/.agent-src/commands/memory/learn-low-impact.md +1 -1
package/.agent-src/commands/memory/load.md +1 -1
package/.agent-src/commands/memory/mine-session.md +1 -1
package/.agent-src/commands/memory/promote.md +1 -1
package/.agent-src/commands/memory/propose.md +1 -1
package/.agent-src/commands/memory.md +1 -1
package/.agent-src/commands/mode.md +1 -1
package/.agent-src/commands/optimize/agents-dir.md +1 -1
package/.agent-src/commands/optimize/augmentignore.md +1 -1
package/.agent-src/commands/optimize/rtk.md +1 -1
package/.agent-src/commands/optimize/skills.md +1 -1
package/.agent-src/commands/optimize.md +1 -1
package/.agent-src/commands/orchestrate.md +1 -1
package/.agent-src/commands/override/create.md +1 -1
package/.agent-src/commands/override/manage.md +1 -1
package/.agent-src/commands/override.md +1 -1
package/.agent-src/commands/package-reset.md +1 -1
package/.agent-src/commands/prediction-pool.md +31 -12
package/.agent-src/commands/profile/activate.md +81 -0
package/.agent-src/commands/profile/deactivate.md +68 -0
package/.agent-src/commands/profile/show.md +70 -0
package/.agent-src/commands/profile.md +68 -0
package/.agent-src/commands/project-health.md +1 -1
package/.agent-src/commands/quality-fix.md +1 -1
package/.agent-src/commands/roadmap/process-full.md +1 -1
package/.agent-src/commands/roadmap/process-phase.md +1 -1
package/.agent-src/commands/roadmap/process-step.md +1 -1
package/.agent-src/commands/roadmap.md +1 -1
package/.agent-src/commands/set-cost-profile.md +1 -1
package/.agent-src/commands/skill/preview.md +3 -3
package/.agent-src/commands/skill.md +1 -1
package/.agent-src/commands/skills/discover.md +1 -1
package/.agent-src/commands/skills.md +1 -1
package/.agent-src/commands/sync-agent-settings.md +1 -1
package/.agent-src/commands/sync-gitignore/fix.md +1 -1
package/.agent-src/commands/sync-gitignore.md +1 -1
package/.agent-src/commands/update-form-request-messages.md +1 -1
package/.agent-src/skills/check-refs/SKILL.md +1 -1
package/.agent-src/skills/finishing-a-development-branch/SKILL.md +1 -1
package/.agent-src/skills/git-workflow/SKILL.md +1 -1
package/.agent-src/skills/jira-integration/SKILL.md +1 -1
package/.agent-src/skills/markitdown/SKILL.md +1 -1
package/.agent-src/skills/prediction-pool-optimizer/SKILL.md +195 -77
package/.agent-src/skills/prediction-pool-optimizer/evals/triggers.json +3 -1
package/.agent-src/skills/prediction-pool-optimizer/reference/ev-fixtures.md +111 -16
package/.agent-src/skills/prediction-pool-optimizer/reference/odds-and-bonus.md +109 -0
package/.agent-src/skills/rtk-output-filtering/SKILL.md +1 -1
package/.agent-src/skills/script-writing/SKILL.md +1 -1
package/.agent-src/skills/token-optimizer/SKILL.md +1 -1
package/.agent-src/skills/using-git-worktrees/SKILL.md +1 -1
package/.agent-src/templates/agents/agent-project-settings.example.yml +1 -1
package/.agent-src/templates/scripts/work_engine/_lib/agent_settings.py +52 -5
package/.claude-plugin/marketplace.json +370 -366
package/CHANGELOG.md +77 -0
package/README.md +2 -2
package/config/discovery/session-profiles.yml +37 -0
package/dist/discovery/deprecation-report.md +1 -1
package/dist/discovery/discovery-manifest.json +183 -95
package/dist/discovery/discovery-manifest.json.sha256 +1 -1
package/dist/discovery/discovery-manifest.summary.md +3 -3
package/dist/discovery/orphan-report.md +1 -1
package/dist/discovery/packs.json +9 -5
package/dist/discovery/trust-report.md +2 -2
package/dist/discovery/workspaces.json +8 -4
package/dist/mcp/registry-manifest.json +3 -3
package/docs/architecture.md +1 -1
package/docs/catalog.md +7 -3
package/docs/contracts/command-clusters.md +2 -0
package/docs/contracts/session-profile-overlay.md +120 -0
package/docs/customization.md +26 -0
package/docs/decisions/ADR-010-profile-pack-preset-boundary.md +36 -0
package/docs/decisions/ADR-038-canonical-settings-path.md +66 -0
package/docs/decisions/ADR-039-claude-skills-untracked.md +139 -0
package/docs/decisions/INDEX.md +2 -0
package/docs/development.md +12 -0
package/docs/getting-started.md +1 -1
package/docs/guidelines/agent-infra/layered-settings.md +8 -2
package/docs/skills-catalog.md +5 -1
package/llms.txt +4 -0
package/package.json +1 -1
package/scripts/__pycache__/validate_frontmatter.cpython-312.pyc +0 -0
package/scripts/_cli/cmd_doctor.py +180 -16
package/scripts/_cli/cmd_versions.py +2 -2
package/scripts/_lib/__pycache__/__init__.cpython-312.pyc +0 -0
package/scripts/_lib/__pycache__/agent_src.cpython-312.pyc +0 -0
package/scripts/_lib/agent_settings.py +52 -5
package/scripts/_lib/agent_src.py +30 -0
package/scripts/ai_council/session.py +5 -1
package/scripts/audit_command_surface.py +7 -1
package/scripts/audit_initial_context.py +10 -2
package/scripts/check_gate_paths.py +117 -0
package/scripts/check_references.py +51 -2
package/scripts/check_release_published.py +145 -0
package/scripts/check_test_coverage_diff.py +180 -0
package/scripts/compile_router.py +5 -1
package/scripts/condense.py +79 -2
package/scripts/config/session_profiles.py +492 -0
package/scripts/council_cli.py +5 -1
package/scripts/hook_manifest.yaml +15 -7
package/scripts/hooks/dispatch_hook.py +8 -0
package/scripts/install-hooks.sh +2 -1
package/scripts/install.py +76 -5
package/scripts/inventory_abstraction_budget.py +6 -1
package/scripts/lint_agents_md.py +11 -4
package/scripts/lint_hook_concern_budget.py +5 -1
package/scripts/lint_marketplace.py +18 -7
package/scripts/lint_roadmap_ci_steps.py +5 -1
package/scripts/lint_roadmap_complexity.py +5 -1
package/scripts/mcp_server/prompts.py +5 -1
package/scripts/prediction-pool/pool_winsim.py +236 -0
package/scripts/prediction-pool/score_ev.py +188 -0
package/scripts/profile_staleness_hook.py +69 -0
package/scripts/release.py +54 -31
package/scripts/roadmap_progress_hook.py +56 -6
package/scripts/smoke_quickstart.py +3 -2
package/scripts/sync_agent_settings.py +8 -3
package/scripts/validate_agent_settings.py +5 -1
package/scripts/validate_decision_engine.py +5 -1
package/scripts/measure_roadmap_trajectory.py +0 -112
package/scripts/verify_roadmap_closure.py +0 -327

package/.agent-src/skills/prediction-pool-optimizer/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 model_tier: high
 name: prediction-pool-optimizer
-description: "Optimize prediction-pool tips (kicktipp etc.): pool rules + market odds → the expected-points-maximizing tip per match. Triggers 'optimize my pool tips', 'best kicktipp picks', 'predict'."
+description: "Optimize prediction-pool tips (kicktipp etc.): rules + multi-book consensus odds → expected-points-max answer for every question, scores AND bonus. Triggers 'optimize my pool tips', 'predict'."
 domain: product
 personas: []
 workspaces:
@@ -18,19 +18,22 @@ install:
 # prediction-pool-optimizer
-> Turn a prediction pool's **scoring rules** plus **market odds** into the
-> tip that maximizes **expected points** — not the most likely outcome.
-> Sport-agnostic core with per-sport probability blocks. Consumed by
-> [`/prediction-pool`](../../commands/prediction-pool.md). The optimization target is
-> the pool's score, so the chain is always **rules → odds → expected value
-> → participant field → tip**, never "who wins this match?".
+> Turn a prediction pool's **scoring rules** plus a **consensus of the major
+> bookmakers' odds** into the answer that maximizes **expected points** — not
+> the most likely outcome — for **every open question in the pool**: match
+> scores AND every bonus / award / special question (top scorer, group
+> winners, champion, most cards …). Sport-agnostic core with per-sport
+> probability blocks. Consumed by [`/prediction-pool`](../../commands/prediction-pool.md).
+> The optimization target is the pool's score, so the chain is always
+> **rules → odds → expected value → participant field → answer**, never
+> "who wins this match?".
 ## When to use
-Use when someone wants the best tips for a prediction / betting pool
+When someone wants the best tips for a prediction / betting pool
 (kicktipp-style company pools — football WM, basketball WM, …) and the
-target is **pool points**, not match truth. Triggered by the
-[`/prediction-pool`](../../commands/prediction-pool.md) command (Steps 3–5) or directly
+target is **pool points**, not match truth. Triggered by
+[`/prediction-pool`](../../commands/prediction-pool.md) (Steps 3–5) or directly
 when a user asks to optimize / maximize their pool picks.
 **The one idea that makes this skill correct:** the highest-probability
@@ -43,39 +46,69 @@ more. **Always optimize the pool's points, never the truth of the match.**
 - **Rules before tips.** Never produce a tip before the pool's scoring is
   parsed (Procedure step 1). Strategy is a function of the rules.
-- **Odds are the primary signal.** Bookmaker / market probabilities already
-  fold in form, squad, injuries, travel, climate. Use them as the
-  calibration base; only override with *current* information (confirmed
+- **Answer EVERY open question.** A pool has scores *and* bonus / award /
+  special questions ("which team supplies the top scorer?", "most yellow
+  cards?", "champion?"). Scorelines only, bonus questions blank = a **failed
+  run** — enumerate every open question in step 1, carry each to an answer
+  (steps 5–6). No silent skips.
+- **Odds are the primary signal — multi-book consensus, not one book.**
+  Bookmaker probabilities already fold in form, squad, injuries, travel,
+  climate. Build the base from a **consensus across the 5–10 biggest
+  publicly-viewable books** (step 2), de-vigged, **sharpness-weighted** —
+  never mirror a single portal. Override only with *current* info (confirmed
   lineups, late injuries, suspensions, manager change).
-- **No invented numbers.** Emit no probability you cannot derive from odds
-  or from **actually executed** code. Tournament/outright numbers come from
-  real outright odds **or** the executed Poisson helper — never a claimed
+- **No invented numbers.** Emit no probability you cannot derive from real
+  odds or **actually executed** code. Tournament/outright/award numbers come
+  from real markets **or** the executed Poisson helper — never a claimed
   "I ran 10,000 simulations".
-- **One-sentence justification** per tip. Short.
+- **Scorelines computed, not guessed.** EV-max tip per match from the executed
+  grid optimiser (`score_ev.py`, step 4a), never the eye. A 3:2 / 4:1 / 1:4 in
+  the output = signature of a skipped computation.
+- **One-sentence justification** per answer. Short.
 ## Procedure
-### 1. Parse the pool rules
+### 1. Parse the pool rules AND enumerate every open question
 From the pool's rule page, extract and document:
 - Points for **exact result** / **goal (point) difference** / **tendency**.
-- **Bonus questions** (champion, top scorer, group winners …).
-- **Joker / multiplier** rules.
-- **Quote / rarity** scoring (rare correct tips score more)? — flips the
-  whole strategy toward contrarian (step 4).
-- Special scorings, **deadlines**, and **strategy limits** (e.g. max N
-  identical tips).
+- **Every bonus / award / special question** (champion, top scorer, "team of
+  the top scorer", group winners, most cards, longest unbeaten,
+  will-there-be-a-red-card, over/under totals …). **Write them all down as an
+  explicit checklist** — this list is the run's contract; every entry must
+  reach an answer.
+- **Joker / multiplier** rules, per-question point weights.
+- **Quote / rarity** scoring (rare correct tips score more)? — flips strategy
+  toward contrarian (step 4).
+- Special scorings, **per-question deadlines**, **strategy limits** (e.g. max
+  N identical tips).
 - **The goal**: place well, or *win* a large pool? (changes variance — step 4.)
-### 2. Build the data base
-Primary: current bookmaker odds, aggregated market probabilities, model
-forecasts (e.g. Opta), Elo/SPI ratings. Secondary (only when it adds signal
-the odds have not yet absorbed): confirmed lineups, injuries, suspensions,
-manager change, recent form, home advantage, head-to-head, rest/travel,
-weather. De-vig the odds (remove the bookmaker margin) before treating them
-as probabilities.
+### 2. Build the data base — a consensus across the major books
+Primary signal: current bookmaker odds, **aggregated across the 5–10 biggest
+publicly-viewable books**, not a single portal:
+1. **Collect** odds for each market (1X2, exact-score, outrights, and each
+   special/award market a bonus question needs) from several books.
+   Odds-comparison aggregators (Oddschecker, Oddsportal / Betexplorer) show
+   many books at once; supplement with named books. Book list + weighting
+   recipe in [`reference/odds-and-bonus.md`](reference/odds-and-bonus.md).
+2. **De-vig each book** independently (remove its margin) → per-book implied
+   probabilities. Raw odds sum to >100%; never treat them as probabilities.
+3. **Aggregate with a healthy weighting**, not a blind average: weight
+   **sharp, low-margin books higher** (Pinnacle, Betfair Exchange),
+   recreational books lower; weighted mean or trimmed median so one outlier
+   book cannot swing the base. Result = the **consensus probability** — the
+   calibration base.
+4. **Single-book outlier = flag, not truth** — investigate *why* (priced-in
+   injury? stale line?) before moving off consensus. Cross-portal agreement is
+   signal; one portal disagreeing is a prompt to check, not to follow.
+Secondary (only when it adds signal the consensus has not absorbed): confirmed
+lineups, injuries, suspensions, manager change, recent form, home advantage,
+head-to-head, rest/travel, weather, model forecasts (Opta), Elo/SPI ratings.
 ### 3. Per-match probabilities (sport block)
@@ -98,7 +131,7 @@ results. Pick the block for the event's sport:
 - Derive the outcome split straight from de-vigged moneyline odds; estimate
   a plausible score from the market total. State the model used.
-Cross-check the model against the market; on a large divergence, re-check
+Cross-check the model against the consensus; on a large divergence, re-check
 the data and explain the cause before trusting it.
 ### 4. Convert to the EV-maximizing tip
@@ -106,32 +139,69 @@ the data and explain the cause before trusting it.
 Map probabilities to the tip with the **highest expected points under the
 step-1 rules** — not the prettiest match.
-- **Standard fixed-point scoring + goal "place well"** → tip the EV-maximal
-  result per match. Favourites with modest scorelines dominate. **No
-  contrarian** — only your tip matters for your score, so deliberately
-  tipping "different" just burns EV.
-- **Quote / rarity scoring** → weigh rarer-but-plausible results against
-  their higher payout; take rarity when `payout × probability` wins.
-- **Goal = win a large pool** → on a *subset* of matches, take calculated
-  variance (plausible underdogs) to create upside, poker-tournament style.
-**Participant-field thresholds** (when two tips are close, prefer the one
-with the higher edge over the typical participant):
-- Pool **N < 20** → maximize EV, ignore the field.
-- **20 ≤ N < 100 and you are in the prize positions** → maximize EV.
-- **N ≥ 100, or you are outside the top ~20%** → add field-relative
-  variance (move off the consensus on a subset; rough Kelly-fraction sizing).
+#### 4a. The EV-max scoreline is computed, never eyeballed
+Don't hand-pick a scoreline. Run the executed grid optimiser — builds the full
+Poisson score grid, returns the EV-max tip under the step-1 point tiers:
+```bash
+python3 scripts/prediction-pool/score_ev.py --lh <home-xg> --la <away-xg> \
+    --tendency <t> --diff <d> --exact <e>          # one match
+python3 scripts/prediction-pool/score_ev.py matches.json \
+    --tendency <t> --diff <d> --exact <e>          # batch, prints a ranked table
+```
+Two facts the grid makes unavoidable, intuition gets wrong:
+- **High scorelines almost never EV-max.** Under partial points a moderate
+  favourite peaks at **1:0 / 2:0 / 2:1**; **1:0 wins surprisingly often**, top
+  of the surface is *flat* (1:0 vs 2:1 vs 2:0 within hundredths). 3:2 / 4:1 /
+  1:4 never optimal — such a tip means the grid wasn't run.
+- **Draws under-tipped.** A correct draw banks the goal-difference tier on
+  every draw scoreline, so in a close match (xG within ~0.4) a 1:1 can
+  out-score a 1:0 — and for low-scoring even games (λ ≲ 1.0/side) a 0:0 is the
+  EV-max. Let the grid decide; the eye tips too few draws.
+- **Standard fixed-point scoring + goal "place well"** → tip the grid's EV-max
+  per match. **No contrarian** — only your tip scores, tipping "different"
+  burns EV.
+- **Quote / rarity scoring** → weigh rarer-but-plausible results against payout;
+  take rarity when `payout × probability` wins (raise `--exact` or post-process
+  the ranked table by the multiplier).
+#### 4b. Large pool, goal "win it" — measure P(finish 1st), don't guess
+Goal = **win** a large pool → target flips from E(points) to **P(finish ahead
+of the field)**; pure EV-max converges with the crowd, can't open a gap.
+Measure it with the executed field simulator, not a "rough Kelly" hand-wave:
+```bash
+python3 scripts/prediction-pool/pool_winsim.py pool.json --runs 4000 --max-flips 4
+```
+Models the field as softmax-EV tippers, reports `P(win)` for EV-max-everywhere,
+then greedily reports **which few tips to flip** off EV-max (EV cost + P(win)
+gain each). Read it as the field threshold, empirically:
+- Pool **N < 20** → sim shows flips barely move P(win); maximize EV, ignore the
+  field.
+- **20 ≤ N < 100 and in the prize positions** → maximize EV.
+- **N ≥ 100, or outside the top ~20%** → take the sim's suggested flips: a
+  handful of higher-variance scorelines on high-consensus matches lift P(win)
+  most per unit EV given up. Flip only what the sim says pays — variance you
+  don't need is wasted EV.
 Respect all strategy limits from step 1 (max identical tips, etc.).
-### 5. Tournament & bonus questions (no hallucination)
+### 5. Tournament, bonus & special questions — answer every one (no hallucination)
-For group winners, KO rounds, champion, and bonus questions, use **either**:
+Walk the **step-1 checklist** and answer **each** entry. Pick the method by
+question type — full taxonomy + per-type method in
+[`reference/odds-and-bonus.md`](reference/odds-and-bonus.md):
-- real **outright market odds** ("to win group", "to reach final",
-  "outright winner"), **or**
-- the executed Poisson tournament simulator:
+- **Tournament structure** (group winners, KO rounds, finalists, champion):
+  real **outright market odds** ("to win group", "to reach final", "outright
+  winner") aggregated per step 2, **or** the executed Poisson simulator:
   ```bash
   python3 scripts/prediction-pool/poisson_sim.py <teams-xg.json> --runs 20000
@@ -141,48 +211,89 @@ For group winners, KO rounds, champion, and bonus questions, use **either**:
   advancement / title probabilities. **Run it — never report simulated
   numbers you did not actually compute.**
-Optimize bonus answers on the same expected-points basis. Re-run as late as
-the deadline allows: re-check confirmed lineups, injuries, suspensions, and
-odds movement, then adjust. The pool's per-match deadline is the only hard
-constraint.
+- **Award / player markets** (top scorer, most assists, "which team supplies
+  the top scorer", golden boot, most cards): use the matching **special
+  market** — e.g. aggregate per-player "top goalscorer" odds **by team** to
+  answer "which team has the top scorer". No clean market → derive from a
+  stated model (squad strength × games-expected) and **label it a model
+  estimate**, not a market number.
+- **Binary / over-under specials** (red card yes/no, over/under total
+  goals/cards): de-vig the consensus probability for the line, pick the EV-max
+  side under the question's point weight.
+Optimize every answer on the same expected-points basis as the scores. Re-run
+as late as each question's deadline allows: re-check confirmed lineups,
+injuries, suspensions, odds movement, then adjust. The per-question deadline is
+the only hard constraint.
 ## Output format
 1. **Approval table** — one row per match:
    ```
-   Match | Tip | Prob / EV | Risk (low/med/high) | 1-line reason | Odds used
+   Match | Tip | Prob / EV | Risk (low/med/high) | 1-line reason | Books used
+   ```
+   `Books used` names the consensus base (e.g. "consensus of 7 books, sharp-weighted").
+2. **Bonus & special answers** — one row per open question from the step-1
+   checklist, **every entry answered** (none blank):
+   ```
+   Question | Answer | Prob / EV | Risk | 1-line reason | Source (market / model)
    ```
-2. **Group standings, the full bracket, and bonus-question answers** where
-   the event has them.
-3. **Self-check note** — confirm the tips reconcile with
-   [`reference/ev-fixtures.md`](reference/ev-fixtures.md) (known pool rules +
-   market odds → a known-good EV tip). If your method disagrees with a
-   fixture, your method is wrong — find the error (usually a forgotten
-   partial-points term or un-de-vigged odds), don't ship the tip.
+3. **Group standings and the full bracket** where the event has them.
+4. **Self-check note** — (a) tips reconcile with
+   [`reference/ev-fixtures.md`](reference/ev-fixtures.md) (known rules + odds →
+   known-good EV tip); (b) bonus table has the **same number of rows as the
+   step-1 checklist** — a shorter table means a question was dropped. If your
+   method disagrees with a fixture, your method is wrong — find the error
+   (usually a forgotten partial-points term, un-de-vigged odds, or following
+   one book instead of the consensus), don't ship the tip.
 Handed back to [`/prediction-pool`](../../commands/prediction-pool.md) for the approval
 gate — the skill never enters or submits anything.
 ## Gotcha
-- **Tipping the modal result, not the EV-maximal one.** The single most
-  likely scoreline rarely maximizes partial points — compute EV across the
+- **Answering only the scores.** Bonus / award questions carry real points;
+  leaving them blank because they are "not a scoreline" forfeits them. The
+  step-1 checklist exists so every question is answered.
+- **Following one portal.** A single book can be stale or shaded; build the
+  base from a sharp-weighted consensus across several; an outlier is a flag to
+  investigate, not a number to copy.
+- **Tipping the modal result, not the EV-maximal one.** The single most likely
+  scoreline rarely maximizes partial points — run `score_ev.py` across the
   result grid, don't eyeball the favourite.
-- **Forgetting to de-vig.** Raw bookmaker odds sum to >100%; treating them
-  as probabilities inflates the favourite. Remove the margin first.
-- **Contrarian under fixed points.** Deviating "to stand out" only helps
-  under quote/rarity rules or a win-a-large-pool goal — otherwise it burns EV.
-- **Claimed-but-unrun simulation.** Numbers like "I ran 10,000 tournaments"
-  without executing `poisson_sim.py` are hallucinated — run the code or use
-  outright odds.
+- **Hand-picking a high scoreline.** 3:2 / 4:1 / 1:4 never EV-max under partial
+  points — moderate favourites peak at 1:0 / 2:0 / 2:1. A high tip = grid
+  skipped; run `score_ev.py`.
+- **Under-tipping draws.** A correct draw banks the goal-difference tier on
+  every draw scoreline, so a close match can want 1:1 (or 0:0). Let the grid
+  decide; the eye tips too few draws.
+- **"Rough Kelly" variance for a large pool.** Don't guess deviation amount —
+  run `pool_winsim.py`; returns the exact flips that raise P(finish 1st) most
+  per unit EV given up.
+- **Forgetting to de-vig.** Raw bookmaker odds sum to >100%; treating them as
+  probabilities inflates the favourite. Remove the margin **per book** before
+  aggregating.
+- **Contrarian under fixed points.** Deviating "to stand out" only helps under
+  quote/rarity rules or a win-a-large-pool goal — otherwise it burns EV.
+- **Claimed-but-unrun simulation.** "I ran 10,000 tournaments" without
+  executing `poisson_sim.py` is hallucinated — run the code or use outright odds.
 ## Do NOT
+- Leave any open pool question (bonus / award / special) unanswered.
+- Build the base from a single bookmaker, or skip de-vigging before aggregating.
 - Tip the most likely result instead of the EV-maximal one.
+- Hand-pick a scoreline instead of running `score_ev.py` — never emit a
+  3:2 / 4:1 / 1:4 tip, never EV-max under partial points.
 - Go contrarian under standard fixed-point scoring with a "place well" goal.
-- Report Monte-Carlo numbers without running `poisson_sim.py`.
+- Guess large-pool variance ("rough Kelly") instead of running `pool_winsim.py`.
+- Report Monte-Carlo numbers without running `poisson_sim.py` / `pool_winsim.py`.
 - Treat raw odds as probabilities without removing the vig.
 - Give betting or financial advice — this optimizes a game; the human submits.
@@ -190,7 +301,14 @@ gate — the skill never enters or submits anything.
 - [`/prediction-pool`](../../commands/prediction-pool.md) — the orchestrator (event,
   persistence, Playwright entry, gates).
+- [`reference/odds-and-bonus.md`](reference/odds-and-bonus.md) — major-book list
+  + sharpness-weighted consensus recipe, and the bonus / award / special
+  question taxonomy with a per-type method.
 - [`reference/ev-fixtures.md`](reference/ev-fixtures.md) — known-good
   rules+odds → EV examples.
+- [`scripts/prediction-pool/score_ev.py`](../../../../scripts/prediction-pool/score_ev.py) —
+  executed exact-score EV optimiser (step 4a; λ + rule → EV-max scoreline).
+- [`scripts/prediction-pool/pool_winsim.py`](../../../../scripts/prediction-pool/pool_winsim.py) —
+  executed field model + P(finish 1st) simulator and flip-finder (step 4b).
 - [`scripts/prediction-pool/poisson_sim.py`](../../../../scripts/prediction-pool/poisson_sim.py) —
-  the executed tournament simulator.
+  the executed tournament simulator (step 5).
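
Note on step 4a of the SKILL.md diff above: the package's `score_ev.py` is not shown in this section, so the following is only a minimal sketch of the technique the text describes (independent Poisson score grid, then expected points per candidate tip). The function names, the `max_goals` truncation, and the exact tier logic are assumptions for illustration. In particular, it assumes that a draw tip earns the goal-difference tier on every other draw scoreline, as the diff text states.

```python
import math


def poisson_pmf(lam: float, k: int) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def tier_points(tip, result, exact, diff, tendency):
    """Points one tip earns against one actual result (assumed tier rule)."""
    if tip == result:
        return exact
    tip_margin = tip[0] - tip[1]
    res_margin = result[0] - result[1]
    if tip_margin == res_margin:  # same goal difference, incl. draw vs. other draw
        return diff
    if sign(tip_margin) == sign(res_margin):  # same winner, different margin
        return tendency
    return 0


def ev_table(lam_home, lam_away, exact=4, diff=3, tendency=2, max_goals=10):
    """Return [(expected_points, (home, away)), ...] sorted best-first.

    Assumes independent Poisson goal counts, truncated at max_goals per side.
    """
    goals = range(max_goals + 1)
    grid = {
        (h, a): poisson_pmf(lam_home, h) * poisson_pmf(lam_away, a)
        for h in goals
        for a in goals
    }
    table = []
    for tip in grid:
        ev = sum(
            p * tier_points(tip, res, exact, diff, tendency)
            for res, p in grid.items()
        )
        table.append((ev, tip))
    table.sort(reverse=True)
    return table


if __name__ == "__main__":
    # Fixture 1 inputs from ev-fixtures.md: xG 1.7 : 0.8, tiers 4 / 3 / 2.
    for ev, (h, a) in ev_table(1.7, 0.8)[:3]:
        print(f"{h}:{a}  {ev:.3f}")
```

Under these assumptions the sketch reproduces the three values quoted in Fixture 1 (1:0 at about 1.574, then 2:1 and 2:0), which supports the reading of the tier rule above but does not confirm how `score_ev.py` is actually implemented.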

package/.agent-src/skills/prediction-pool-optimizer/evals/triggers.json CHANGED

@@ -1,6 +1,6 @@
 {
   "skill": "prediction-pool-optimizer",
-  "description": "7 should-trigger + 5 should-not-trigger queries. Should-trigger covers DE + EN phrasings and the core intent (pool tips, kicktipp, expected-points optimization across sports). Should-not-trigger covers near-miss neighbours: regulated financial advice (finance pack), plain match-result prediction with no pool, generic web research, AI video, and real-money sportsbook betting (out of scope / refuse).",
+  "description": "9 should-trigger + 5 should-not-trigger queries. Should-trigger covers DE + EN phrasings and the core intent (pool tips, kicktipp, expected-points optimization across sports, plus answering the bonus / award questions and using bookmaker-consensus odds). Should-not-trigger covers near-miss neighbours: regulated financial advice (finance pack), plain match-result prediction with no pool, generic web research, AI video, and real-money sportsbook betting (out of scope / refuse).",
   "queries": [
     {"q": "optimize my kicktipp tips for the football WM 2026", "trigger": true},
     {"q": "fill my company Tippspiel for the basketball world cup", "trigger": true},
@@ -9,6 +9,8 @@
     {"q": "maximiere meine erwarteten Punkte im Tippspiel, nicht nur wer gewinnt", "trigger": true},
     {"q": "predict our office kicktipp pool for the WM", "trigger": true},
     {"q": "mach mein Tippspiel für die WM", "trigger": true},
+    {"q": "beantworte auch alle Bonusfragen im kicktipp, z.B. welche Mannschaft den Torschützenkönig stellt", "trigger": true},
+    {"q": "use the odds from the big betting sites to optimize my pool picks", "trigger": true},
     {"q": "should we invest in this startup based on a DCF?", "trigger": false, "note": "regulated financial valuation → dcf-modeling / finance pack"},
     {"q": "who will win tonight's match?", "trigger": false, "note": "plain result prediction, no pool / no scoring rules to optimize"},
     {"q": "research the best running shoes for me", "trigger": false, "note": "generic web research → research / deep-research"},
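
Note on the bookmaker-consensus trigger added above and step 2 of the SKILL.md diff: the weighting recipe itself lives in `reference/odds-and-bonus.md`, which is not shown in this section. The sketch below is therefore only one plausible reading of "de-vig each book, then aggregate with sharp books weighted higher". It uses proportional (multiplicative) de-vigging and a weighted mean; the book names, odds, and weights are hypothetical placeholders, not values from the package.

```python
def devig(decimal_odds):
    """Proportional de-vig: normalise inverse decimal odds so they sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)  # > 1 for a real book; the excess is the margin
    return [r / overround for r in raw]


def consensus(books, weights):
    """Weighted mean of per-book de-vigged probabilities.

    books:   {book_name: [decimal odds per outcome]}
    weights: {book_name: relative weight}, higher for sharper books (assumed)
    """
    total_w = sum(weights[name] for name in books)
    n_outcomes = len(next(iter(books.values())))
    mixed = [0.0] * n_outcomes
    for name, odds in books.items():
        for i, p in enumerate(devig(odds)):
            mixed[i] += weights[name] * p / total_w
    return mixed


if __name__ == "__main__":
    # Hypothetical 1X2 odds (home, draw, away) from three placeholder books.
    books = {
        "sharp_book": [1.60, 4.20, 6.00],
        "rec_book_a": [1.55, 4.00, 5.50],
        "rec_book_b": [1.57, 4.10, 5.75],
    }
    weights = {"sharp_book": 2.0, "rec_book_a": 1.0, "rec_book_b": 1.0}
    home, draw, away = consensus(books, weights)
    print(f"home {home:.3f}  draw {draw:.3f}  away {away:.3f}")
```

A trimmed median, which the diff text also mentions, would be more robust to a single outlier book; the weighted mean is used here only because it is the shorter of the two to show.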

package/.agent-src/skills/prediction-pool-optimizer/reference/ev-fixtures.md CHANGED

@@ -15,16 +15,25 @@ shape you encounter so future runs catch the same class of drift.
 **Rule:** exact result = 4, goal-difference = 3, tendency = 2, else 0.
 No quote rule. No strategy limit. Goal: place well.
-**Match (football):** de-vigged market — Home 62% / Draw 24% / Away 14%.
-Most plausible exact results (Poisson on market xG ≈ 1.7 : 0.8):
-2:1 ≈ 9%, 1:0 ≈ 9%, 2:0 ≈ 8%, 1:1 ≈ 8%, 3:1 ≈ 6%.
+**Match (football):** Poisson on market xG ≈ 1.7 : 0.8.
-**Reasoning:** the single most likely *result* (2:1) and 1:0 both bank the
-tendency (2) on a home win plus goal-difference (3) on many neighbours.
-Expected points of "2:1" beats "tip the favourite to win 3:0" (lower hit
-rate on diff/exact) and beats any draw/away tip (tendency rarely banks).
+**Script-verified** (`score_ev.py --lh 1.7 --la 0.8 --exact 4 --diff 3 --tendency 2`):
-**Known-good tip:** **2:1 home.** (Risk: low.) **Not** contrarian — under
+```
+EV-max tip : 1:0  (EV 1.574)
+  1:0  1.574  <- EV-max
+  2:1  1.530
+  2:0  1.477
+```
+**Reasoning:** top of the EV surface is **flat** — 1:0, 2:1, 2:0 all bank the
+tendency (2) plus goal-difference (3) on many neighbours, within hundredths of
+each other. Grid puts **1:0 narrowly first**; eyeballing the modal *result*
+(2:1) lands a near-tie, not the optimum. Run the grid — don't assert the
+favourite's "obvious" score.
+**Known-good tip:** **1:0 home** (2:1 essentially tied; with the real de-vigged
+λ either can lead — the grid decides). (Risk: low.) **Not** contrarian — under
 fixed points only your own tip scores, so deviating costs EV.
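A minimal sketch of the kind of EV grid the fixture refers to. This is not the package's `score_ev.py`; it assumes independent Poisson goal counts and the tiered scoring stated in the rule (exact, else goal-difference, else tendency). The function names and the `max_goals` / `max_tip` cut-offs are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson rate lam."""
    return exp(-lam) * lam**k / factorial(k)


def sign(x):
    """Return 1 for a home win, 0 for a draw, -1 for an away win."""
    return (x > 0) - (x < 0)


def tip_ev(tip, lam_home, lam_away, exact, diff, tendency, max_goals=12):
    """Expected points of one tip: best matching tier per possible result."""
    tip_h, tip_a = tip
    ev = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if (h, a) == (tip_h, tip_a):
                ev += exact * p          # exact result
            elif h - a == tip_h - tip_a:
                ev += diff * p           # same goal difference
            elif sign(h - a) == sign(tip_h - tip_a):
                ev += tendency * p       # same winner only
    return ev


def ev_grid(lam_home, lam_away, exact, diff, tendency, max_tip=5):
    """All candidate tips up to max_tip goals per side, best EV first."""
    grid = {
        (h, a): tip_ev((h, a), lam_home, lam_away, exact, diff, tendency)
        for h in range(max_tip + 1)
        for a in range(max_tip + 1)
    }
    return sorted(grid.items(), key=lambda item: item[1], reverse=True)


# Fixture 1 inputs: xG 1.7 : 0.8, exact 4 / diff 3 / tendency 2.
for (h, a), ev in ev_grid(1.7, 0.8, exact=4, diff=3, tendency=2)[:3]:
    print(f"{h}:{a}  {ev:.3f}")
```

Ranking the whole grid rather than scoring one hand-picked tip is what exposes the flat top: the leading tips differ only in the second decimal.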
 ---
@@ -54,15 +63,17 @@ the multiplier; take the max.
 **Match (football):** a near-coin-flip favourite, Home 52% / Draw 26% /
 Away 22%.
-**Reasoning:** with N ≥ 100 and you behind, pure EV converges with the
-field and cannot create the gap you need. Add field-relative variance on a
-*subset*: take a plausible underdog/draw where the consensus is heavy on
-the favourite, sized by a rough Kelly fraction. On safe matches, still
-tip EV-max.
+**Reasoning:** with N ≥ 100 and you behind, pure EV converges with the field and
+can't create the gap; the target is **P(finish 1st)**, not E(points). Don't guess
+the variance: run `pool_winsim.py` with the pool's `N` and your `my_lead`. It
+shows P(win) collapsing under EV-max-everywhere and returns the **specific flips**
+(higher-variance scorelines on high-consensus matches) that raise P(win) most
+per unit of EV given up.
-**Known-good tip:** EV-max on the safe matches; **calculated underdog**
-(e.g. 1:1 or away) on 2–4 high-consensus matches to manufacture upside.
-(Risk: high — intentional.)
+**Known-good tip:** EV-max on the safe matches; the **simulator's suggested
+flips** on the 2–4 matches it names, to manufacture upside. (Risk: high —
+intentional.) Verify the sim shows a P(win) gain; if the flips don't move it
+(small N), don't add variance you don't need.
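A toy calculation of why copying the consensus cannot close a gap. It is not `pool_winsim.py`: it assumes a single rival who tips the favourite on every match, tendency-only scoring of 2 points, and the fixture's Home 52% / Draw 26% split. The function and parameter names are hypothetical.

```python
from collections import Counter


def overtake_probability(deficit, flips, p_fav, p_alt, points=2):
    """Exact P(you end strictly ahead of a rival who always tips the favourite).

    deficit: points you currently trail by.
    flips:   matches where you tip the alternative instead of the favourite.
    p_fav:   probability the favourite's tip scores (the rival gains `points`).
    p_alt:   probability your alternative tip scores (you gain `points`).
    Matches where you copy the rival never change the gap, so they are ignored.
    """
    swing_dist = Counter({0: 1.0})  # relative swing -> probability
    for _ in range(flips):
        next_dist = Counter()
        for swing, prob in swing_dist.items():
            next_dist[swing + points] += prob * p_alt
            next_dist[swing - points] += prob * p_fav
            next_dist[swing] += prob * (1.0 - p_fav - p_alt)
        swing_dist = next_dist
    return sum(prob for swing, prob in swing_dist.items() if swing > deficit)


for flips in (0, 2, 4):
    p_win = overtake_probability(deficit=3, flips=flips, p_fav=0.52, p_alt=0.26)
    print(f"flips={flips}  P(overtake)={p_win:.4f}")
```

With zero flips the probability is exactly zero, which is the fixture's point: some variance has to be bought, and the only question is which flips buy it most cheaply.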
 ---
@@ -78,3 +89,87 @@ Margin modelled Gaussian, mean ≈ 6.5, sd ≈ 11.
 spread). Tip the winner plus the modal margin bucket.
 **Known-good tip:** **Home win, margin ~5–9.** (Risk: low on winner.)
+---
+## Fixture 5 — multi-book consensus (de-vig per book, sharp-weighted)
+**Rule:** any — checks the **odds base**, not the EV map.
+**Market (football, 1X2):** two books.
+- Book S (sharp, weight 3): 1.80 / 3.60 / 4.50 → de-vig 0.526 / 0.263 / 0.211.
+- Book R (recreational, weight 1): 1.75 / 3.50 / 4.20 → de-vig 0.522 / 0.261 / 0.217.
+**Reasoning:** de-vig **each book** first (raw `1/o` sums to >1; normalise),
+then take the sharp-weighted mean per outcome and renormalise. Aggregating raw
+odds, or using one book, is wrong.
+**Known-good base:** **Home 0.525 / Draw 0.263 / Away 0.212.** A run that fed
+the EV grid one book's raw odds has the wrong base; fix it before the tip.
+---
+## Fixture 6 — "team of the top scorer" (aggregate player market by team)
+**Rule:** bonus question = 6 points: "which team supplies the tournament top
+scorer?"
+**Market (top-goalscorer outright, de-vigged player probabilities):**
+- Team A: A1 14%, A2 5% → team A total **19%**.
+- Team B: B1 16% → team B total **16%**.
+- Team C: C1 9%, C2 4% → team C total **13%**.
+**Reasoning:** the most-likely *player* (B1, 16%) is on team B, but the
+question asks the **team** — sum each squad's players. Team A 19% beats team B
+16%. Answer the asked question, not the adjacent one.
+**Known-good answer:** **Team A.** (Source: market, aggregated by team. Risk:
+medium.) **Not** team B — the modal-player trap.
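The team aggregation can be sketched in a few lines, using the fixture's own player probabilities. The data layout and function name are assumptions made for illustration.

```python
from collections import defaultdict

# De-vigged top-scorer probabilities from the fixture, keyed by (team, player).
player_probs = {
    ("Team A", "A1"): 0.14,
    ("Team A", "A2"): 0.05,
    ("Team B", "B1"): 0.16,
    ("Team C", "C1"): 0.09,
    ("Team C", "C2"): 0.04,
}


def team_totals(probs):
    """Sum each squad's player probabilities into one per-team probability."""
    totals = defaultdict(float)
    for (team, _player), p in probs.items():
        totals[team] += p
    return dict(totals)


totals = team_totals(player_probs)
best_team = max(totals, key=totals.get)
modal_player_team = max(player_probs, key=player_probs.get)[0]

print(best_team, round(totals[best_team], 2))
print("team of the single most likely player:", modal_player_team)
```

The two answers differ (Team A versus Team B), which is exactly the trap the fixture guards against.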
+---
+## Fixture 7 — high-scoreline trap (the "EV-optimized" model that wasn't)
+**Rule:** kicktipp 2 / 3 / 5 — tendency = 2, goal-difference = 3, exact = 5.
+**Matches (script-verified, `score_ev.py … --tendency 2 --diff 3 --exact 5`):**
+| Match (λ) | EV-max | a high tip's EV | verdict |
+|---|---|---|---|
+| Senegal–Iraq (2.0:0.7) | **1:0** (1.881) | 4:1 ≈ 1.55 | high tip leaks ~0.33 |
+| Qatar–Switzerland (0.6:2.1) | **0:1** (1.981) | 1:4 ≈ 1.65 | tipping the underdog's goals = costliest move on the board |
+| Spain–CapeVerde (2.3:0.6) | **2:0** (2.033) | 3:1 ≈ 1.88 | only at λ ≳ 2.3 does 2:0 edge past 1:0; no higher scoreline ever leads |
+**Reasoning:** under partial points the value sits in the tendency and
+goal-difference tiers, not the exact high score. **1:0 is the optimum
+astonishingly often** (even for clear favourites at λ ≈ 2.0); 2:0 takes over
+only near λ ≈ 2.3–2.4, and nothing higher ever does. **3:2 / 4:1 / 4:2 / 1:4 are never
+EV-max.** Adding goals — especially the underdog's — only shrinks the hit
+probability without protecting the diff/tendency points.
+**Known-good behaviour:** any 3:2 / 4:x / x:4 tip in the run → the grid wasn't
+run; `score_ev.py` is the gate. (Risk: low; correctness fixture, not strategy.)
+---
+## Fixture 8 — draws are under-tipped
+**Rule:** kicktipp 2 / 3 / 5 (as Fixture 7).
+**Matches (script-verified, `score_ev.py … --tendency 2 --diff 3 --exact 5`):**
+```
+λ 1.0:1.0  ->  EV-max 0:0 (1.196), 1:1 tied (1.196)   # a draw IS the optimum
+λ 0.9:0.9  ->  EV-max 0:0 (1.317), 1:1 second
+λ 1.2:1.2  ->  EV-max 1:0 (1.150), draw third (1.091)  # 1-goal win edges it
+```
+**Reasoning:** people tip too few draws. A correct draw banks the
+goal-difference tier (3) on *every* draw scoreline, so in a **low-scoring even
+match (λ ≲ 1.0/side) the draw — usually 0:0 — is the EV-max**, tied with 1:1.
+As λ rises past ~1.1 a one-goal win edges ahead, but the draw stays in the top
+tips. Grid surfaces this; intuition suppresses it.
+**Known-good behaviour:** a tip set with **near-zero draws across many
+low-scoring even matches** is a red flag — re-run `score_ev.py`, let the grid
+decide, don't default every close game to 1:0.
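The 0:0 row for λ 1.0:1.0 can be checked by hand. This assumes independent Poisson goals with the same rate λ for both sides and the 2 / 3 / 5 scoring above; the closed form via the modified Bessel function $I_0$ is a standard identity, not something stated in the fixture.

$$
P(0{:}0) = e^{-2\lambda}, \qquad
P(\text{draw}) = \sum_{k \ge 0} \left( \frac{e^{-\lambda} \lambda^{k}}{k!} \right)^{2} = e^{-2\lambda}\, I_0(2\lambda)
$$

$$
\mathrm{EV}(0{:}0) = 5\, P(0{:}0) + 3\, \bigl( P(\text{draw}) - P(0{:}0) \bigr)
$$

For $\lambda = 1$: $P(0{:}0) \approx 0.1353$ and $P(\text{draw}) \approx 0.3085$, so $\mathrm{EV}(0{:}0) \approx 5(0.1353) + 3(0.1732) \approx 1.196$. Since $P(1{:}1) = e^{-2\lambda}\lambda^{2}$ equals $P(0{:}0)$ at $\lambda = 1$, the 1:1 tip has the same EV, which is the tie shown in the first row.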

package/.agent-src/skills/prediction-pool-optimizer/reference/odds-and-bonus.md ADDED

@@ -0,0 +1,109 @@
+# Odds aggregation + bonus-question taxonomy
+Lookup material for `prediction-pool-optimizer`. Two parts:
+- **A — Multi-book consensus**: which books to read, how to weight them, how
+  to fold them into one calibration probability.
+- **B — Bonus / award / special questions**: a type → method table so every
+  open question in the pool reaches an answer.
+This is not betting advice; it describes how to read a public market as a
+probability prior for a fun pool.
+---
+## A. Multi-book consensus — read several, weight by sharpness
+### Why not one book
+A single bookmaker's line can be stale, regionally shaded, or carry a fat
+margin. A consensus across several books is a far better probability estimate,
+and cross-book agreement tells you the market's confidence. **Never mirror one
+portal.**
+### Which books (5–10, publicly viewable)
+Availability varies by region and over time — this list is **illustrative,
+refresh it at run time** and use whatever is publicly viewable from the current
+locale. Fastest way to see many at once = an **odds-comparison aggregator**:
+- **Aggregators (many books on one page):** Oddschecker, Oddsportal,
+  Betexplorer, OddsAlert.
+- **Sharp / low-margin reference books (weight higher):** Pinnacle, Betfair
+  Exchange (an exchange = closest thing to a true market price).
+- **Large recreational books (weight lower):** bet365, Bwin, William Hill,
+  Unibet, Betano, Tipico, Interwetten, bet-at-home, 888sport, Winamax.
+Aim for **5–10** spanning both groups. If only recreational books are viewable,
+say so in the run note — the consensus is then softer.
+### The recipe
+1. **Per market, per book**: collect decimal odds (1X2, exact-score, each
+   outright, each special/award market a bonus question needs).
+2. **De-vig each book independently.** For 1X2 decimal odds `o_H, o_D, o_A`,
+   raw implied probs `1/o` sum to `>1` (overround); normalise:
+   `p_i = (1/o_i) / Σ(1/o)`. Per book — never aggregate raw odds.
+3. **Sharpness-weight and combine.** Weight sharp books above recreational
+   ones; use a **weighted mean**, or a **trimmed median** when books disagree a
+   lot (robust to one outlier). A defensible weighting:
+   - Pinnacle / Betfair Exchange → weight 3
+   - large recreational books → weight 1
+   - aggregator "average" column → weight 1 (already blends many)
+   `p_consensus = Σ(wᵢ · pᵢ) / Σwᵢ` per outcome, then re-normalise the outcome
+   set to sum to 1.
+4. **Outlier handling.** One book far off the others = **flag, not truth**:
+   check for a reason (priced-in injury, stale line) before moving the
+   consensus. Cross-book agreement = signal; one disagreeing book = investigate.
+5. **Healthy weighting overall.** The consensus is a **prior**. Blend it with
+   the per-sport model (Poisson / Gaussian) and override only with *current*
+   info the market has not absorbed (confirmed lineup, late injury,
+   suspension, manager change). The pool answer is EV under the rules on top of
+   this blended probability — the market informs, it does not dictate.
+### Worked mini-example (1X2)
+Two books, home/draw/away decimal odds:
+- Book S (sharp, w=3): 1.80 / 3.60 / 4.50 → raw 0.556/0.278/0.222 (sum 1.056)
+  → de-vig 0.526/0.263/0.211
+- Book R (recreational, w=1): 1.75 / 3.50 / 4.20 → raw 0.571/0.286/0.238
+  (sum 1.095) → de-vig 0.522/0.261/0.217
+Weighted mean (3:1), per outcome, then renormalise:
+≈ **Home 0.525 / Draw 0.263 / Away 0.212**. That is the calibration base for
+the per-match EV grid, not either book's raw number.
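The recipe's de-vig and weighting steps, applied to the two books in the example, can be sketched as follows. The function names and the `(odds, weight)` tuple layout are illustrative assumptions; only the weighted-mean variant is shown, not the trimmed median.

```python
def devig(decimal_odds):
    """Normalise raw implied probabilities 1/o so one book's outcomes sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)
    return [r / overround for r in raw]


def consensus(books):
    """Sharpness-weighted mean of per-book de-vigged probabilities.

    books: list of (decimal_odds, weight) pairs, all covering the same outcomes.
    """
    weight_sum = sum(weight for _odds, weight in books)
    per_book = [(devig(odds), weight) for odds, weight in books]
    n_outcomes = len(per_book[0][0])
    blended = [
        sum(weight * probs[i] for probs, weight in per_book) / weight_sum
        for i in range(n_outcomes)
    ]
    total = sum(blended)  # renormalise to guard against rounding drift
    return [p / total for p in blended]


books = [
    ((1.80, 3.60, 4.50), 3),  # Book S, sharp
    ((1.75, 3.50, 4.20), 1),  # Book R, recreational
]
home, draw, away = consensus(books)
print(f"Home {home:.3f} / Draw {draw:.3f} / Away {away:.3f}")
```

De-vigging inside `consensus` before any averaging enforces the "per book, never aggregate raw odds" rule by construction.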
+---
+## B. Bonus / award / special questions — type → method
+Every entry on the step-1 checklist gets an answer. Match the question to a
+row; real market where one exists, a **labelled** model estimate where none
+does. Optimize each on expected points under its point weight.
+| Question type | Example | Method |
+|---|---|---|
+| **Outright winner** | "Who wins the tournament?" | Outright "to win" market, consensus per A; or `poisson_sim.py` `title_pct`. EV-max under the question's points. |
+| **Group / stage** | "Who wins group X?", "Who advances?" | "To win group" / "to qualify" markets; or `advance_pct` from the simulator. |
+| **Finalists / matchup** | "Who reaches the final?" | "To reach final" market per team; simulator pairing is approximate — prefer the market. |
+| **Top scorer (player)** | "Tournament top scorer?" | "Top goalscorer" outright market, consensus per A; EV-max player (favourite unless rarity scoring rewards a longer shot). |
+| **Team of the top scorer** | "Which team supplies the top scorer?" | Aggregate per-player top-scorer probabilities **by team** (sum each squad); pick the highest-summed team. |
+| **Most assists / cards / etc.** | "Most yellow cards?" | Matching special market if offered; else a labelled model estimate (discipline/aggression proxy). |
+| **Binary special** | "Will there be a red card in match X?" | De-vig the yes/no line to a probability; EV-max side under the points. No market → labelled base-rate estimate. |
+| **Over / under total** | "Over/under total goals / cards?" | De-vig the totals line at the offered threshold; higher-EV side. |
+| **Exact stat** | "How many goals in the final?" | Market totals distribution if available; else per-match Poisson on consensus xG. State the model. |
+### Rules for bonus answers
+- **Answer all of them.** The output's bonus table must have the same number
+  of rows as the step-1 checklist. A missing row = a dropped question.
+- **Market first, labelled model second.** Prefer a real special market; none
+  exists → derive from a stated model and mark `Source: model`.
+- **Rarity rules apply here too.** Under quote/rarity scoring, a
+  plausible-but-rarer answer can out-score the favourite when
+  `payout × probability` is higher — same EV logic as the scores.
+- **No hallucinated numbers.** Outright/award probabilities come from real
+  markets or the executed simulator — never a claimed-but-unrun simulation.
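A small illustration of the rarity rule's `payout × probability` comparison. The probabilities and payouts below are made-up numbers for demonstration, not taken from any market or pool, and the names are hypothetical.

```python
def expected_points(probability, payout):
    """EV of a bonus answer: chance it is right times the points it pays."""
    return probability * payout


# Hypothetical candidates: (probability, payout in points under rarity scoring).
candidates = {
    "favourite": (0.30, 4),
    "longer shot": (0.15, 10),
}

ranked = sorted(
    candidates,
    key=lambda name: expected_points(*candidates[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {expected_points(*candidates[name]):.2f}")
```

Under a flat payout the favourite would win this comparison; the rarer answer only leads because its payout more than doubles while its probability merely halves.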