npm - @event4u/agent-config - Versions diffs - 1.15.0 → 1.16.0 - Mend

@event4u/agent-config 1.15.0 → 1.16.0

Files changed (244)

package/llms.txt CHANGED

@@ -16,7 +16,7 @@ api-testing: Use when writing API endpoint tests — integration tests, contract
 artisan-commands: Use when creating or modifying Artisan commands. Covers clear signatures, safe execution flow, helpful output, and project conventions for console tooling.
 authz-review: Use when reviewing authorization end-to-end — route → gate → policy → query scope → response filter — before changes to permissions, tenants, ownership, or admin flows.
 aws-infrastructure: Use when working with AWS resources — ECS Fargate, ECR, EFS, Secrets Manager, gomplate templates, multi-env deployments — even when the user says 'deploy to staging' without naming AWS.
-blade-ui: Use when creating or editing Blade views, components, partials, layouts, or view logic — even when the user says 'add a new page' or 'render this data' without naming Blade.
+blade-ui: Stack-implementation skill for Laravel Blade — dispatched by `directives/ui/apply.py` (and `review.py` / `polish.py`) when the project's frontend stack is Blade. Covers views, components, partials, layouts, and view logic.
 blast-radius-analyzer: Use BEFORE editing shared code — enumerates every call site, event consumer, queue worker, API client, migration, and test that a planned change will touch, with a file:line citation per dependency.
 bug-analyzer: Use when the user shares a Sentry error, Jira bug ticket, or error description and wants root cause analysis. Also for proactive bug hunting and code audits for hidden bugs.
 check-refs: Use when verifying cross-references between skills, rules, commands, guidelines, and context documents are not broken after edits, renames, or deletions.
@@ -41,15 +41,17 @@ developer-like-execution: Use when implementing, debugging, refactoring, or revi
 docker: Use when working with Docker — Dockerfile edits, docker-compose services, containers, or the dual-container (fast + Xdebug) setup — even when the user just says 'my container won't start'.
 dto-creator: Use when the user says "create a DTO", "new data transfer object", or needs to convert request/response data into a typed PHP class. Creates DTOs with SimpleDto base class and attribute mapping.
 eloquent: Use when writing Eloquent models, relationships, scopes, or queries via Model:: — 'fetch users with their orders'. NOT for PHPStan output, non-Eloquent services, or raw SQL questions.
-fe-design: Use when designing frontend interfaces — component architecture, layout patterns, form design, table patterns, responsive strategies, and UX principles for Blade/Livewire/Flux/Tailwind.
+"estimate-ticket": Estimate a Jira/Linear ticket — 'estimate PROJ-123', 'wie groß ist das?', 'should we split this?' — size + risk + split + uncertainty, sibling of /refine-ticket, close-prompt.
+existing-ui-audit: Use BEFORE writing or editing any non-trivial UI — inventories components, design tokens, shadcn primitives, and reusable patterns into state.ui_audit. Hard gate for the ui directive set.
+fe-design: Reference for frontend-design heuristics — component architecture, layout patterns, form/table design, responsive strategy, a11y, UX principles. Stack-agnostic; cited by directives/ui/design.py.
 feature-planning: Use when the user says "plan a feature", "brainstorm", "explore this idea", or wants to go from idea to structured plan and roadmap.
 file-editor: Use when opening edited files in the user's IDE. Reads settings from .agent-settings.yml to determine IDE and whether auto-open is enabled.
 finishing-a-development-branch: Use when the feature is implementation-complete and the next step is 'ship it' — verifies, cleans up, and routes to merge/PR/park/discard — even when the user just says 'I'm done, what now?'.
-flux: Use when writing Laravel Flux UI components — the official Livewire component library by the Laravel team. Covers components, slots, and variants.
+flux: Stack-implementation skill for Laravel Flux — dispatched by `directives/ui/apply.py` (and `review.py` / `polish.py`) when the project uses `livewire/flux`. Covers Flux components, slots, variants, and form primitives.
 git-workflow: Use when working with Git — branch naming, commit messages, PR creation, rebasing, or the code review process — even when the user says 'push this' or 'merge the branch' without naming Git.
 github-ci: Use when working with GitHub Actions — workflow YAML, quality gates, test matrices, deployment triggers, reusable workflows — even when the user just says 'my CI is failing' or 'add a check'.
 grafana: Use when working with Grafana — dashboards, Loki LogQL queries, alerting rules, monitoring panels — even when the user just says 'build me a dashboard' or 'query the logs' without naming Grafana.
-guideline-writing: Use when creating or editing a guideline in .agent-src.uncompressed/guidelines/ — reference material cited by skills, no auto-triggers — even when the user just says 'write up our naming conventions'.
+guideline-writing: Use when creating or editing a guideline in docs/guidelines/ — reference material cited by skills, no auto-triggers — even when the user just says 'write up our naming conventions'.
 jira-integration: Use when the user says "check Jira", "create ticket", "update issue", or needs JQL queries, ticket transitions, or branch-to-ticket linking.
 jobs-events: Use when creating Laravel jobs, queued workflows, events, or listeners. Covers clear responsibilities, safe serialization, and retry/failure handling.
 judge-bug-hunter: Use when a diff needs correctness review — null-safety, edge cases, off-by-one, races, error handling — dispatched by /review-changes, /do-and-judge, /judge, even without 'judge'.
@@ -68,9 +70,10 @@ laravel-scheduling: Use when configuring Laravel task scheduling — cron expres
 laravel-validation: Use when writing validation — Form Requests, rules, custom rule objects, request-boundary design — even when the user just says 'validate this input' or 'check the request' without naming it.
 learning-to-rule-or-skill: Use when a repeated learning, mistake, or successful pattern should be turned into a new rule or skill. Also use after completing a task to capture learnings from the work.
 lint-skills: Use when running the package's skill linter against all skills and rules to validate frontmatter, required sections, and execution metadata.
-livewire: Use when writing Livewire components — reactive state, events, lifecycle hooks, and clean separation between component logic and Blade templates.
+livewire: Stack-implementation skill for Livewire — dispatched by `directives/ui/apply.py` (and `review.py` / `polish.py`) when the project's frontend stack is Livewire. Covers reactive state, events, lifecycle hooks, and component/view separation.
 logging-monitoring: Use when working with logging or monitoring — Sentry error tracking, Grafana/Loki log aggregation, structured logging channels, or monitoring helpers.
 mcp: Use when working with MCP (Model Context Protocol) servers — their tools, capabilities, and best practices for effective agent workflows.
+md-language-check: Use BEFORE saving any .md under .augment/, .agent-src*/, or agents/ — scans umlauts, German function words, and quoted German phrases outside DE:/EN: anchor blocks. Hard gate per language-and-tone.
 merge-conflicts: Use when the user has merge conflicts or says "resolve conflicts". Understands conflict markers, resolution strategies, and verification workflow.
 migration-creator: Use when the user says "create migration", "add column", or "new table". Creates migrations with correct table prefixes, column naming, and multi-tenant awareness.
 module-management: Use when the user says "create module", "explore module", or works within app/Modules/. Understands module structure, auto-loading, route registration, and namespace conventions.
@@ -95,10 +98,13 @@ project-analysis-zend-laminas: Use for deep Zend Framework or Laminas project an
 project-analyzer: ONLY when user explicitly requests: full project analysis, tech stack detection, or structured analysis documents for agents/analysis/. NOT for regular feature work.
 project-docs: Use when looking for project-specific documentation. Knows which docs exist in agents/docs/ and agents/contexts/ and maps work areas to relevant docs.
 quality-tools: Use when PHPStan, Rector, or ECS output appears — \"phpstan says mixed\", type errors, \"fix code style\", \"run rector\" — even when Eloquent/Laravel/model code is also mentioned.
+react-shadcn-ui: Use when building React UI on shadcn/ui primitives + Tailwind — the apply/review/polish skill dispatched by `directives/ui/*` for the `react-shadcn` stack.
 readme-reviewer: Use when reviewing a README for accuracy, usability, and alignment with the actual repository. Detects invented content, broken setup steps, and structural issues.
 readme-writing: Use when creating, rewriting, or significantly improving a README based on the actual repository structure, commands, and intended audience.
 readme-writing-package: Use when creating or rewriting a README for a reusable package or library. Focus on installability, minimal usage example, compatibility, and developer onboarding.
 receiving-code-review: Use when processing code review feedback (bot or human) before changing anything — triages, verifies, and pushes back with technical reasoning — even when the user just says 'fix the comments'.
+"refine-prompt": Reconstruct a free-form prompt into actionable AC + assumptions + confidence band before the engine plans — '/work \"…\"', 'baue X', 'ist der Prompt klar genug für die Engine?'.
+"refine-ticket": Refine a Jira/Linear ticket before planning — 'refine ticket', 'tighten AC on PROJ-123', 'ist das Ticket klar?' — rewritten ticket, Top-5 risks, persona voices, sub-skills orchestrated, close-prompt.
 requesting-code-review: Use when asking for a review or creating a PR — self-review first, frame the right context, test plan included — even when the user just says 'open a PR' or 'ready to merge'.
 review-routing: Use when preparing a PR description, suggesting reviewers, or flagging risk — produces owner-mapped roles plus historical bug-pattern matches from project-local YAML.
 roadmap-management: Use when the user says "create roadmap", "show roadmap", or "execute roadmap". Creates, reads, and manages roadmap files with phase tracking.
@@ -113,7 +119,7 @@ skill-management: Use when compressing, decompressing, refactoring, or improving
 skill-reviewer: Use when reviewing, auditing, or optimizing skills — validates against the 7 Skill Killers checklist and produces fix recommendations.
 skill-writing: Use when deciding 'should this be a skill or a rule?', creating/improving/reviewing agent skills, SKILL.md frontmatter, or procedure sections — even without saying 'skill-writing'.
 sql-writing: Use when writing raw SQL — MariaDB/MySQL syntax, parameterization, raw migrations, seeders with `DB::statement` — even when the user just pastes a query and asks 'why is this slow' without naming SQL.
-subagent-orchestration: Use when orchestrating implementer/judge subagents — five modes (do-and-judge, do-in-steps, do-in-parallel, do-competitively, judge-with-debate) — model pairing and parallelism from .agent-settings.yml.
+subagent-orchestration: Use when orchestrating implementer/judge subagents — five modes (do-and-judge, do-in-steps, do-in-parallel, do-competitively, judge-with-debate) — models from .agent-settings.yml.
 systematic-debugging: Use when hitting a bug, test failure, crash, or unexpected behavior — enforces reproduce → isolate → hypothesize → verify before any fix — even when the user just says 'this is broken' or 'quick fix'.
 technical-specification: Use when the user says "write a spec", "create RFC", or "document this decision". Writes technical specifications, RFCs, and ADRs with clear structure.
 terraform: Use when writing Terraform — AWS modules, resources, variables, outputs, remote state — even when the user just says 'provision this infra' or 'add an S3 bucket' without naming Terraform.
@@ -126,5 +132,5 @@ universal-project-analysis: ONLY when user explicitly requests: full project ana
 upstream-contribute: Use when a learning, new skill, rule improvement, or bug fix from a consumer project should be contributed back to the shared agent-config package.
 using-git-worktrees: Use when starting parallel work in isolation from the current branch — spawn a git worktree with ignore-safety checks and a clean test baseline — even when the user says 'try this on the side'.
 "validate-feature-fit": Validate whether a feature request fits the existing codebase — check for duplicates, contradictions, scope creep, and architectural misfit
-verify-before-complete: Use when claiming 'done', suggesting a commit, push, or PR — runs the evidence gate so completion claims come from fresh output in this message, not memory or earlier runs.
+verify-completion-evidence: Use when claiming 'done', suggesting a commit, push, or PR — runs the evidence gate so completion claims come from fresh output in this message, not memory or earlier runs.
 websocket: Use when building real-time features — WebSocket broadcasting, live updates, presence channels, connection state — even when the user just says 'push this to the client live'.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
     "name": "@event4u/agent-config",
-    "version": "1.15.0",
+    "version": "1.16.0",
     "description": "Shared agent configuration \u2014 skills, rules, commands, guidelines, and templates for AI coding tools",
     "license": "MIT",
     "private": false,

package/scripts/agent-config CHANGED

@@ -48,6 +48,10 @@ Commands:
   roadmap:progress-check     Fail if agents/roadmaps-progress.md is stale (for CI)
   hooks:install              Install the pre-commit roadmap-progress hook
                              (use --print to dump it, --force to overwrite an existing hook)
+  keys:install-anthropic     Install the Anthropic API key for the AI Council
+                             (interactive, /dev/tty only, writes ~/.config/agent-config/anthropic.key 0600)
+  keys:install-openai        Install the OpenAI API key for the AI Council
+                             (interactive, /dev/tty only, writes ~/.config/agent-config/openai.key 0600)
   first-run                  Guided first-run setup — cost profile, settings, tooling
   implement-ticket           Drive the work_engine Python engine on a ticket envelope
                              (Option-A loop; called by the /implement-ticket command)
@@ -78,6 +82,8 @@ Examples:
   ./agent-config mcp:check
   ./agent-config roadmap:progress
   ./agent-config hooks:install
+  ./agent-config keys:install-anthropic
+  ./agent-config keys:install-openai
   ./agent-config first-run
   ./agent-config implement-ticket --state-file .work-state.json
   ./agent-config work --state-file .work-state.json --prompt-file prompt.txt
@@ -390,6 +396,21 @@ HELP
   echo "    To uninstall: rm $target"
 }
+# Wrap the interactive key installers under a stable CLI entry. The shell
+# scripts themselves enforce /dev/tty, 0600, and atomic write — this is
+# pure routing so consumers never have to know the package layout.
+cmd_keys_install_anthropic() {
+  local script
+  script="$(resolve_script "scripts/install_anthropic_key.sh")" || return 1
+  exec bash "$script" "$@"
+}
+cmd_keys_install_openai() {
+  local script
+  script="$(resolve_script "scripts/install_openai_key.sh")" || return 1
+  exec bash "$script" "$@"
+}
 main() {
   local cmd="${1-}"
   [[ $# -gt 0 ]] && shift || true
@@ -400,6 +421,8 @@ main() {
     roadmap:progress)        cmd_roadmap_progress "$@" ;;
     roadmap:progress-check)  cmd_roadmap_progress_check "$@" ;;
     hooks:install)           cmd_hooks_install "$@" ;;
+    keys:install-anthropic)  cmd_keys_install_anthropic "$@" ;;
+    keys:install-openai)     cmd_keys_install_openai "$@" ;;
     first-run)               cmd_first_run "$@" ;;
     implement-ticket)        cmd_implement_ticket "$@" ;;
     work)                    cmd_work "$@" ;;
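
The help text above says the key installers write `~/.config/agent-config/<provider>.key` with mode 0600, and the routing comment mentions an atomic write. The shell installers themselves are not part of this diff, so the following is only a minimal Python sketch of that write pattern, not the shipped implementation. The name `write_key_atomically` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_key_atomically(key: str, target: Path) -> None:
    """Hypothetical helper: store `key` at `target` with mode 0600, atomically."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the temp file readable and writable by the owner only.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=".key-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(key.strip() + "\n")
        os.chmod(tmp_name, 0o600)  # explicit, in case of platform differences
        os.replace(tmp_name, target)  # atomic rename within one filesystem
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing to a temporary file in the same directory and renaming it means a reader never observes a half-written key file.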

package/scripts/ai_council/__init__.py ADDED

@@ -0,0 +1,39 @@
+"""ai_council — external-AI consultation module.
+The host agent uses this to poll independent models (OpenAI, Anthropic)
+for second opinions on roadmaps, diffs, free-form prompts, or file sets.
+Council members never see the host agent's reasoning — only the artefact
+plus a neutral system prompt asking for an independent critique.
+Architecture:
+    clients.py      — ExternalAIClient ABC + concrete OpenAI/Anthropic
+                      impls + 0600 key loaders (no env-var fallback).
+    bundler.py      — Context bundling with redaction + size guard.
+    orchestrator.py — Parallel fan-out, error normalisation, cost cap.
+    prompts.py      — Neutrality system-prompt templates per input mode.
+Trust boundary: this module makes networked, paid calls. Tokens come
+exclusively from ~/.config/agent-config/<provider>.key (mode 0600). The
+module never edits files, never opens PRs, never merges — output is
+text only, advisory.
+"""
+from scripts.ai_council.clients import (
+    AnthropicClient,
+    CouncilResponse,
+    ExternalAIClient,
+    KeyGateError,
+    OpenAIClient,
+    load_anthropic_key,
+    load_openai_key,
+)
+__all__ = [
+    "AnthropicClient",
+    "CouncilResponse",
+    "ExternalAIClient",
+    "KeyGateError",
+    "OpenAIClient",
+    "load_anthropic_key",
+    "load_openai_key",
+]
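
The docstring above states that tokens come exclusively from a mode-0600 key file, with no environment-variable fallback. The real loaders live in `clients.py`, which is not shown here, so this is a hedged sketch of what such a permission gate might look like. `load_key_strict` and `KeyFileError` are hypothetical names for illustration only (the module itself exports `KeyGateError`).

```python
import stat
from pathlib import Path


class KeyFileError(Exception):
    """Hypothetical error type for this sketch."""


def load_key_strict(path: Path) -> str:
    """Return the key stored at `path`, refusing files with loose permissions."""
    if not path.is_file():
        raise KeyFileError(f"key file not found: {path}")
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        raise KeyFileError(f"{path} has mode {mode:04o}, expected 0600")
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise KeyFileError(f"{path} is empty")
    return key
```

Failing closed on a wrong mode, rather than silently fixing it, keeps the permission problem visible to the user.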

package/scripts/ai_council/_default_prices.py ADDED

@@ -0,0 +1,41 @@
+"""Shipped baseline prices for the AI Council.
+This file is the bootstrap source for `.agent-prices.md` when the
+runtime file is missing. It is also the network-fallback source for
+`scripts/update_prices.py` when the upstream feed (LiteLLM) is
+unreachable.
+Prices are USD per **1 000 000** tokens. Models are identified by the
+exact `model:` string the user puts into `.agent-settings.yml`.
+Numbers below are a hand-curated snapshot — they will drift. The
+runtime never reads them directly once `.agent-prices.md` exists; the
+weekly refresh and user edits are the live source of truth.
+"""
+from __future__ import annotations
+# YYYY-MM-DD of when this table was last hand-edited. Keep in sync with
+# the test_default_prices freshness assertion if you bump this.
+LAST_UPDATED = "2026-04-29"
+# (provider, model)  ->  (input_per_1m_usd, output_per_1m_usd)
+DEFAULT_PRICES: dict[tuple[str, str], tuple[float, float]] = {
+    # ── Anthropic ────────────────────────────────────────────────────
+    ("anthropic", "claude-sonnet-4-5"): (3.00, 15.00),
+    ("anthropic", "claude-opus-4-1"): (15.00, 75.00),
+    ("anthropic", "claude-haiku-4-5"): (1.00, 5.00),
+    # ── OpenAI ───────────────────────────────────────────────────────
+    ("openai", "gpt-4o"): (2.50, 10.00),
+    ("openai", "gpt-4o-mini"): (0.15, 0.60),
+    ("openai", "o1"): (15.00, 60.00),
+    ("openai", "o3-mini"): (1.10, 4.40),
+}
+def as_rows() -> list[tuple[str, str, float, float]]:
+    """Return the table sorted (provider, model) for stable Markdown output."""
+    return [
+        (provider, model, prices[0], prices[1])
+        for (provider, model), prices in sorted(DEFAULT_PRICES.items())
+    ]
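
Because the table stores USD per 1 000 000 tokens, a call's cost is the token counts scaled by those rates. The sketch below shows that arithmetic on one row copied from the table above. `estimate_usd` is a hypothetical helper, not the package's `pricing.estimate_cost`, whose implementation is not shown in this diff.

```python
# One row copied from DEFAULT_PRICES above: (input_per_1m_usd, output_per_1m_usd).
PRICES = {("anthropic", "claude-sonnet-4-5"): (3.00, 15.00)}


def estimate_usd(provider: str, model: str,
                 input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost in USD given per-million-token prices."""
    input_per_1m, output_per_1m = PRICES[(provider, model)]
    return (input_tokens * input_per_1m
            + output_tokens * output_per_1m) / 1_000_000


cost = estimate_usd("anthropic", "claude-sonnet-4-5", 10_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0600
```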

package/scripts/ai_council/_one_off_rebalancing_audit.py ADDED

@@ -0,0 +1,149 @@
+"""One-off council consultation — Phase 0 audit findings on rebalancing roadmap.
+Validates whether the rebalancing roadmap premise still holds against the
+actual PR #34 diff. Transient script; can be deleted after the consult runs.
+Invocation:
+    .venv/bin/python -m scripts.ai_council._one_off_rebalancing_audit
+"""
+from __future__ import annotations
+import sys
+from pathlib import Path
+from scripts.ai_council.clients import AnthropicClient, load_anthropic_key
+from scripts.ai_council.orchestrator import CostBudget, CouncilQuestion, consult, estimate
+from scripts.ai_council.pricing import estimate_cost, load_prices
+from scripts.ai_council.project_context import detect_project_context
+REPO_ROOT = Path(__file__).resolve().parents[2]
+ORIGINAL_ASK = (
+    "Phase 0 audit of the rebalancing roadmap. The roadmap was written based on "
+    "five rounds of external review claiming PR #34 over-deleted implicit "
+    "expertise. Validate whether the premise still holds against the actual diff. "
+    "Three concrete questions: (1) is the deletion narrative supported by the "
+    "numbers? (2) which of the 6 phases are already done or moot? "
+    "(3) what is the actual minimum-viable scope still worth executing?"
+)
+ARTEFACT = """# Phase 0 audit findings - road-to-rebalancing.md
+## Premise from the roadmap
+The risk surface is whether implicit expertise (edge cases, decision forks,
+failure modes, anti-patterns) was trimmed alongside the redundancy.
+Rebalancing means restoring intelligence without re-inflating Always-rules.
+## Actual numbers from the PR #34 diff
+Scope: git diff origin/main...HEAD, path .agent-src.uncompressed/rules/
+- 35 files changed: 202 insertions, 204 deletions => net -2 lines total.
+- Largest delta: language-and-tone.md 37 ins / 96 del. The 96 lines were
+  EXTRACTED to docs/guidelines/language-and-tone-examples.md (79 lines),
+  not deleted. Net knowledge loss: ~17 lines of duplicated phrasing.
+- Second-largest: roadmap-progress-sync.md 26 ins / 33 del - minor.
+- All other rules: <=8 line changes each, mostly renames.
+- ZERO rule files deleted outright.
+## Phase-by-phase realities
+### Phase 0 - Removed-Knowledge Audit
+This document IS Phase 0. Findings: 80% redundancy, examples extracted to
+safe layer, zero decision-logic deletions. Phase 0 is now COMPLETE.
+### Phase 1 - Pilot Context Split (3 rules)
+Candidates: autonomous-execution, minimal-safe-diff, scope-control.
+- autonomous-execution: 8 line delta in PR, no extraction yet.
+- minimal-safe-diff: already auto-trigger; 6 line delta.
+- scope-control: 40 line ADDITION (not deletion) in this PR. Pilot value low.
+### Phase 2 - load_context: convention + linter
+0 rules use load_context: today. Convention does not exist.
+Genuinely net-new work.
+### Phase 3 - Guidelines domain folders
+Already done. 47 guidelines, 46 already in domain folders
+(agent-infra/, docs/, e2e/, php/). Only language-and-tone-examples.md
+is flat at root. Phase 3 reduces to deciding where the one flat file goes.
+### Phase 4 - Golden-Transcript-backed demos under examples/flows/
+Partially shipped. docs/end-to-end-walkthroughs.md (built last cycle)
+already does this with 4 traces anchored to GT-1, GT-P1, GT-U2, GT-2.
+Required cases per roadmap: implement-ticket-demo, work-freeform-demo,
+ui-track-demo, blocked-path-demo (all covered) plus mixed-flow-demo
+(NOT covered yet).
+Net-new: 1 mixed-flow demo + folder move from docs/ to examples/flows/.
+### Phase 5 - Rule priority hierarchy + interaction matrix
+Partially shipped. docs/contracts/rule-interactions.yml and
+rule-interactions.md exist (13 pairs across 9 rules).
+rule-priority-hierarchy.md does NOT exist.
+### Phase 6 - Senior-agent behavior tests
+Not started. Net-new work, but only valuable if Phase 1 actually runs
+and produces something to validate.
+## Question to council
+Given:
+- The deletion narrative is empirically thin (-2 net lines after extraction).
+- 3 of 6 phases are already done or near-done.
+- Phase 1 pilot value is questionable on the 3 named candidates.
+Should this roadmap:
+(A) Close out as substantially-already-done. Execute only the small delta
+    (1 mixed-flow demo, move walkthroughs to examples/flows/, add the
+    priority-hierarchy doc, add load_context: convention if cheap),
+    then archive.
+(B) Drop Phase 1 pilot - the three named candidates don't show evidence of
+    over-deletion. Execute Phase 2 + Phase 4 (1 missing demo + restructure)
+    + Phase 5 (priority-hierarchy doc) only.
+(C) Original full scope - assume the audit missed something subtle and run
+    all 6 phases as written.
+(D) Other framing - propose a tighter scope based on what the audit shows.
+Identify blind spots: did the diff miss content moves between branches?
+Are there rules whose implicit expertise lives in non-line-count signal?
+Recommend (A/B/C/D) with rationale.
+"""
+def main() -> int:
+    api_key = load_anthropic_key()
+    client = AnthropicClient(api_key=api_key)
+    project = detect_project_context(REPO_ROOT)
+    table = load_prices()
+    question = CouncilQuestion(mode="roadmap", user_prompt=ARTEFACT, max_tokens=2048)
+    estimates = estimate(question, [client], table, project=project, original_ask=ORIGINAL_ASK)
+    print(f"[estimate] ~{estimates[0].input_tokens} in + {estimates[0].output_tokens} out = ${estimates[0].total_usd:.4f}")
+    budget = CostBudget(
+        max_input_tokens=50_000, max_output_tokens=20_000,
+        max_calls=5, max_total_usd=0.50,
+    )
+    print(f"[consult] calling {client.name}/{client.model} ...")
+    responses = consult([client], question, budget, table=table, project=project, original_ask=ORIGINAL_ASK)
+    if not responses or responses[0].error:
+        err = responses[0].error if responses else "no response"
+        print(f"[error] {err}", file=sys.stderr)
+        return 1
+    r = responses[0]
+    actual = estimate_cost(r.provider, r.model, r.input_tokens, r.output_tokens, table)
+    print(f"[done] {r.input_tokens} in / {r.output_tokens} out, {r.latency_ms} ms, actual ${actual.total_usd:.4f}")
+    print("=" * 72)
+    print(r.text)
+    print("=" * 72)
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

package/scripts/ai_council/_one_off_roundtrip.py ADDED

@@ -0,0 +1,106 @@
+"""One-off Phase-1 round-trip runner.
+Used exactly once to generate the evidence artefact required to lift
+the capture-only fence on `road-to-ai-council.md` Phase 2+ and the
+end-to-end verification on `road-to-council-modes.md` Phase 2a.
+Not part of the public CLI surface — `/council` remains the supported
+entry point. This script is committed under `scripts/ai_council/` so
+the evidence is reproducible from the git history alone.
+Invocation:
+    .venv/bin/python -m scripts.ai_council._one_off_roundtrip
+"""
+from __future__ import annotations
+import sys
+from pathlib import Path
+from scripts.ai_council.bundler import bundle_roadmap
+from scripts.ai_council.clients import AnthropicClient, load_anthropic_key
+from scripts.ai_council.orchestrator import (
+    CostBudget,
+    CouncilQuestion,
+    consult,
+    estimate,
+)
+from scripts.ai_council.pricing import estimate_cost, load_prices
+from scripts.ai_council.project_context import detect_project_context
+from scripts.ai_council.session import SessionManifest, save as save_session
+REPO_ROOT = Path(__file__).resolve().parents[2]
+ROADMAP_PATH = REPO_ROOT / "agents/roadmaps/road-to-council-modes.md"
+ORIGINAL_ASK = (
+    "Please review the following roadmap (council-modes Phase 2c "
+    "Playwright). The maintainer recommendations for Q1-Q5 are already "
+    "recorded in the 'Decisions Required' block. Question: should we "
+    "accept the recommendations as they are, or are there blind spots "
+    "we should clarify before lifting the capture-only fence?"
+)
+def main() -> int:
+    api_key = load_anthropic_key()
+    client = AnthropicClient(api_key=api_key)
+    context = bundle_roadmap(ROADMAP_PATH)
+    project = detect_project_context(REPO_ROOT)
+    table = load_prices()
+    question = CouncilQuestion(
+        mode="roadmap",
+        user_prompt=context.text,
+        max_tokens=2048,
+    )
+    estimates = estimate(
+        question, [client], table,
+        project=project, original_ask=ORIGINAL_ASK,
+    )
+    print(f"[estimate] {client.name}/{client.model}: "
+          f"~{estimates[0].input_tokens} in + {estimates[0].output_tokens} out "
+          f"= ${estimates[0].total_usd:.4f}")
+    budget = CostBudget(
+        max_input_tokens=50_000,
+        max_output_tokens=20_000,
+        max_calls=10,
+        max_total_usd=0.50,
+    )
+    print(f"[consult] calling {client.name}/{client.model} ...")
+    responses = consult(
+        [client], question, budget,
+        table=table, project=project, original_ask=ORIGINAL_ASK,
+    )
+    if not responses or responses[0].error:
+        err = responses[0].error if responses else "no response"
+        print(f"[error] {err}", file=sys.stderr)
+        return 1
+    r = responses[0]
+    actual = estimate_cost(r.provider, r.model, r.input_tokens, r.output_tokens, table)
+    actual_usd = actual.total_usd
+    print(f"[done] tokens: {r.input_tokens} in / {r.output_tokens} out · "
+          f"latency: {r.latency_ms} ms · actual ${actual_usd:.4f}")
+    manifest = SessionManifest(
+        mode="roadmap",
+        artefact=str(ROADMAP_PATH.relative_to(REPO_ROOT)),
+        original_ask=ORIGINAL_ASK,
+        members=[f"{r.provider}/{r.model}"],
+        rounds=1,
+        cost_usd_estimated=estimates[0].total_usd,
+        cost_usd_actual=actual_usd,
+        extra={"purpose": "Phase 1 ai-council round-trip + Phase 2a council-modes E2E evidence"},
+    )
+    session_dir = save_session(manifest=manifest, responses=responses)
+    print(f"[saved] {session_dir.relative_to(REPO_ROOT)}/")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

package/scripts/ai_council/budget_guard.py ADDED

@@ -0,0 +1,172 @@
+"""Per-day rolling cost-budget guard for the council (D3).
+Adds a 24h-rolling-window USD limit on top of the per-session caps in
+`orchestrator.CostBudget`. Persists a small JSONL ledger in
+``~/.config/agent-config/council-spend.jsonl`` (mode 0600, same
+permission discipline as the API keys).
+Contract
+- The ledger is **append-only**. Each line is ``{"ts": ISO-8601 UTC,
+  "usd": float, "provider": str, "model": str}``.
+- ``today_spend_usd()`` sums entries within the last 24h from "now"
+  (a true rolling window, not a midnight-UTC reset, so there is no
+  surprise reset at a day boundary).
+- ``would_exceed(limit_usd, next_call_usd)`` returns True iff the next
+  call would push the rolling window past the limit.
+- ``record_spend(usd, provider, model)`` appends a single entry; never
+  raises on disk failure (logs to stderr, returns False).
+The guard is **advisory** to the orchestrator: it provides a check
+function the host agent can call before each council member; the
+orchestrator's per-session cost gate stays the primary defence.
+"""
+from __future__ import annotations
+import datetime as _dt
+import json
+import os
+import stat
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+LEDGER_PATH = Path.home() / ".config" / "agent-config" / "council-spend.jsonl"
+ROLLING_WINDOW_HOURS = 24
+@dataclass
+class SpendEntry:
+    ts: _dt.datetime  # UTC, tz-aware
+    usd: float
+    provider: str
+    model: str
+def _now_utc() -> _dt.datetime:
+    return _dt.datetime.now(_dt.timezone.utc)
+def _ensure_ledger_dir(path: Path) -> bool:
+    """Create the ledger's parent directory mode 0700 if missing."""
+    try:
+        path.parent.mkdir(parents=True, exist_ok=True)
+        if (path.parent.stat().st_mode & 0o777) != 0o700:
+            try:
+                os.chmod(path.parent, 0o700)
+            except OSError:
+                # On macOS ~/.config may inherit umask perms; do not block.
+                pass
+        return True
+    except OSError as exc:  # noqa: BLE001 - never block the orchestrator
+        print(f"[council:budget_guard] mkdir failed: {exc}", file=sys.stderr)
+        return False
+def _ensure_ledger_file_mode(path: Path) -> None:
+    """Make sure an existing ledger file is mode 0600. Best-effort."""
+    if not path.exists():
+        return
+    current = path.stat().st_mode & 0o777
+    if current != 0o600:
+        try:
+            os.chmod(path, 0o600)
+        except OSError:
+            pass
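+def _parse_iso(ts: str) -> _dt.datetime | None:
+    try:
+        # `fromisoformat` accepts "+00:00"; we always write with "+00:00".
+        parsed = _dt.datetime.fromisoformat(ts)
+    except ValueError:
+        return None
+    # Treat a naive timestamp (e.g. from a hand-edited ledger) as UTC so the
+    # comparison against the tz-aware cutoff cannot raise TypeError.
+    if parsed.tzinfo is None:
+        parsed = parsed.replace(tzinfo=_dt.timezone.utc)
+    return parsed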
+def read_entries(path: Path | None = None) -> list[SpendEntry]:
+    """Read every well-formed entry from the ledger.
+    Malformed lines are skipped silently. Empty/missing ledger → [].
+    """
+    p = path or LEDGER_PATH
+    if not p.exists():
+        return []
+    out: list[SpendEntry] = []
+    for line in p.read_text(encoding="utf-8").splitlines():
+        line = line.strip()
+        if not line:
+            continue
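+        try:
+            obj = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if not isinstance(obj, dict):
+            continue  # valid JSON but not an object, e.g. a bare number or list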
+        ts = _parse_iso(str(obj.get("ts", "")))
+        if ts is None:
+            continue
+        try:
+            usd = float(obj.get("usd", 0))
+        except (TypeError, ValueError):
+            continue
+        out.append(SpendEntry(
+            ts=ts, usd=usd,
+            provider=str(obj.get("provider", "")),
+            model=str(obj.get("model", "")),
+        ))
+    return out
+def today_spend_usd(
+    *,
+    path: Path | None = None,
+    now: _dt.datetime | None = None,
+    window_hours: int = ROLLING_WINDOW_HOURS,
+) -> float:
+    """Sum of USD spent in the last `window_hours` (rolling window)."""
+    cutoff = (now or _now_utc()) - _dt.timedelta(hours=window_hours)
+    return sum(e.usd for e in read_entries(path) if e.ts >= cutoff)
+def would_exceed(
+    limit_usd: float,
+    next_call_usd: float,
+    *,
+    path: Path | None = None,
+    now: _dt.datetime | None = None,
+    window_hours: int = ROLLING_WINDOW_HOURS,
+) -> bool:
+    """True iff appending `next_call_usd` would push the window past `limit_usd`.
+    `limit_usd <= 0` disables the guard (returns False). Mirrors the
+    `CostBudget.max_total_usd` convention.
+    """
+    if limit_usd <= 0:
+        return False
+    spent = today_spend_usd(path=path, now=now, window_hours=window_hours)
+    return (spent + next_call_usd) > limit_usd
+def record_spend(
+    usd: float,
+    provider: str,
+    model: str,
+    *,
+    path: Path | None = None,
+    now: _dt.datetime | None = None,
+) -> bool:
+    """Append one entry to the ledger. Returns True on success."""
+    if usd <= 0:
+        return True  # zero-cost calls (manual mode) skip the ledger
+    p = path or LEDGER_PATH
+    if not _ensure_ledger_dir(p):
+        return False
+    ts = (now or _now_utc()).isoformat()
+    entry = json.dumps({"ts": ts, "usd": round(usd, 6),
+                        "provider": provider, "model": model}) + "\n"
+    try:
+        with p.open("a", encoding="utf-8") as fh:
+            fh.write(entry)
+    except OSError as exc:  # noqa: BLE001 - never block the orchestrator
+        print(f"[council:budget_guard] write failed: {exc}", file=sys.stderr)
+        return False
+    _ensure_ledger_file_mode(p)
+    return True
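
The rolling-window check described in the module docstring can be illustrated without touching the ledger file. The sketch below works on an in-memory list of `(timestamp, usd)` pairs. `window_total` and `exceeds_limit` are hypothetical names that mirror, but are not, `today_spend_usd` and `would_exceed`, and the timestamps and amounts are made up for the example.

```python
import datetime as dt


def window_total(entries, now, window_hours=24):
    """Sum USD for entries no older than `window_hours` before `now`."""
    cutoff = now - dt.timedelta(hours=window_hours)
    return sum(usd for ts, usd in entries if ts >= cutoff)


def exceeds_limit(limit_usd, next_call_usd, entries, now):
    """True if the next call would push the rolling total past the limit."""
    if limit_usd <= 0:  # same convention as the source: non-positive disables
        return False
    return window_total(entries, now) + next_call_usd > limit_usd


utc = dt.timezone.utc
now = dt.datetime(2026, 1, 2, 12, 0, tzinfo=utc)
entries = [
    (dt.datetime(2026, 1, 1, 11, 0, tzinfo=utc), 0.40),   # 25h old: outside window
    (dt.datetime(2026, 1, 1, 13, 0, tzinfo=utc), 0.30),   # 23h old: counted
    (dt.datetime(2026, 1, 2, 11, 30, tzinfo=utc), 0.10),  # 30min old: counted
]

print(exceeds_limit(0.50, 0.05, entries, now))  # prints False
print(exceeds_limit(0.50, 0.20, entries, now))  # prints True
```

The 25-hour-old entry falls out of the window on its own schedule, which is the difference from a counter that resets at midnight.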