npm - @event4u/agent-config - Versions diffs - 1.17.0 → 1.19.0 - Mend

@event4u/agent-config 1.17.0 → 1.19.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (158)

package/docs/contracts/one-off-script-lifecycle.md ADDED

@@ -0,0 +1,109 @@
+---
+stability: beta
+---
+# One-off-script lifecycle
+**Purpose.** Pin the naming, location, age, and purge policy for
+**one-off scripts** so the package does not accumulate a graveyard
+under `scripts/`. One-off here means: a script written for a
+specific migration, retrofit, audit, or council run, with no ongoing
+caller and no place in the durable Taskfile.
+**Scope.** Defines the file pattern, the directory, the maximum age,
+the TTL extension mechanism, and the CI purge gate. Does **not**
+specify the content of any specific one-off — that belongs to the
+script itself or the cleanup-mechanics context.
+Last refreshed: 2026-05-04.
+## Naming
+One-off scripts MUST match this regex:
+```
+^_one_off_[a-z0-9-]+\.py$
+```
+The `_one_off_` prefix is the load-bearing signal. Files outside
+this prefix are treated as durable scripts and MUST be referenced by
+the Taskfile or by another script.
+## Location
+```
+scripts/_one_off/<YYYY-MM>/_one_off_<slug>.py
+```
+`<YYYY-MM>` is the UTC month the script was first committed. The
+month directory groups one-offs for archival sweeps. Scripts MUST
+NOT live at `scripts/_one_off/_one_off_*.py` (no month) or under
+`scripts/` directly (no `_one_off/`).
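The naming and location rules above can be checked mechanically. The following is a minimal sketch, not the package's linter: the function name `is_valid_one_off_path` and the month pattern `MONTH_RE` are assumptions made for illustration; only the file-name regex and the directory layout come from the contract text.

```python
import re
from pathlib import PurePosixPath

# File-name regex quoted from the contract.
NAME_RE = re.compile(r"^_one_off_[a-z0-9-]+\.py$")
# Assumption: <YYYY-MM> means four digits, a dash, and a month 01..12.
MONTH_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def is_valid_one_off_path(path: str) -> bool:
    """Return True if `path` looks like scripts/_one_off/<YYYY-MM>/_one_off_<slug>.py.

    Hypothetical helper; it only covers scripts, not the per-month README.md
    that the contract lists as an allowed exception.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        return False  # rejects a missing month directory and files directly under scripts/
    root, bucket, month, name = parts
    return (
        root == "scripts"
        and bucket == "_one_off"
        and MONTH_RE.match(month) is not None
        and NAME_RE.match(name) is not None
    )
```

A path such as `scripts/_one_off/2026-05/_one_off_fix-links.py` passes, while `scripts/_one_off/_one_off_fix-links.py` (no month directory) does not.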
+## TTL
+| State | Action |
+|---|---|
+| Age ≤ 60 days from month-directory date | active, no warning |
+| 60 < Age ≤ 90 days | warning emitted by `lint_one_off_age.py`, no failure |
+| Age > 90 days | `lint_one_off_age.py` fails CI; the script is purged in the next housekeeping pass |
+Age = `today − first-of-month(<YYYY-MM>)` in UTC days. The 60-day
+soft floor and 30-day grace window are intentional — they cover one
+release cycle plus a sprint of grace.
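As a worked illustration of the TTL table, the sketch below computes the age from the month directory and maps it to one of the three states. It is an assumption-laden sketch rather than the code in `lint_one_off_age.py`: the function names and the state labels `active`, `warning`, and `fail` are invented here.

```python
from datetime import date


def one_off_age_days(month_dir: str, today: date) -> int:
    """Age = today minus the first day of the <YYYY-MM> directory, in days."""
    year, month = (int(part) for part in month_dir.split("-"))
    return (today - date(year, month, 1)).days


def ttl_state(age_days: int) -> str:
    """Map an age to the three rows of the TTL table (labels are hypothetical)."""
    if age_days <= 60:
        return "active"   # no warning
    if age_days <= 90:
        return "warning"  # warning only, CI still passes
    return "fail"         # CI fails, script is a purge candidate
```

For a script in `2026-03/`, the age on 2026-05-04 is 31 + 30 + 3 = 64 days, which falls in the warning band.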
+## TTL extension
+A one-off MAY extend its TTL exactly once, by adding a frontmatter
+block at the top of the script:
+```python
+"""
+---
+ttl_extended_until: 2026-08-31
+ttl_reason: blocked on PROJ-123 — re-runs after cutover
+---
+"""
+```
+The linter respects `ttl_extended_until` if it is ≤ 180 days from
+the file's `<YYYY-MM>` directory date. Beyond 180 days, the linter
+hard-fails — no second extension. The intent is: if a "one-off" is
+still live at six months, it is a durable concern and belongs in
+`scripts/` or a Taskfile group.
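The 180-day cap on an extension can be expressed as a single date comparison. This is a sketch under stated assumptions: `extension_is_honoured` is a hypothetical name, the frontmatter is assumed to be already parsed into a `date`, and the cap is measured from the first day of the month directory, matching the age definition in the TTL section.

```python
from datetime import date, timedelta

MAX_EXTENSION_DAYS = 180  # cap stated in the contract


def extension_is_honoured(month_dir: str, ttl_extended_until: date) -> bool:
    """Return True if the requested extension date stays within the 180-day cap.

    Hypothetical helper; the real linter may parse and report differently.
    """
    year, month = (int(part) for part in month_dir.split("-"))
    directory_date = date(year, month, 1)
    return ttl_extended_until <= directory_date + timedelta(days=MAX_EXTENSION_DAYS)
```

Assuming the example script above lives in `2026-05/`, its `ttl_extended_until: 2026-08-31` is inside the cap, which ends on 2026-10-28.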
+## Purge mechanism
+`lint_one_off_age.py` runs in `task ci`. On a clean working tree, it
+prints purge candidates as a list. Purge itself is a separate human-
+or-CI action — `task purge-one-offs` removes flagged files. The
+linter does not auto-delete.
+## Allowed exceptions
+Two patterns are exempt from the prefix requirement:
+- **Bundler / orchestrator helpers** under `scripts/ai_council/`
+  that exist to support the council CLI — they are not one-offs even
+  though council *runs* are one-offs.
+- **`scripts/_one_off/<YYYY-MM>/README.md`** — a free-form readme is
+  allowed in each month directory documenting why the scripts exist.
+Council run scripts that wrap a question and write the response file
+DO live under `scripts/_one_off/<YYYY-MM>/` and DO follow the prefix
+rule.
+## Cross-references
+- The contract that defines council CLI surface (and so what gets
+  archived as a one-off): the council CLI section of the package's
+  command catalog.
+- The cleanup-mechanics context for housekeeping passes:
+  `agents/contexts/cleanup-mechanics.md`.
+- Linter implementation: `scripts/lint_one_off_age.py`.
+## Stability
+Beta. Breaking changes (e.g. raising the age cap, changing the
+prefix, or removing TTL extensions) require a minor-version bump and
+a `### Breaking` entry in `CHANGELOG.md`.

package/docs/contracts/roadmap-complexity-standard.md ADDED

@@ -0,0 +1,137 @@
+---
+stability: beta
+---
+# Roadmap Complexity Standard
+> **Audience:** roadmap authors and reviewers in `agents/roadmaps/`.
+> **Linter:** `scripts/lint_roadmap_complexity.py` (run via
+> `task lint-roadmap-complexity`).
+> **Source:** Phase 5 of the `road-to-context-layer-maturity` work
+> (now archived); reviewer-flagged drift after the
+> structural-optimization roadmap proved that "heavy" is correct for
+> structural work but wrong for normal feature work.
+Roadmaps drift toward heavyweight whenever the previous one was
+heavyweight. This contract pins **two tiers**, names exemplars, and
+hard-fails the lint on tier mismatch.
+## Tiers
+### Lightweight (default)
+Almost every roadmap is lightweight. The shape:
+- **≤ 6 phases** total
+- **≤ 1 page per phase** (≈ 60 source lines including header + steps
+  + exit gate; the linter doesn't enforce per-phase line budgets, but
+  reviewers do)
+- **No nested council debates** inside the roadmap (no
+  `## Council Round 1`, `## Council Round 2`, `### Verdict` sections)
+- **No 178-step backlogs** — phases are delivery-shaped, not
+  task-shaped
+- **≤ 600 lines total** (frontmatter + body, blank lines counted)
+- Frontmatter declares `complexity: lightweight`
+**Exemplars:**
+- `road-to-context-layer-maturity.md` (archived) — six phases,
+  ~376 lines, no nested council; the seed roadmap that triggered this
+  standard. Self-tagged as `lightweight`.
+- `road-to-rule-hardening.md` (archived) — five phases, ~263 lines;
+  mechanized the rule layer; sibling of the seed, also lightweight.
+**Typical use:** feature work, follow-ups, bounded refactors,
+mechanization passes, telemetry plumbing.
+### Structural (rare)
+Triggered only when the work changes a contract layer or a budget
+invariant. The shape:
+- Multi-round council, locked decisions, file-ownership matrix,
+  gating contracts on phase boundaries
+- **> 600 lines** total (typical: 800 – 1500)
+- May include `## Council Round N` / `### Verdict` blocks, ADR
+  cross-links, decision matrices
+- Frontmatter declares `complexity: structural`
+**Exemplars:**
+- Archived `agents/roadmaps/_archived/road-to-structural-optimization.md`
+  (closed 2026-05-03 after 6 phases of council-driven budget work).
+  ~1.5k lines, multi-round council, file-ownership matrix.
+**Triggered by:** changes to a public contract surface in
+`docs/contracts/`, a budget invariant in
+[`load-context-budget-model.md`](load-context-budget-model.md), or a
+priority hierarchy in
+[`rule-priority-hierarchy.md`](rule-priority-hierarchy.md).
+**Requires:** explicit user opt-in on creation. The agent must not
+upgrade a lightweight roadmap to structural mid-flight without that
+opt-in.
+## Anti-game clause
+The trigger is **contract-layer change**, not line count alone. A
+heavy roadmap split into two lightweights to dodge the gate is a
+linter-defeat — reviewers flag it on PR review. Conversely, a
+roadmap that legitimately needs the structural shape but tries to
+hide as `lightweight` to skip council overhead is the same defeat in
+the other direction.
+## Frontmatter
+Every roadmap in `agents/roadmaps/` (including draft and archived
+ones) declares its tier:
+```yaml
+---
+complexity: lightweight
+---
+```
+or
+```yaml
+---
+complexity: structural
+status: draft
+---
+```
+Other frontmatter keys (`status:`, `owner:`, `target_release:`) are
+permitted alongside but not required by this contract.
+## Linter contract
+`scripts/lint_roadmap_complexity.py` (≤ 150 LOC, stdlib only)
+enforces the **measurable** subset of this standard:
+| Check | Lightweight | Structural |
+|---|---|---|
+| `complexity:` frontmatter declared | required | required |
+| Total line count | ≤ 600 | > 0 (no upper cap) |
+| `## Phase N` heading count | ≤ 6 | no cap |
+| `## Council Round N` / `### Verdict` blocks | forbidden | allowed |
+| Council session cross-links (`agents/sessions/.../council-…`) | warn | allowed |
+The linter runs on every roadmap file under `agents/roadmaps/` and
+exits non-zero on any violation. Hooked into `task ci` via
+`task lint-roadmap-complexity`.
+**Out of scope for the linter** (reviewer judgment only): step count,
+per-phase length, the contract-layer-change trigger for the
+structural tier.
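The measurable checks in the table could look roughly like the sketch below. It is not the shipped `lint_roadmap_complexity.py`: the function names, the error messages, and the assumption that phase headings start with `## Phase` followed by a number are illustrative only, and the council cross-link warning is left out.

```python
import re
from typing import List, Optional

# Assumption: phase headings look like "## Phase 3 ..." at the start of a line.
PHASE_RE = re.compile(r"^## Phase \d+", re.MULTILINE)
COUNCIL_RE = re.compile(r"^(?:## Council Round \d+|### Verdict)", re.MULTILINE)
TIER_RE = re.compile(r"^complexity:\s*(\S+)\s*$")


def read_tier(text: str) -> Optional[str]:
    """Return the `complexity:` value from a leading frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            return None  # frontmatter closed without a complexity key
        match = TIER_RE.match(line)
        if match:
            return match.group(1)
    return None


def check_roadmap(text: str) -> List[str]:
    """Return a list of violations; an empty list means the roadmap passes."""
    tier = read_tier(text)
    if tier not in ("lightweight", "structural"):
        return ["missing or invalid `complexity:` frontmatter"]
    errors: List[str] = []
    if tier == "lightweight":
        if len(text.splitlines()) > 600:  # blank lines are counted
            errors.append("lightweight roadmap exceeds 600 lines")
        if len(PHASE_RE.findall(text)) > 6:
            errors.append("lightweight roadmap has more than 6 phases")
        if COUNCIL_RE.search(text):
            errors.append("council/verdict blocks are forbidden in lightweight")
    return errors
```

A caller would exit non-zero whenever `check_roadmap` returns a non-empty list for any file under `agents/roadmaps/`.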
+## Migration
+Phase 5.3 of `road-to-context-layer-maturity` applies the standard
+retroactively to all open roadmaps in `agents/roadmaps/`: each gets
+a `complexity:` tag based on the rules above. No content rewrites
+land in that step — only the tag.
+Roadmaps that exceed the lightweight cap but plausibly should be
+lightweight (e.g. they accumulated drift) are tagged `structural`
+for now and may be split or trimmed in a follow-up. The migration
+records existing reality, not aspirational reality.

package/docs/contracts/rule-interactions.yml CHANGED

@@ -221,6 +221,28 @@ pairs:
       - .agent-src.uncompressed/rules/ask-when-uncertain.md#iron-law--one-question-per-turn-always
       - .agent-src.uncompressed/rules/direct-answers.md#iron-law-3--brevity-by-default
+  - id: scope-x-verify-before-complete
+    rules: [verify-before-complete, scope-control]
+    relation: complements
+    conflict: >-
+      Agent has just finished a change that touches user-permission-gated
+      operations (push, branch, PR, tag) and is preparing to claim "done"
+      in the same turn. Both rules can fire: `verify-before-complete`
+      gates the completion claim on fresh evidence; `scope-control`
+      gates the git operation on explicit permission this turn.
+    resolution: >-
+      Both rules apply independently and compose. The
+      `verify-before-complete` Iron Law still requires fresh
+      verification output in this message before any "done" claim,
+      regardless of whether the user has authorised the next git op.
+      Conversely, verification passing does not authorise pushing or
+      merging — those stay behind the `scope-control` permission gate.
+      Skipping either is a rule violation; satisfying one does not
+      satisfy the other.
+    evidence:
+      - .agent-src.uncompressed/rules/verify-before-complete.md#the-iron-law
+      - .agent-src.uncompressed/rules/scope-control.md#git-operations--permission-gated
   - id: language-x-direct-answers
     rules: [language-and-tone, direct-answers]
     relation: complements

package/docs/customization.md CHANGED

@@ -66,6 +66,7 @@ those sections.
 | `ai_council.cost_budget.max_calls` | `10` | Maximum council members per invocation. |
 | `ai_council.cost_budget.max_total_usd` | `0.0` | Per-invocation USD ceiling. `0` disables (token caps still apply). |
 | `ai_council.cost_budget.daily_limit_usd` | `0.0` | Rolling 24h USD ceiling across all `/council` calls. `0` disables. Ledger lives at `~/.config/agent-config/council-spend.jsonl` (mode 0600). |
+| `ai_council.session_retention_days` | `14` | Auto-prune for `agents/council-sessions/` audit folders. Older session directories are removed on the next `save()`. `0` disables (keep forever). |
 > **Experimental.** AI Council is not yet validated by external users. API costs apply per consultation.
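The `session_retention_days` semantics (prune older folders, `0` keeps everything) can be illustrated with a small selection function. This is a hypothetical sketch: the diff does not say how session age is measured, so the ages are passed in as plain numbers, and `sessions_to_prune` is an invented name.

```python
from typing import Dict, List


def sessions_to_prune(session_ages_days: Dict[str, int], retention_days: int = 14) -> List[str]:
    """Return session folder names older than the retention window.

    Assumption: "older" means strictly more than `retention_days` days old.
    A value of 0 disables pruning, as described in the settings table.
    """
    if retention_days == 0:
        return []
    return sorted(
        name for name, age in session_ages_days.items() if age > retention_days
    )
```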

package/docs/development.md CHANGED

@@ -17,7 +17,10 @@
 ## Task Commands
-All commands use [Task](https://taskfile.dev/). See `Taskfile.yml` for the full list.
+All commands use [Task](https://taskfile.dev/). The root `Taskfile.yml` orchestrates
+`ci`/`_ci-*` and includes the four task groups under `taskfiles/`
+(`ci-fast.yml`, `content.yml`, `engine.yml`, `release.yml`) with `flatten: true`,
+so every task stays in the root namespace. Run `task --list` for the full list.
 ### CI & Verification

package/docs/guidelines/agent-infra/ask-when-uncertain-demos.md ADDED

@@ -0,0 +1,134 @@
+---
+demo_for: ask-when-uncertain
+layer: pattern-memory
+prose_delta:
+  rule_chars_before: 3893
+  rule_chars_after: 4267
+  measured_at: phase-3.3-cross-link-landing
+  note: "+374 chars from the Examples cross-link block; future prose-pruning rounds (out of scope for Phase 3) will pull pattern prose from the rule into this demo and drive the after-count below the before-count."
+---
+# ask-when-uncertain — Pattern Memory
+> Wrong / right / why demos for the `ask-when-uncertain` Iron Law
+> (one question per turn, vague-request must ask). Linked from
+> [`ask-when-uncertain`](../../.agent-src.uncompressed/rules/ask-when-uncertain.md)
+> § Examples. Outcome baseline at
+> [`tests/golden/outcomes/ask_when_uncertain.json`](../../tests/golden/outcomes/ask_when_uncertain.json).
+## Demo 1 — Vague request, multi-question stack
+### Wrong shape
+```
+User: improve this
+Agent:
+1. Should I optimize for readability or speed?
+2. Refactor the function signature?
+3. Add tests too?
+4. Run PHPStan after?
+Reply with a, b, c, d so I can pick.
+```
+**Failure mode:** four `?` aimed at the user, structured-answer
+demand (`a, b, c, d`), no triage. Iron Law breach: a turn with a
+question has *exactly one*. Compound thinking load — answer to Q1
+reframes Q2.
+### Right shape
+```
+User: improve this
+Agent: Which axis should "improve" target?
+1. Readability (naming, function splits)
+2. Performance (allocations, N+1)
+3. Robustness (error handling, edge cases)
+Then I'll get started.
+```
+### Why it works
+Single `?`, numbered options without compound asks, picks the
+question whose answer reframes everything downstream. The follow-up
+("tests? PHPStan?") rides on the next turn — its framing depends on
+the answer here.
+## Demo 2 — Drive-by execution on a vague trigger
+### Wrong shape
+```
+User: clean up this file
+Agent: [edits 12 files, removes 200 lines, reformats imports,
+extracts 3 helpers, renames 5 variables]
+Done.
+```
+**Failure mode:** "clean up" is a vague-request trigger
+(`ask-when-uncertain` § Vague-request triggers). MUST-ask before
+touching code. Without scope, the diff is a guess at the user's
+intent.
+### Right shape
+```
+User: clean up this file
+Agent: What does "clean up" mean here?
+1. Remove dead code
+2. Format / sort imports
+3. Refactor (split functions)
+Pick one, otherwise the diff is guesswork.
+```
+### Why it works
+Names the trigger explicitly ("vague-request"), surfaces the three
+canonical interpretations from the rule's trigger table, and
+declines to act without scope. The cost of asking once << cost of
+ripping through 12 files in the wrong direction.
+## Demo 3 — Stacked questions inside a handoff
+### Wrong shape
+```
+Agent: Before I hand over the roadmap:
+- Which branch?
+- Should I mention PRs?
+- Which model for the continuation?
+- Should I cite .agent-chat-history?
+Answer as 1, 2, 3, 4.
+```
+**Failure mode:** four-question handoff block. Per § Ordering &
+handoff: handoff questions ride LAST, after domain clarifications,
+folded into the handoff prompt itself — not as a parallel quiz.
+### Right shape
+```
+Agent: [delivers the handoff prompt with sensible defaults filled
+in: current branch, PR link present if open, model
+recommendation per `model-recommendation`, history reference
+auto-included]
+Which of the four defaults should I adjust before handing over?
+```
+### Why it works
+Defaults absorb three of the four "questions"; the remaining
+question is the only one that genuinely needs user input, and it's
+phrased so a one-word answer ("PR link" / "model" / "fine") closes
+the loop.

package/docs/guidelines/agent-infra/direct-answers-demos.md ADDED

@@ -0,0 +1,145 @@
+---
+demo_for: direct-answers
+layer: pattern-memory
+prose_delta:
+  rule_chars_before: 4098
+  rule_chars_after: 4514
+  measured_at: phase-3.3-cross-link-landing
+  note: "+416 chars from the Examples cross-link block; future prose-pruning rounds (out of scope for Phase 3) will pull pattern prose from the rule into this demo and drive the after-count below the before-count."
+---
+# direct-answers — Pattern Memory
+> Wrong / right / why demos for the three Iron Laws of
+> `direct-answers` (no flattery, no invented facts, brevity by
+> default). Linked from
+> [`direct-answers`](../../.agent-src.uncompressed/rules/direct-answers.md)
+> § Examples. Outcome baseline at
+> [`tests/golden/outcomes/direct_answers.json`](../../tests/golden/outcomes/direct_answers.json).
+## Demo 1 — Flattery opener (Iron Law 1)
+### Wrong shape
+```
+User: How many Always-Rules are there currently?
+Agent: Great question! That is actually a fascinating
+property of the system. Let me break that down for you:
+there are currently 38 Always-Rules.
+```
+**Failure mode:** flattery opener ("Great question!"), action
+announcement ("Let me…"), then a 30-word run-up to a
+one-word answer. Iron Law 1 and Iron Law 3 are violated together.
+### Right shape
+```
+User: How many Always-Rules are there currently?
+Agent: 38.
+```
+### Why it works
+A question with one true answer = one sentence. No run-up, no
+cushion, no "let me…". The number is the whole answer.
+## Demo 2 — Hedged claim without verification (Iron Law 2)
+### Wrong shape
+```
+User: Does verify-before-complete have a "load_context" field?
+Agent: Yes, I think so. Most Always-Rules have that
+by now. It should be in the frontmatter.
+```
+**Failure mode:** load-bearing project-shape claim without verification,
+with two hedges ("I think", "should"). Iron Law 2 § Severity-tiered:
+project-shape claims must be backed by a tool call when
+one suffices, and here a `view`/`grep` easily suffices.
+### Right shape
+```
+User: Does verify-before-complete have a "load_context" field?
+Agent: [greps frontmatter]
+No. `verify-before-complete.md` declares no load_context
+entry (only the other four safety-floor rules do).
+```
+### Why it works
+Tool call before the answer, then a concrete statement with a reason.
+If the verification effort were too high → ask, don't guess. Here a
+`grep` costs two seconds, so the tool call is mandatory.
+## Demo 3 — Post-hoc summary creep (Iron Law 3)
+### Wrong shape
+```
+User: How do I open the VBC demos?
+Agent: Here is the path to the file:
+`docs/guidelines/agent-infra/verify-before-complete-demos.md`
+In summary: this is the demo file for the Iron Law of the
+verify-before-complete rule. It follows the wrong/right/why schema
+and is part of Phase 3 of road-to-context-layer-maturity.
+Let me know if you have any questions about it.
+```
+**Failure mode:** useless summary after the answer, then a
+service platitude ("Let me know…"). Iron Law 3: the shortest
+complete answer wins; restating context and adding a
+closing platitude are both failure modes.
+### Right shape
+```
+User: How do I open the VBC demos?
+Agent: `docs/guidelines/agent-infra/verify-before-complete-demos.md`
+```
+### Why it works
+The path is the whole answer. No re-framing of the question, no
+"in summary", no offered follow-up. If Matze wants
+more, he will ask.
+## Demo 4 — Functional emoji vs prose flair
+### Wrong shape
+```
+Agent: 🚀 Tests are through! ✨ All 47 green 🎉 Ready for the next step 💪
+```
+**Failure mode:** emoji as a style element, not as a functional
+marker. § Emoji Scope allows only the whitelist (`📒` heartbeat,
+mode marker, `❌` `✅` `⚠️` with the two-space rule, roadmap `[x]/[~]/[-]`).
+Everything else is noise.
+### Right shape
+```
+Agent: Tests passed:
+$ vendor/bin/pest -q
+............................................... 47 passed
+✅  47/47 green, exit 0.
+```
+### Why it works
+Exactly one emoji from the whitelist (`✅` with two spaces), used
+functionally as a status marker next to the CLI output. No decoration,
+no reaction emojis.

package/docs/guidelines/agent-infra/layered-settings.md CHANGED

@@ -152,27 +152,46 @@ MUST follow these rules. Initial file creation and legacy migration
 are owned by `scripts/install.py`; these rules govern every edit
 after that.
+The contract is **additive merge with user-line preservation** —
+the user's file is the ground truth, the template only contributes
+keys the user is missing. Round-trip parser and merger live in
+[`scripts/sync_yaml_rt.py`](../../scripts/sync_yaml_rt.py); the
+supported YAML subset (block-mappings, scalars, lists, comments,
+CRLF/LF) is documented in its module docstring. The stdlib-only
+choice (vs. `ruamel.yaml`) and its revisit triggers are recorded in
+[`docs/contracts/adr-settings-sync-engine.md`](../../contracts/adr-settings-sync-engine.md).
 For each section in the template
-([`agent-settings.md`](../../templates/agent-settings.md)), walked in
-template order:
+([`agent-settings.md`](../../templates/agent-settings.md)):
-- Keep the section header and its comments verbatim from the template.
 - For each key under the section:
-  - **Key exists in user's file** → use the user's current value.
-  - **Key missing** → use the template default.
-- **Unknown sections/keys** the user has added → preserve at the end
-  of the section (or in a trailing `_user:` block if no matching
-  section exists).
+  - **Key exists in user's file** → keep the user's line **verbatim**
+    (value, quoting, inline comment, indent — all preserved).
+  - **Key missing** → insert the template's line at the position
+    after the user's last preceding sibling that is also in the
+    template (max-index insertion).
+- **Unknown sections/keys** the user has added → preserved verbatim
+  at their existing position. They are not moved to a trailing
+  `_user:` block, not re-prefixed, not flattened.
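One reading of the insertion rule above ("after the user's last preceding sibling that is also in the template") is sketched below on key names only. This is not the code in `scripts/sync_yaml_rt.py`: the function name `merge_key_order` is hypothetical, and placing a key at the top of the section when no preceding template sibling exists is an assumption the diff does not confirm.

```python
from typing import List


def merge_key_order(user_keys: List[str], template_keys: List[str]) -> List[str]:
    """Insert template keys the user lacks without reordering existing user keys.

    For each missing key, look at the template siblings that come before it,
    find the ones already present, and insert right after whichever of them
    sits furthest down in the user's order (the maximum index).
    """
    merged = list(user_keys)  # user order is the ground truth
    for t_idx, key in enumerate(template_keys):
        if key in merged:
            continue  # existing user keys stay where they are
        anchors = [merged.index(k) for k in template_keys[:t_idx] if k in merged]
        # Assumption: with no preceding sibling, the key goes to the top.
        insert_at = max(anchors) + 1 if anchors else 0
        merged.insert(insert_at, key)
    return merged
```

With user keys `["b", "custom", "a"]` and template keys `["a", "b", "c"]`, the missing `c` lands after `a`, the lowest-placed of its preceding template siblings in the user's file, and the unknown `custom` key stays put.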
 Invariants:
-- Template section **order** always wins — reorder existing keys to
-  match.
+- **User order wins.** Template order is only consulted to decide
+  where to insert missing keys; existing user keys are never
+  reordered.
 - Existing scalar values are **never overwritten** unless the user
   asked for that specific change.
-- New keys added to the template land with their default value.
-- Comments from the template replace user comments in the same
-  position — comments are documentation, not user data.
+- New keys added to the template land with their default value and
+  the template's leading comments.
+- **User comments are preserved verbatim** on every existing key.
+  Template comments only land with keys the merger inserts; once a
+  key is in the user's file, its surrounding comments are owned by
+  the user.
+- Legacy `_user._user.foo` corruption (accumulated by older buggy
+  syncs) heals on the next sync — the leading `_user.` chain is
+  stripped and the leaf is re-homed at its template path, or kept
+  as a single-level orphan under `_user:` if no template home
+  exists.
 - Write with 2-space indent, no tabs, no trailing whitespace.
 - Never commit — `.agent-settings.yml` is git-ignored.
 - If a legacy flat `.agent-settings` (key=value) is still present,