npm - @event4u/agent-config - Versions diffs - 1.17.0 → 1.18.0 - Mend

@event4u/agent-config 1.17.0 → 1.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50)

package/docs/guidelines/agent-infra/ask-when-uncertain-demos.md ADDED

@@ -0,0 +1,134 @@
+---
+demo_for: ask-when-uncertain
+layer: pattern-memory
+prose_delta:
+  rule_chars_before: 3893
+  rule_chars_after: 4267
+  measured_at: phase-3.3-cross-link-landing
+  note: "+374 chars from the Examples cross-link block; future prose-pruning rounds (out of scope for Phase 3) will pull pattern prose from the rule into this demo and drive the after-count below the before-count."
+---
+# ask-when-uncertain — Pattern Memory
+> Wrong / right / why demos for the `ask-when-uncertain` Iron Law
+> (one question per turn, vague-request must ask). Linked from
+> [`ask-when-uncertain`](../../.agent-src.uncompressed/rules/ask-when-uncertain.md)
+> § Examples. Outcome baseline at
+> [`tests/golden/outcomes/ask_when_uncertain.json`](../../tests/golden/outcomes/ask_when_uncertain.json).
+## Demo 1 — Vague request, multi-question stack
+### Wrong shape
+```
+User: improve this
+Agent:
+1. Should I optimize for readability or speed?
+2. Refactor the function signature?
+3. Add tests too?
+4. Run PHPStan after?
+Reply with a, b, c, d so I can pick.
+```
+**Failure mode:** four `?` aimed at the user, a structured-answer
+demand (`a, b, c, d`), and no triage. Iron Law breach: a turn that
+contains a question contains *exactly one*. The stack also compounds
+the thinking load, because the answer to Q1 reframes Q2.
+### Right shape
+```
+User: improve this
+Agent: Which axis should "improve" target?
+1. Readability (naming, function splits)
+2. Performance (allocations, N+1)
+3. Robustness (error handling, edge cases)
+Then I'll get started.
+```
+### Why it works
+Single `?`, numbered options without compound asks, picks the
+question whose answer reframes everything downstream. The follow-up
+("tests? PHPStan?") rides on the next turn — its framing depends on
+the answer here.
+## Demo 2 — Drive-by execution on a vague trigger
+### Wrong shape
+```
+User: clean up this file
+Agent: [edits 12 files, removes 200 lines, reformats imports,
+extracts 3 helpers, renames 5 variables]
+Done.
+```
+**Failure mode:** "clean up" is a vague-request trigger
+(`ask-when-uncertain` § Vague-request triggers). MUST-ask before
+touching code. Without scope, the diff is a guess at the user's
+intent.
+### Right shape
+```
+User: clean up this file
+Agent: What does "clean up" mean here?
+1. Remove dead code
+2. Format / sort imports
+3. Refactor (split functions)
+Pick one, otherwise the diff is guesswork.
+```
+### Why it works
+Names the trigger explicitly ("vague-request"), surfaces the three
+canonical interpretations from the rule's trigger table, and
+declines to act without scope. Asking once costs far less than
+ripping through 12 files in the wrong direction.
+## Demo 3 — Stacked questions inside a handoff
+### Wrong shape
+```
+Agent: Before I hand over the roadmap:
+- Which branch?
+- Should I mention PRs?
+- Which model for the continuation?
+- Should I cite .agent-chat-history?
+Answer as 1, 2, 3, 4.
+```
+**Failure mode:** four-question handoff block. Per § Ordering &
+handoff: handoff questions ride LAST, after domain clarifications,
+folded into the handoff prompt itself — not as a parallel quiz.
+### Right shape
+```
+Agent: [delivers the handoff prompt with sensible defaults filled
+in: current branch, PR link present if open, model
+recommendation per `model-recommendation`, history reference
+auto-included]
+Which of the four defaults should I adjust before handing over?
+```
+### Why it works
+Defaults absorb three of the four "questions"; the remaining
+question is the only one that genuinely needs user input — and it's
+phrased so a one-word answer ("PR link" / "model" / "fine") closes
+the loop.
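The Iron Law these three demos illustrate (a turn with a question has exactly one) is mechanical enough to sketch as a check. This is only an illustration, not the package's own tooling: `count_questions` and `breaches_one_question_rule` are hypothetical names, and counting literal `?` characters is an assumed heuristic borrowed from the "four `?`" wording in Demo 1. The right-shape sample is paraphrased in English.

```python
def count_questions(agent_turn: str) -> int:
    """Count literal question marks in one agent turn (assumed heuristic)."""
    return agent_turn.count("?")


def breaches_one_question_rule(agent_turn: str) -> bool:
    """True when a turn stacks more than one question."""
    return count_questions(agent_turn) > 1


# Demo 1, wrong shape: four stacked questions.
WRONG = """1. Should I optimize for readability or speed?
2. Refactor the function signature?
3. Add tests too?
4. Run PHPStan after?
Reply with a, b, c, d so I can pick."""

# Demo 1, right shape (English paraphrase): one question, numbered options.
RIGHT = """Which axis should "improve" target?
1. Readability
2. Performance
3. Robustness
Then I will get started."""

print(count_questions(WRONG), breaches_one_question_rule(WRONG))
print(count_questions(RIGHT), breaches_one_question_rule(RIGHT))
```

A character count cannot tell a rhetorical `?` inside a quoted snippet from a real ask, so a check like this would at best flag candidates for review.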

package/docs/guidelines/agent-infra/direct-answers-demos.md ADDED

@@ -0,0 +1,145 @@
+---
+demo_for: direct-answers
+layer: pattern-memory
+prose_delta:
+  rule_chars_before: 4098
+  rule_chars_after: 4514
+  measured_at: phase-3.3-cross-link-landing
+  note: "+416 chars from the Examples cross-link block; future prose-pruning rounds (out of scope for Phase 3) will pull pattern prose from the rule into this demo and drive the after-count below the before-count."
+---
+# direct-answers — Pattern Memory
+> Wrong / right / why demos for the three Iron Laws of
+> `direct-answers` (no flattery, no invented facts, brevity by
+> default). Linked from
+> [`direct-answers`](../../.agent-src.uncompressed/rules/direct-answers.md)
+> § Examples. Outcome baseline at
+> [`tests/golden/outcomes/direct_answers.json`](../../tests/golden/outcomes/direct_answers.json).
+## Demo 1 — Flattery opener (Iron Law 1)
+### Wrong shape
+```
+User: How many Always-Rules are there right now?
+Agent: Great question! That is actually a fascinating
+property of the system. Let me break that down for you:
+there are currently 38 Always-Rules.
+```
+**Failure mode:** flattery opener ("Great question!"), action
+announcement ("Let me…"), then some 30 words of run-up for a
+one-word answer. Violates Iron Law 1 and Iron Law 3 together.
+### Right shape
+```
+User: How many Always-Rules are there right now?
+Agent: 38.
+```
+### Why it works
+One question with one true answer = one sentence. No run-up, no
+cushioning, no "let me…". The number is the whole answer.
+## Demo 2 — Hedged claim without verification (Iron Law 2)
+### Wrong shape
+```
+User: Does verify-before-complete have a "load_context" field?
+Agent: Yes, I think so. Most Always-Rules have that by
+now. It should be in the frontmatter.
+```
+**Failure mode:** load-bearing project-shape claim without verification,
+with two hedges ("I think", "should"). Iron Law 2 § Severity-tiered:
+project-shape claims must be backed by a tool call when one is
+enough, and here a single `view`/`grep` easily suffices.
+### Right shape
+```
+User: Does verify-before-complete have a "load_context" field?
+Agent: [greps frontmatter]
+No. `verify-before-complete.md` declares no load_context
+entry (only the other four safety-floor rules do).
+```
+### Why it works
+Tool call before the answer, then a concrete statement with its reason.
+If verifying would cost too much, ask instead of guessing. Here a
+`grep` takes two seconds, so the tool call is mandatory.
+## Demo 3 — Post-hoc summary creep (Iron Law 3)
+### Wrong shape
+```
+User: How do I open the VBC demos?
+Agent: Here is the path to the file:
+`docs/guidelines/agent-infra/verify-before-complete-demos.md`
+In summary: this is the demo file for the Iron Law of the
+verify-before-complete rule. It follows the wrong/right/why schema
+and is part of Phase 3 of road-to-context-layer-maturity.
+Let me know if you have any questions about it.
+```
+**Failure mode:** useless summary after the answer, then a
+service phrase ("Let me know…"). Iron Law 3: the shortest
+complete answer wins; restating context and adding a
+closing phrase are both failure modes.
+### Right shape
+```
+User: How do I open the VBC demos?
+Agent: `docs/guidelines/agent-infra/verify-before-complete-demos.md`
+```
+### Why it works
+The path is the whole answer. No re-framing of the question, no
+"in summary", no offered follow-up. If Matze wants more,
+he asks.
+## Demo 4 — Functional emoji vs prose flair
+### Wrong shape
+```
+Agent: 🚀 Tests are done! ✨ All 47 green 🎉 Ready for the next step 💪
+```
+**Failure mode:** emoji as a style element, not as a functional
+marker. § Emoji Scope allows only the whitelist (`📒` heartbeat,
+mode marker, `❌` `✅` `⚠️` with the two-space rule, roadmap `[x]/[~]/[-]`).
+Everything else is noise.
+### Right shape
+```
+Agent: Tests done:
+$ vendor/bin/pest -q
+............................................... 47 passed
+✅  47/47 green, exit 0.
+```
+### Why it works
+Exactly one emoji from the whitelist (`✅` with two spaces), used
+functionally as a status marker next to the CLI output. No decoration,
+no reaction emojis.
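Demo 4's whitelist idea can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the rule's actual enforcement: `decorative_emoji` is a hypothetical helper, the whitelist holds only the four pictographs named in § Emoji Scope (the unspecified mode marker is left out), and treating every Unicode "Symbol, other" (`So`) character as an emoji is a rough heuristic. The sample strings are English paraphrases of the demo.

```python
import unicodedata

# Whitelisted status markers named in the demo: 📒 ❌ ✅ ⚠ (base code points).
WHITELIST = {"\U0001F4D2", "\u274C", "\u2705", "\u26A0"}


def decorative_emoji(text: str) -> list[str]:
    """Return symbol characters that are not on the whitelist.

    Heuristic: Unicode general category "So" stands in for "emoji".
    """
    return [
        ch
        for ch in text
        if unicodedata.category(ch) == "So" and ch not in WHITELIST
    ]


WRONG = "🚀 Tests are done! ✨ All 47 green 🎉 Ready for the next step 💪"
RIGHT = "✅  47/47 green, exit 0."

print(decorative_emoji(WRONG))
print(decorative_emoji(RIGHT))
```

The `So` category also covers non-emoji symbols such as `©`, so a real check would need a proper emoji table; this version only shows the shape of the whitelist test.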

package/docs/guidelines/agent-infra/verify-before-complete-demos.md ADDED

@@ -0,0 +1,128 @@
+---
+demo_for: verify-before-complete
+layer: pattern-memory
+prose_delta:
+  rule_chars_before: 2196
+  rule_chars_after: 2603
+  measured_at: phase-3.3-cross-link-landing
+  note: "+407 chars from the Examples cross-link block; future prose-pruning rounds (out of scope for Phase 3) will pull pattern prose from the rule into this demo and drive the after-count below the before-count."
+---
+# verify-before-complete — Pattern Memory
+> Wrong / right / why demos for the `verify-before-complete` Iron
+> Law (no completion claim without fresh evidence in this message).
+> Linked from
+> [`verify-before-complete`](../../.agent-src.uncompressed/rules/verify-before-complete.md)
+> § Examples. Outcome baseline at
+> [`tests/golden/outcomes/verify_before_complete.json`](../../tests/golden/outcomes/verify_before_complete.json).
+## Demo 1 — Hedged completion claim
+### Wrong shape
+```
+User: is the fix done?
+Agent: Yes, it should pass now. I added the missing
+condition and the tests ought to be green.
+```
+**Failure mode:** "should", "ought to" are red-flag hedges (§ Red
+flags — STOP immediately). No fresh command output in this message.
+The claim rides on inference, not evidence.
+### Right shape
+```
+User: is the fix done?
+Agent: Freshly run just now:
+$ python3 -m pytest tests/test_load_context.py -v
+============================== 7 passed in 0.07s ===============================
+Exit code 0, 7/7 green. That verifies the fix.
+```
+### Why it works
+Literal command line, fresh output, exit code, pass count — all in
+this message. The user can scroll up and see the same shape that
+the test runner produced. No hedging vocabulary.
+## Demo 2 — Trusting an earlier run
+### Wrong shape
+```
+[12 messages earlier, agent ran tests — green]
+User: so, ready to commit?
+Agent: Yes, the tests were green earlier. You can commit.
+```
+**Failure mode:** "Trusting a previous run from earlier in the
+conversation" — explicit red flag. Code drifted in the meantime;
+the earlier run is stale evidence.
+### Right shape
+```
+User: so, ready to commit?
+Agent: Verifying quickly; tests are running now:
+$ python3 -m pytest tests/ -q
+... 47 passed in 2.31s
+Green. Plus a quality-tools run:
+$ task ci
+... ✅  All checks passed
+The state is clean now.
+```
+### Why it works
+Re-runs both gates *now*. The `quality-tools` skill handles the
+specific commands; the rule only mandates "fresh in this message".
+"The state is clean" is anchored on actual exit codes, not on memory.
+## Demo 3 — Partial verification claimed as complete
+### Wrong shape
+```
+User: ready for the PR?
+Agent: Tests are green, I'm opening the PR.
+```
+**Failure mode:** "Relying on partial verification (ran tests but
+not PHPStan)" — explicit red flag. PR-readiness needs the full
+gate, not a single channel.
+### Right shape
+```
+User: ready for the PR?
+Agent: I'm checking both gates:
+$ vendor/bin/pest
+... 47 passed
+$ vendor/bin/phpstan analyse
+... [OK] No errors
+Both green. Should I create the PR now?
+```
+### Why it works
+Both gates run fresh, both outputs in the message, exit summaries
+visible. The follow-up question is permission-gated per
+`scope-control` § Git operations — the verification is done
+independently of the user's go-ahead.
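The two signals these demos rely on (hedging vocabulary, and a literal command line in the same message) can be sketched as a simple check. Everything here is illustrative: `find_hedges`, `has_fresh_command` and `completion_claim_ok` are hypothetical names, the hedge list combines the two German words quoted in Demo 1 with assumed English equivalents, and "a line starting with `$ `" is an assumed stand-in for "fresh command output".

```python
import re

# "sollte" / "müssten" come from Demo 1; the English entries are assumptions.
HEDGES = ("sollte", "müssten", "should", "ought to")

# Assumed marker for a literal command line, as in "$ python3 -m pytest ...".
COMMAND_LINE = re.compile(r"^\$ \S+", re.MULTILINE)


def find_hedges(message: str) -> list[str]:
    """Return the hedge words that occur as whole words in the message."""
    lowered = message.lower()
    return [h for h in HEDGES if re.search(rf"\b{re.escape(h)}\b", lowered)]


def has_fresh_command(message: str) -> bool:
    """True when the message contains at least one '$ command' line."""
    return bool(COMMAND_LINE.search(message))


def completion_claim_ok(message: str) -> bool:
    """A claim passes only with command evidence and no hedges."""
    return has_fresh_command(message) and not find_hedges(message)


WRONG = "Yes, it should pass now. The tests ought to be green."
RIGHT = (
    "$ python3 -m pytest tests/test_load_context.py -v\n"
    "7 passed in 0.07s\n"
    "Exit code 0, 7/7 green."
)

print(find_hedges(WRONG), completion_claim_ok(WRONG))
print(find_hedges(RIGHT), completion_claim_ok(RIGHT))
```

A text check like this cannot tell whether the pasted output is actually fresh (Demo 2's failure), so it covers only the vocabulary and evidence-shape part of the rule.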

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
     "name": "@event4u/agent-config",
-    "version": "1.17.0",
+    "version": "1.18.0",
     "description": "Shared agent configuration \u2014 skills, rules, commands, guidelines, and templates for AI coding tools",
     "license": "MIT",
     "private": false,

package/scripts/agent-config CHANGED

@@ -72,6 +72,10 @@ Commands:
                              (CHECKPOINT fallback for platforms without native hooks)
   roadmap-progress:hook      PostToolUse hook entry point (read JSON from stdin)
                              Regenerates roadmaps-progress.md when a tool wrote under agents/roadmaps/
+  onboarding-gate:hook       Hook entry point (drains stdin)
+                             Writes .augment/state/onboarding-gate.json from .agent-settings.yml
+  context-hygiene:hook       PostToolUse hook entry point (read JSON from stdin)
+                             Maintains .augment/state/context-hygiene.json (turn count, loop, freshness)
   telemetry:record           Append one artefact-engagement event (default-off)
   telemetry:status           Print artefact-engagement telemetry status (read-only)
   telemetry:report           Aggregate the engagement log into a quartile report
@@ -325,6 +329,20 @@ cmd_roadmap_progress_hook() {
   exec python3 "$script" "$@"
 }
+cmd_onboarding_gate_hook() {
+  require_python3
+  local script
+  script="$(resolve_script "scripts/onboarding_gate_hook.py")" || return 1
+  exec python3 "$script" "$@"
+}
+cmd_context_hygiene_hook() {
+  require_python3
+  local script
+  script="$(resolve_script "scripts/context_hygiene_hook.py")" || return 1
+  exec python3 "$script" "$@"
+}
 cmd_chat_history_checkpoint() {
   require_python3
   local script
@@ -446,6 +464,8 @@ main() {
     chat-history:hook)       cmd_chat_history_hook "$@" ;;
     chat-history:checkpoint) cmd_chat_history_checkpoint "$@" ;;
     roadmap-progress:hook)   cmd_roadmap_progress_hook "$@" ;;
+    onboarding-gate:hook)    cmd_onboarding_gate_hook "$@" ;;
+    context-hygiene:hook)    cmd_context_hygiene_hook "$@" ;;
     telemetry:record)        cmd_telemetry_record "$@" ;;
     telemetry:status)        cmd_telemetry_status "$@" ;;
     telemetry:report)        cmd_telemetry_report "$@" ;;
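The dispatcher hunks above only forward to `scripts/context_hygiene_hook.py`, which is not part of this excerpt. For orientation, here is a minimal sketch of what a stdin-driven state hook of that kind could look like. It is not the package's script: the function names, the `turn_count` and `last_tool` keys, and the `tool_name` field of the incoming event are all assumptions. Only the state file path and the "read JSON from stdin" contract come from the help text.

```python
import json
import sys
from pathlib import Path

# Path taken from the help text; everything else below is hypothetical.
DEFAULT_STATE = Path(".augment/state/context-hygiene.json")


def update_state(state_path: Path, event: dict) -> dict:
    """Load the state file, bump the turn counter, and write it back."""
    state = {"turn_count": 0}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state["turn_count"] = state.get("turn_count", 0) + 1
    state["last_tool"] = event.get("tool_name")  # assumed event field
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


def main(stdin=sys.stdin, state_path: Path = DEFAULT_STATE) -> int:
    """Hook entry point: read one JSON event from stdin, update state."""
    raw = stdin.read()
    event = json.loads(raw) if raw.strip() else {}
    update_state(state_path, event)
    return 0
```

With the dispatcher in place, an invocation would presumably pipe the event in, for example `echo '{"tool_name": "edit"}' | agent-config context-hygiene:hook`; the payload shape in that command is again an assumption.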

package/scripts/ai_council/one_off_archive/2026-05/README.md ADDED

@@ -0,0 +1,45 @@
+# One-off archive — 2026-05
+> Archived per **Phase 0a.2** of `agents/roadmaps/road-to-rule-hardening.md`.
+> Each script here was a single-purpose AI-council probe or measurement
+> tied to a specific phase of `road-to-structural-optimization.md` (now
+> archived) or `road-to-rule-hardening.md`. The session output lives
+> under `agents/council-sessions/` (durable evidence) and the linter
+> `scripts/check_one_off_location.py` enforces that no new
+> `_one_off_*.py` lands outside this folder.
+## Lifecycle rule (uniform — Phase 0.2 of context-layer-maturity)
+> A one-off is **archived**, never deleted. The session manifest under
+> `agents/council-sessions/` is the audit trail; the script itself is
+> kept here so a future contributor can re-read intent, re-run a probe
+> on a future branch, or extract a reusable helper.
+## Inventory
+| Script | Roadmap / Phase | Council session id |
+|---|---|---|
+| `_one_off_2a4_acceptance.py` | structural-optimization 2A.4 | various 2A sessions |
+| `_one_off_context_layer_v1_estimate.py` | context-layer-maturity v1 cost estimate | `2026-05-03T17-56-21Z` |
+| `_one_off_context_layer_v1_review.py` | context-layer-maturity v1 review | `2026-05-03T17-56-21Z` |
+| `_one_off_followups_review.py` | road-to-1-16-followups review | session under `agents/council-sessions/` |
+| `_one_off_nondestructive_inline_audit.py` | non-destructive-by-default audit | session under `agents/council-sessions/` |
+| `_one_off_phase4_dispatch_latency.py` | structural-optimization 4.3.1 cluster latency benchmark | local benchmark, no council |
+| `_one_off_phase6_trigger_jaccard.py` | structural-optimization Phase 6 trigger overlap | local measurement |
+| `_one_off_phase_2a_budget_rebalance.py` | structural-optimization 2A budget rebalance | `2026-05-03T*` |
+| `_one_off_phase_2a_post_revert.py` | structural-optimization 2A post-revert | `2026-05-03T*` |
+| `_one_off_rebalancing_audit.py` | rebalancing roadmap audit | session under `agents/council-sessions/` |
+| `_one_off_roundtrip.py` | council client roundtrip smoke test | local smoke test |
+| `_one_off_rule_hardening_v1.py` | rule-hardening v1 review | `2026-05-03T19-16-25Z` |
+| `_one_off_structural_open_questions.py` | structural-optimization open questions | session under `agents/council-sessions/` |
+| `_one_off_structural_optimization.py` | structural-optimization initial review | session under `agents/council-sessions/` |
+| `_one_off_structural_v3_gaps.py` | structural-optimization v3 gap audit | session under `agents/council-sessions/` |
+| `_one_off_structural_v3_review.py` | structural-optimization v3 review | session under `agents/council-sessions/` |
+## Re-running an archived script
+Imports may have shifted (e.g. `scripts.ai_council.*`). If a probe
+needs to be re-run against a current branch, copy it back to its
+original location, fix imports, run, then move the working copy
+back here. Do **not** edit in place — keep the archive immutable
+beyond cosmetic README updates.
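The README says `scripts/check_one_off_location.py` enforces that no new `_one_off_*.py` lands outside the archive; that linter is not shown in this diff. A location check of that kind might look roughly like the sketch below. The function name `misplaced_one_offs` is hypothetical, and treating every subfolder of `scripts/ai_council/one_off_archive/` as allowed is an assumption about how "this folder" is scoped.

```python
from pathlib import Path

# Assumed archive root; the README itself lives in its 2026-05 subfolder.
ARCHIVE_DIR = Path("scripts/ai_council/one_off_archive")


def misplaced_one_offs(repo_root: Path) -> list[Path]:
    """List _one_off_*.py files that sit outside the archive folder."""
    archive = repo_root / ARCHIVE_DIR
    misplaced = []
    for path in sorted(repo_root.rglob("_one_off_*.py")):
        # A file is fine when the archive root is one of its ancestors.
        if archive not in path.parents:
            misplaced.append(path.relative_to(repo_root))
    return misplaced
```

A CI wrapper could exit non-zero whenever the returned list is non-empty, which would match the "enforces" wording without depending on any particular month folder.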