@event4u/agent-config 2.19.0 → 2.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92)
  1. package/.agent-src/commands/agent-status.md +29 -0
  2. package/.agent-src/commands/onboard.md +221 -81
  3. package/.agent-src/packs/README.md +49 -0
  4. package/.agent-src/packs/agency-delivery.yml +63 -0
  5. package/.agent-src/packs/content-engine.yml +53 -0
  6. package/.agent-src/packs/founder-mvp.yml +51 -0
  7. package/.agent-src/presets/README.md +26 -0
  8. package/.agent-src/presets/balanced.yml +34 -0
  9. package/.agent-src/presets/fast.yml +31 -0
  10. package/.agent-src/presets/strict.yml +38 -0
  11. package/.agent-src/profiles/README.md +29 -0
  12. package/.agent-src/profiles/agency.yml +27 -0
  13. package/.agent-src/profiles/content_creator.yml +25 -0
  14. package/.agent-src/profiles/developer.yml +26 -0
  15. package/.agent-src/profiles/finance.yml +24 -0
  16. package/.agent-src/profiles/founder.yml +25 -0
  17. package/.agent-src/profiles/ops.yml +25 -0
  18. package/.agent-src/rules/no-cheap-questions.md +25 -17
  19. package/.agent-src/skills/adr-create/SKILL.md +78 -68
  20. package/.agent-src/skills/subagent-orchestration/SKILL.md +33 -0
  21. package/.agent-src/templates/agents/agent-project-settings.example.yml +1 -1
  22. package/.agent-src/templates/skill-archive-note.md +101 -0
  23. package/.claude-plugin/marketplace.json +1 -1
  24. package/CHANGELOG.md +52 -30
  25. package/README.md +68 -72
  26. package/config/agent-settings.template.yml +22 -0
  27. package/docs/adrs/caveman/0001-default-off-until-bench.md +93 -0
  28. package/docs/adrs/caveman/README.md +9 -0
  29. package/docs/adrs/cost/0001-hard-stop-hook.md +114 -0
  30. package/docs/adrs/cost/README.md +9 -0
  31. package/docs/adrs/memory/0001-consumer-side-snapshot.md +111 -0
  32. package/docs/adrs/memory/README.md +9 -0
  33. package/docs/adrs/router/0001-three-tier-routing.md +119 -0
  34. package/docs/adrs/router/README.md +9 -0
  35. package/docs/adrs/schema/0001-json-schema-frontmatter.md +102 -0
  36. package/docs/adrs/schema/README.md +9 -0
  37. package/docs/adrs/smoke/0001-per-tier-smoke-scripts.md +99 -0
  38. package/docs/adrs/smoke/README.md +9 -0
  39. package/docs/architecture/current-onboard-baseline.md +126 -0
  40. package/docs/architecture/current-safety-behavior.md +137 -0
  41. package/docs/archive/CHANGELOG-pre-2.16.0.md +48 -0
  42. package/docs/contracts/adr-layout.md +108 -0
  43. package/docs/contracts/benchmark-corpus-spec.md +97 -0
  44. package/docs/contracts/benchmark-report-schema.md +111 -0
  45. package/docs/contracts/command-clusters.md +1 -0
  46. package/docs/contracts/command-taxonomy.md +137 -0
  47. package/docs/contracts/compression-default-kill-criterion.md +69 -0
  48. package/docs/contracts/config-presets.md +144 -0
  49. package/docs/contracts/cost-dashboard.md +143 -0
  50. package/docs/contracts/cost-enforcement.md +134 -0
  51. package/docs/contracts/file-ownership-matrix.json +0 -7
  52. package/docs/contracts/mcp-tool-inventory.md +53 -0
  53. package/docs/contracts/measurement-baseline.md +102 -0
  54. package/docs/contracts/namespace.md +125 -0
  55. package/docs/contracts/profile-system.md +142 -0
  56. package/docs/contracts/safety-model.md +129 -0
  57. package/docs/contracts/smoke-contracts.md +144 -0
  58. package/docs/contracts/workflow-packs.md +121 -0
  59. package/docs/decisions/ADR-010-profile-pack-preset-boundary.md +132 -0
  60. package/docs/decisions/INDEX.md +1 -0
  61. package/docs/featured-commands.md +27 -0
  62. package/docs/parity/bench-ruflo.json +58 -0
  63. package/docs/parity/bench.json +41 -0
  64. package/docs/parity/ruflo.md +46 -0
  65. package/docs/profiles.md +91 -0
  66. package/package.json +1 -1
  67. package/scripts/_cli/cmd_explain.py +250 -0
  68. package/scripts/_lib/bench_cost.py +138 -0
  69. package/scripts/_lib/bench_quality.py +118 -0
  70. package/scripts/_lib/bench_report.py +150 -0
  71. package/scripts/agent-config +13 -0
  72. package/scripts/audit_adr_coverage.py +175 -0
  73. package/scripts/audit_mcp_tools.py +146 -0
  74. package/scripts/bench_baseline_ready.py +108 -0
  75. package/scripts/bench_drift_check.py +151 -0
  76. package/scripts/bench_per_tool.py +216 -0
  77. package/scripts/bench_run.py +155 -0
  78. package/scripts/config/__init__.py +9 -0
  79. package/scripts/config/presets.py +206 -0
  80. package/scripts/config/profiles.py +173 -0
  81. package/scripts/cost/budget.mjs +73 -12
  82. package/scripts/cost/preflight.mjs +89 -0
  83. package/scripts/lint_archived_skills.py +143 -0
  84. package/scripts/lint_bench_corpus.py +161 -0
  85. package/scripts/lint_namespace.py +135 -0
  86. package/scripts/skill_overlap.py +204 -0
  87. package/scripts/skill_usage_collect.py +191 -0
  88. package/scripts/skill_usage_report.py +162 -0
  89. package/scripts/smoke/kernel.sh +101 -0
  90. package/scripts/smoke/router.sh +129 -0
  91. package/scripts/smoke/schema.sh +71 -0
  92. package/scripts/smoke/skills.sh +101 -0
@@ -0,0 +1,126 @@
1
+ # Current `/onboard` Baseline (pre-step-15)
2
+
3
+ > **Status:** descriptive baseline · **Owner:** package maintainer ·
4
+ > **Last reviewed:** 2026-05-16
5
+ >
6
+ > Documents the **current** `/onboard` flow so the Phase 1 Guided
7
+ > Setup Wizard (step-15 item 2) has a baseline to extend. Council v3
8
+ > unique finding (cannot "extend" an undocumented surface). This file
9
+ > describes what ships today; it is **not** a proposal.
10
+
11
+ ## Surface
12
+
13
+ `/onboard` lives at [`.agent-src.uncompressed/commands/onboard.md`](../../.agent-src.uncompressed/commands/onboard.md)
14
+ (canonical source) and is triggered by the
15
+ [`onboarding-gate`](../../.agent-src/rules/onboarding-gate.md) rule on
16
+ the first turn when `onboarding.onboarded == false` in
17
+ `.agent-settings.yml`. Cloud surfaces (Claude.ai Web, Skills API): fully
18
+ inert — no settings file, no flow.
19
+
20
+ ## The 12 steps today
21
+
22
+ | # | Step | Captures | Asked if |
23
+ |---|---|---|---|
24
+ | 1 | Greet + set expectations | — | always |
25
+ | 2 | Offer user-global cross-project defaults | intent flag for step 9 | first-time-setup heuristic only |
26
+ | 3 | `personal.user_name` | first name | unset |
27
+ | 4 | `personal.ide` (+ auto-detect via `ps aux`) and `personal.open_edited_files` | IDE id, auto-open flag | unset |
28
+ | 5 | `personal.pr_comment_bot_icon` | bool | always (no detection possible) |
29
+ | 6 | `personal.rtk_installed` (via `which rtk`) | bool + install action | rtk not found |
30
+ | 7 | `cost_profile` and `pipelines.skill_improvement` | profile id, learning bool | always (one summary screen) |
31
+ | 8 | Mark `onboarding.onboarded: true` | — | always |
32
+ | 9 | Write user-global `~/.event4u/agent-config/agent-settings.yml` | six whitelisted keys | step 2 captured "yes" |
33
+ | 10 | Summary block | — | always |
34
+ | 11 | Quickstart pointer (`/work` and `/implement-ticket`) | — | local only |
35
+ | 12 | Maintainer telemetry hint (opt-in) | — | local only |
36
+
37
+ ## What `/onboard` does **not** capture today
38
+
39
+ Step-15 Phase 1 item 2 introduces a new role-selection step ("8 options
40
+ covering Software / Content / Founder / Consulting / Marketing / Finance
41
+ / Handwerk / Self-configure") that produces a `user_type`. Today, no
42
+ `user_type` is captured. Specifically:
43
+
44
+ - **No audience/role question.** `/onboard` knows the developer's name,
45
+ IDE, and rtk install status — never the audience taxonomy.
46
+ - **No `profile.id`.** `profile.id` does not exist as a key in
47
+ `.agent-settings.yml`. Per
48
+ [ADR-010](../decisions/ADR-010-profile-pack-preset-boundary.md), it
49
+ is owned by the Phase 1 item 1 profile loader.
50
+ - **No `preset.id`.** Same status — `preset.id` arrives with Phase 1
51
+ item 4.
52
+ - **No `pack.id`.** Arrives with Phase 2 item 7.
53
+ - **No risk-appetite question.** The current flow defers risk posture
54
+ to `personal.autonomy`, which is itself not part of the onboard
55
+ questions (it inherits the template default).
56
+ - **No stack question.** Stack is inferred at runtime by detectors
57
+ (`scripts/detect/*`), not asked here.
58
+
59
+ ## Settings keys written today
60
+
61
+ ```yaml
62
+ personal:
63
+ user_name: "<first-name>" # step 3
64
+ ide: "code|phpstorm|cursor" # step 4
65
+ open_edited_files: true|false # step 4
66
+ pr_comment_bot_icon: true|false # step 5
67
+ rtk_installed: true|false # step 6
68
+ cost_profile: "balanced" # step 7 (default unchanged)
69
+ pipelines:
70
+ skill_improvement: true # step 7 (default unchanged)
71
+ onboarding:
72
+ onboarded: true # step 8
73
+ ```
74
+
75
+ User-global file (step 9, opt-in): the six whitelisted keys in
76
+ [`scripts/_lib/agent_settings.py`](../../scripts/_lib/agent_settings.py)
77
+ — `name`, `ide`, `cost_profile`, `personal.bot_icon`,
78
+ `personal.autonomy`, `caveman.speak_scope`.
79
+
80
+ ## Iron Laws today
81
+
82
+ - **One question per turn** ([`ask-when-uncertain`](../../.agent-src/rules/ask-when-uncertain.md)).
83
+ - **Re-runnable** — invoking `/onboard` when `onboarded: true` walks the
84
+ flow again, never silently rewrites a value (asks before overwriting
85
+ `user_name` / `ide`).
86
+ - **Never commits** — `.agent-settings.yml` is git-ignored.
87
+ - **User-global write is opt-in + one-shot + never silent** — step 2
88
+ captures intent, step 9 re-confirms.
89
+
90
+ ## Gaps the wizard (Phase 1 item 2) must close
91
+
92
+ 1. **Add role-selection step** producing a `user_type` (later mapped to
93
+ `profile.id`). Eight options covering Software / Content / Founder /
94
+ Consulting / Marketing / Finance / Handwerk / Self-configure.
95
+ Inserted **before** step 8 (mark onboarded) so the profile loader
96
+ has a value to read on the next session start.
97
+ 2. **Add stack-detection confirmation step.** Run the existing
98
+ `scripts/detect/*` detectors, present the result, allow the user
99
+ to override. Without confirmation, profile-aware presets cannot
100
+ resolve.
101
+ 3. **Add risk-appetite question.** Maps to `preset.id` from
102
+ [`config-presets.md`](../contracts/config-presets.md). Three
103
+ options: `fast` / `balanced` / `strict`.
104
+ 4. **Write the new keys.** `profile.id`, `preset.id`, optionally
105
+ `pack.id`, plus the user-typed `user_type` as a stable audit field.
106
+
107
+ ## Wizard contract (Phase 1 item 2 acceptance)
108
+
109
+ The wizard MUST:
110
+
111
+ - Preserve every existing step semantically (no silent removal).
112
+ - Insert role + stack + risk-appetite questions **before** step 8.
113
+ - Honor the one-question-per-turn Iron Law.
114
+ - Write `profile.id`, `preset.id`, and `user_type` to
115
+ `.agent-settings.yml` using the section-aware merge rules.
116
+ - Be re-runnable (idempotent for unchanged answers).
117
+ - Work offline (no network call required for any question).
118
+ - Skip itself on cloud surfaces (inherit current cloud-noop behavior).
119
+
120
+ ## See also
121
+
122
+ - [`/onboard` command](../../.agent-src.uncompressed/commands/onboard.md) — canonical source.
123
+ - [`onboarding-gate`](../../.agent-src/rules/onboarding-gate.md) — trigger rule.
124
+ - [`ADR-010`](../decisions/ADR-010-profile-pack-preset-boundary.md) — boundary the wizard must respect.
125
+ - [`config-presets.md`](../contracts/config-presets.md) — preset axis the wizard writes.
126
+ - [`agents/roadmaps/step-15-product-refinement.md`](../../agents/roadmaps/step-15-product-refinement.md) — Phase 1 item 2.
@@ -0,0 +1,137 @@
1
+ # Current Safety Behavior — Baseline (pre-step-15)
2
+
3
+ > **Status:** descriptive baseline · **Owner:** package maintainer ·
4
+ > **Last reviewed:** 2026-05-16
5
+ >
6
+ > Documents the **current** safety / autonomy surface so the Phase 2
7
+ > Universal Safety Model ADR (step-15 item 9) has a baseline to diff
8
+ > against. Council v3 action #4 prerequisite. This file describes what
9
+ > ships today; it is **not** a proposal for what should ship next.
10
+
11
+ ## Scope
12
+
13
+ The current package has **one autonomy switch** plus **four
14
+ non-overridable floors**. The Phase 2 ADR will replace the single switch
15
+ with per-profile, per-domain `deny / ask / allow` declarations. Before
16
+ that ADR can specify "replace X", X has to be written down.
17
+
18
+ ## The one switch — `personal.autonomy`
19
+
20
+ **Where defined:** `.agent-settings.yml` under `personal.autonomy`.
21
+ Template: `config/agent-settings.template.yml`.
22
+
23
+ **Values:** `on` · `off` · `auto`.
24
+
25
+ **Read site:** [`.agent-src/rules/autonomous-execution.md`](../../.agent-src/rules/autonomous-execution.md)
26
+ (Iron-Law rule, kernel-loaded in every profile). Cached on the first
27
+ turn; missing key treated as `on`.
28
+
29
+ **What it gates:** trivial workflow questions (suppression). Examples:
30
+ "Should I run the tests now?", "Should I create the branch?", "Continue
31
+ with the next phase?". These are suppressed when `autonomy` resolves to
32
+ `on`.
33
+
34
+ **What it does NOT gate:** any of the four floors below, any
35
+ [`scope-control`](../../.agent-src/rules/scope-control.md) git operation,
36
+ or any [`commit-policy`](../../.agent-src/rules/commit-policy.md) commit
37
+ default. The switch only narrows the **trivial-question** surface.
38
+
39
+ ### State table
40
+
41
+ | State | Behavior on trivial workflow questions | Blocking / Hard-Floor / Commit gates |
42
+ |---|---|---|
43
+ | `on` | **Suppress** — agent acts, surfaces what it did | Unchanged — still apply |
44
+ | `off` | **Ask** — numbered options, single question | Unchanged — still apply |
45
+ | `auto` | Same as `off` until the user opts in via a standing autonomy directive ("just work", "arbeite eigenständig"). Then sticky-flip to `on` for the rest of the conversation. Mirror opt-out flips back. | Unchanged — still apply |
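The state table can be read as a small state machine. The sketch below is illustrative only, not the package's implementation: `AutonomyState`, `load_state`, and the method names are hypothetical, and it assumes the settings file has already been parsed into a dict with string values.

```python
from dataclasses import dataclass

VALID_VALUES = {"on", "off", "auto"}


@dataclass
class AutonomyState:
    """Hypothetical model of the `personal.autonomy` switch."""

    setting: str = "on"      # a missing key is treated as "on"
    opted_in: bool = False   # sticky flag, only consulted when setting == "auto"

    def opt_in(self) -> None:
        # Standing autonomy directive detected (e.g. "just work").
        self.opted_in = True

    def opt_out(self) -> None:
        # Mirror opt-out flips the sticky flag back.
        self.opted_in = False

    def suppress_trivial_questions(self) -> bool:
        if self.setting == "on":
            return True
        if self.setting == "off":
            return False
        # "auto" behaves like "off" until the user opts in.
        return self.opted_in


def load_state(settings: dict) -> AutonomyState:
    value = settings.get("personal", {}).get("autonomy", "on")
    if value not in VALID_VALUES:
        raise ValueError(f"unsupported personal.autonomy value: {value!r}")
    return AutonomyState(setting=value)
```

Nothing in this sketch touches the floors described later in the document; they apply regardless of what `suppress_trivial_questions()` returns.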
46
+
47
+ ### Opt-in detection
48
+
49
+ Intent-matched, not literal-string-matched. Speech-act-checked: the
50
+ phrase must be a meta-instruction, not content / quote / code. Detail:
51
+ [`autonomy-detection`](../../.agent-src/contexts/execution/autonomy-detection.md),
52
+ [`autonomy-mechanics`](../../.agent-src/contexts/execution/autonomy-mechanics.md).
53
+
54
+ ### Task scope vs conversation scope
55
+
56
+ Two distinct autonomy shapes:
57
+
58
+ | Shape | Trigger | Scope |
59
+ |---|---|---|
60
+ | **Conversation-wide trivial-question suppression** | "stop asking on trivial steps" — no deliverable named | Sticky for the rest of the conversation. Suppresses trivial workflow questions only. |
61
+ | **Task-scoped autonomous execution** | "work autonomously on X", "arbeite die Roadmap Y komplett ab" — deliverable named | Bound to that task. Ends when the task ends. Does NOT authorize a different later deliverable. |
62
+
63
+ Per [`autonomous-execution § task-scope`](../../.agent-src/rules/autonomous-execution.md#task-scope--autonomy-is-bound-to-the-named-task).
64
+
65
+ ## The four non-overridable floors
66
+
67
+ No value of `personal.autonomy` lifts any of these. Standing
68
+ autonomy directives, roadmap authorizations, or "just keep going"
69
+ phrases never reach them.
70
+
71
+ ### 1. Hard Floor — `non-destructive-by-default`
72
+
73
+ [`.agent-src/rules/non-destructive-by-default.md`](../../.agent-src/rules/non-destructive-by-default.md).
74
+ Stops on: production-branch merges; deploy / release; push to remote;
75
+ production data / infra writes; whimsical bulk deletions; commits
76
+ containing bulk deletions or infra changes. **Always confirm this turn.**
77
+
78
+ ### 2. Git-ops Permission Gate — `scope-control`
79
+
80
+ [`.agent-src/rules/scope-control.md § Git operations`](../../.agent-src/rules/scope-control.md#git-operations--permission-gated).
81
+ Stops on: commit · push · merge · rebase · force-push · branch create /
82
+ switch / delete · PR create / close / retarget · tag / release / pin.
83
+ Permission must be **this turn or a standing instruction not yet
84
+ revoked**.
85
+
86
+ ### 3. Commit Default — `commit-policy`
87
+
88
+ [`.agent-src/rules/commit-policy.md`](../../.agent-src/rules/commit-policy.md).
89
+ **Never commit, never ask about committing.** Four exceptions: user
90
+ says so this turn · standing instruction · `/commit` invoked · roadmap
91
+ authorization. Anything else → no commit.
92
+
93
+ ### 4. Security-sensitive STOP — `security-sensitive-stop`
94
+
95
+ [`.agent-src/rules/security-sensitive-stop.md`](../../.agent-src/rules/security-sensitive-stop.md).
96
+ Stops on: auth, billing, tenant boundaries, secrets, uploads,
97
+ integrations, webhooks, public endpoints. Threat-model **before**
98
+ editing.
99
+
100
+ ## Coverage map
101
+
102
+ | Surface | What governs it |
103
+ |---|---|
104
+ | Trivial workflow question | `personal.autonomy` (the switch) |
105
+ | Blocking architectural / scope question | [`ask-when-uncertain`](../../.agent-src/rules/ask-when-uncertain.md) (always) |
106
+ | Tool / MCP call cost | None today — Phase 1 item 4 introduces preset-loader Hard Enforcement |
107
+ | Skill / command allowlist per audience | None today — Phase 2 item 7 introduces packs |
108
+ | Per-domain `deny / ask / allow` | None today — Phase 2 item 9 introduces this |
109
+ | Hard Floor (prod, deploy, push, bulk-destructive) | Universal — not switchable |
110
+ | Git ops | Universal permission gate — not switchable |
111
+ | Commit | Universal default-deny — not switchable |
112
+
113
+ ## Gaps the Phase 2 ADR will address
114
+
115
+ 1. **One switch, one granularity.** Today, `autonomy: on` suppresses
116
+ *every* trivial question identically. A founder running the
117
+ `content-engine` pack may want autonomy for content, ask-mode for
118
+ spend; the current model cannot express that.
119
+ 2. **No per-domain policy.** Domain-safety rules
120
+ (`.agent-src/rules/domain-safety-*.md`) act as output floors but do
121
+ not declare `deny / ask / allow` per profile. The Phase 2 model
122
+ centralizes this.
123
+ 3. **No machine-readable safety schema.** The current behavior is
124
+ distributed across four rules. A consuming tool (the wizard, the
125
+ explain command) cannot ask "what is this install's safety posture?"
126
+ without reading rule prose.
127
+
128
+ The Phase 2 ADR (`docs/contracts/safety-model.md`) inherits this
129
+ baseline and adds: per-profile policy table, machine-readable schema,
130
+ explain-trace integration. It MUST NOT silently relax any of the four
131
+ floors above.
132
+
133
+ ## See also
134
+
135
+ - [`autonomous-execution`](../../.agent-src/rules/autonomous-execution.md) · [`non-destructive-by-default`](../../.agent-src/rules/non-destructive-by-default.md) · [`scope-control`](../../.agent-src/rules/scope-control.md) · [`commit-policy`](../../.agent-src/rules/commit-policy.md) · [`security-sensitive-stop`](../../.agent-src/rules/security-sensitive-stop.md).
136
+ - [`docs/safety.md`](../safety.md) — domain-safety output floors.
137
+ - [`agents/roadmaps/step-15-product-refinement.md`](../../agents/roadmaps/step-15-product-refinement.md) — Phase 1 item 2a (this doc) and Phase 2 item 9 (Universal Safety Model ADR).
@@ -0,0 +1,48 @@
1
+ # Changelog Archive — pre-2.16.0
2
+
3
+ > Frozen snapshot of `event4u/agent-config` changelog entries from
4
+ > `2.15.0`, split out of the main
5
+ > [`CHANGELOG.md`](../../CHANGELOG.md) on 2026-05-16 once the active
6
+ > era's body crossed the 200-line drift cap enforced by
7
+ > `tests/test_changelog_eras.py`.
8
+ >
9
+ > **Read-only.** New entries land in `CHANGELOG.md` § "Era: 2.16.x".
10
+ > Entries here are not amended — git tag `2.15.0` remains the
11
+ > canonical source for what shipped.
12
+ >
13
+ > Entry shape follows the conventions documented in
14
+ > [`docs/contracts/CHANGELOG-conventions.md`](../contracts/CHANGELOG-conventions.md).
15
+ > Earlier eras live in
16
+ > [`CHANGELOG-pre-2.15.0.md`](CHANGELOG-pre-2.15.0.md),
17
+ > [`CHANGELOG-pre-2.11.0.md`](CHANGELOG-pre-2.11.0.md),
18
+ > [`CHANGELOG-pre-2.7.0.md`](CHANGELOG-pre-2.7.0.md), and
19
+ > [`CHANGELOG-pre-2.2.0.md`](CHANGELOG-pre-2.2.0.md).
20
+
21
+ ## [2.15.0](https://github.com/event4u-app/agent-config/compare/2.14.0...2.15.0) (2026-05-15)
22
+
23
+ ### Features
24
+
25
+ * **agent-user:** add /agents user command cluster (init, show, review, accept, update) ([15d53d8](https://github.com/event4u-app/agent-config/commit/15d53d8d9a2365b044831cd42127e247a70d7e20))
26
+ * **agent-user:** add v1 schema contract for .agent-user.md persona file ([64f4eab](https://github.com/event4u-app/agent-config/commit/64f4eab62ccf6a2606fbca0c56d398372c05a7a0))
27
+
28
+ ### Bug Fixes
29
+
30
+ * **agent-user:** inline council-reference summary per no-roadmap-references ([ee4d3ce](https://github.com/event4u-app/agent-config/commit/ee4d3cedf9f4429450d21ca5badc2ae5c2ecaaed))
31
+ * **agent-user:** drop roadmap references per no-roadmap-references rule ([c8ade8d](https://github.com/event4u-app/agent-config/commit/c8ade8d7c5b495e0e4295aa0cb801e59076ee0b0))
32
+ * **agent-user:** adjust keep-beta-until to fit 90-day window ([801b365](https://github.com/event4u-app/agent-config/commit/801b365117a2d1efb4505e504bdd730e4cbbc217))
33
+
34
+ ### Documentation
35
+
36
+ * **persona:** README section + agent-settings legacy-fallback note ([4da7629](https://github.com/event4u-app/agent-config/commit/4da7629f1f0b5a35a64d0a861040ad8639a66ebe))
37
+ * **roadmap:** mark step-3-agent-user-persona phases as in-progress ([f29d3bc](https://github.com/event4u-app/agent-config/commit/f29d3bce2380c0ea9c67e6094540b88d920ed9ff))
38
+
39
+ ### Chores
40
+
41
+ * **roadmap:** close out + archive step-3-agent-user-persona ([09c0229](https://github.com/event4u-app/agent-config/commit/09c0229efd67af9cad7b2ca8202f4caa351d028d))
42
+ * **ownership:** regenerate file-ownership-matrix for /agents user ([128890d](https://github.com/event4u-app/agent-config/commit/128890d880584704b4842a398555dd979ae54462))
43
+ * **docs:** bump command count from 109 to 115 ([f8c61b1](https://github.com/event4u-app/agent-config/commit/f8c61b1d0ec48034e0d66e8d32534056ca4aa1f0))
44
+ * **template:** bump agent_config_version pin to 2.14.0 ([fcb885f](https://github.com/event4u-app/agent-config/commit/fcb885fd19bdbca46ef91ec4d5e723cc6c186c6d))
45
+ * **index:** regenerate agents/index.md + docs/catalog.md for /agents user ([56b281d](https://github.com/event4u-app/agent-config/commit/56b281d69960d3e57adbd24b9ec6fd24fc1a5aff))
46
+ * **agent-user:** regenerate compressed sources + claude tool stubs ([f79b6d1](https://github.com/event4u-app/agent-config/commit/f79b6d1cfcf1caccde4a723ad779c65d9ed87198))
47
+
48
+ Tests: 4352 (+12 since 2.14.0)
@@ -0,0 +1,108 @@
1
+ ---
2
+ stability: stable
3
+ ---
4
+
5
+ # ADR Layout — Per-area Directories
6
+
7
+ > Status: accepted · 2026-05-16 · Roadmap: `step-11-ruflo-parity` Phase 4
8
+
9
+ ## Scope
10
+
11
+ Two ADR surfaces coexist in this repo. **Both are canonical** — neither supersedes the other.
12
+
13
+ | Surface | Path | Use for |
14
+ |---|---|---|
15
+ | **Flat (legacy)** | `docs/decisions/ADR-NNN-<slug>.md` | Cross-cutting governance decisions: kernel composition, rule taxonomy, package-wide architecture. Numbering is global, sequential, gap-free. |
16
+ | **Per-area** | `docs/adrs/<area>/NNNN-<slug>.md` | Sub-area decisions whose blast radius is one plugin / one subsystem. Numbering is per-area, starts at `0001`, padded to 4 digits. |
17
+
18
+ Choice rule — does the decision constrain code **inside one area folder** (one runtime module, one contract group, one CLI surface)? → per-area. Does it constrain **the package's contract with consumers**? → flat. In doubt → per-area (cheaper to surface, easier to relocate).
19
+
20
+ ## Per-area layout
21
+
22
+ ```
23
+ docs/adrs/
24
+ <area>/
25
+ README.md # one-paragraph area scope + table of all ADRs in this area
26
+ 0001-<slug>.md # first ADR, retrospective or prospective
27
+ 0002-<slug>.md
28
+ ...
29
+ ```
30
+
31
+ `<area>` is a kebab-case stem matching one of:
32
+
33
+ - An entry in the canonical area inventory (see [`scripts/audit_adr_coverage.py`](../../scripts/audit_adr_coverage.py) `AREAS`).
34
+ - A new area added to that inventory in the same PR.
35
+
36
+ Reserved areas (bootstrap pass — step-11 Phase 4 Step 3):
37
+
38
+ | Area | Scope | Owner contract |
39
+ |---|---|---|
40
+ | `cost` | Budget ladder, hard-stop hook, cost reporting | [`cost-enforcement.md`](cost-enforcement.md) |
41
+ | `caveman` | Caveman-speak compression, decompression, reversibility | [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) |
42
+ | `schema` | Frontmatter schemas, v2 rigor, lint behaviour | [`schema-versioning.md`](schema-versioning.md) (when published) |
43
+ | `router` | `router.json` shape, tier semantics, dispatch precedence | [`rule-router.md`](rule-router.md) |
44
+ | `smoke` | Per-tier smoke contracts, baseline locks | [`smoke-contracts.md`](smoke-contracts.md) |
45
+ | `memory` | Memory MCP, propose / promote / poison flow | [`agent-memory-contract.md`](agent-memory-contract.md) |
46
+
47
+ ## Frontmatter
48
+
49
+ Identical across both surfaces:
50
+
51
+ ```yaml
52
+ ---
53
+ adr: NNN # zero-padded; per-area uses 4-digit (0001), flat uses 3-digit (010)
54
+ area: <area> | flat # 'flat' for docs/decisions/, otherwise the area slug
55
+ status: proposed | accepted | superseded | deprecated
56
+ date: YYYY-MM-DD
57
+ decision: <slug>
58
+ supersedes: — | ADR-<area>-NNNN | ADR-MMM
59
+ superseded_by: — | ADR-<area>-NNNN | ADR-MMM
60
+ phase: <roadmap-stem> · <phase-id> # optional but recommended
61
+ type: retrospective | prospective
62
+ ---
63
+ ```
64
+
65
+ Supersession links cross surfaces: a per-area ADR may supersede a flat ADR and vice versa. The shape of the identifier in `supersedes:` makes the target unambiguous (`ADR-007` = flat, `ADR-cost-0001` = per-area).
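One way to tell the two reference shapes apart is a pair of regular expressions. This is a minimal sketch under the padding rules stated in this document (3 digits for flat, 4 digits for per-area); `parse_adr_ref` is a hypothetical helper, not a function from the package, and it does not handle the "no value" placeholder.

```python
import re

# Flat ADRs: "ADR-" followed by exactly three digits.
FLAT_REF = re.compile(r"^ADR-(\d{3})$")
# Per-area ADRs: "ADR-<kebab-case area>-" followed by exactly four digits.
AREA_REF = re.compile(r"^ADR-([a-z][a-z0-9-]*)-(\d{4})$")


def parse_adr_ref(ref: str):
    """Return (surface, area, number) for a supersession reference."""
    match = FLAT_REF.match(ref)
    if match:
        return ("flat", None, int(match.group(1)))
    match = AREA_REF.match(ref)
    if match:
        return ("per-area", match.group(1), int(match.group(2)))
    raise ValueError(f"unrecognised ADR reference: {ref!r}")
```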
66
+
67
+ ## Per-area README contract
68
+
69
+ Every `<area>/` directory carries a `README.md` with:
70
+
71
+ 1. One-paragraph area scope (≤ 4 sentences).
72
+ 2. Single contract pointer — the `docs/contracts/<X>.md` this area implements (or "no published contract" if pre-Phase 5).
73
+ 3. Numbered table of ADRs in the area: `| # | Title | Status | Date | Supersedes |`. Generated by `scripts/audit_adr_coverage.py --regen-area-readme <area>`.
74
+
75
+ ## Coverage gate
76
+
77
+ `scripts/audit_adr_coverage.py --check` (wired to `task lint-adr-coverage`):
78
+
79
+ - Warns when a `docs/contracts/<X>.md` exists without a matching `docs/adrs/<X>/0001-*.md`.
80
+ - Hard-fails on number gaps within an area (e.g. `0001`, `0003` without `0002`).
81
+ - Hard-fails on missing `README.md` in any non-empty area directory.
82
+ - Warns on dangling `supersedes:` or `superseded_by:` references.
83
+
84
+ Default mode is **warn** at the consumer surface; **fail** under `task ci`. Rationale: a new contract dropped without an ADR is a documentation gap, not a bug. CI enforces it for this package; consumer projects opt in by adding the task to their own pipeline.
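The number-gap rule is easy to illustrate in isolation. The following is a sketch, not the code in `scripts/audit_adr_coverage.py`: `find_number_gaps` and the filename pattern are assumptions based on the `NNNN-<slug>.md` layout described above.

```python
import re

# Matches per-area ADR filenames such as "0001-hard-stop-hook.md".
ADR_FILE = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def find_number_gaps(filenames):
    """Return the missing ADR numbers between 1 and the highest number present."""
    present = {
        int(match.group(1))
        for name in filenames
        if (match := ADR_FILE.match(name))
    }
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]
```

For an area containing `0001-a.md` and `0003-c.md`, the helper reports `[2]`; files that do not match the pattern, such as `README.md`, are ignored.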
85
+
86
+ ## Numbering & gaps
87
+
88
+ - Per-area: 4-digit, gap-free, starts at `0001`. Re-use of numbers is a hard failure in the index regenerator.
89
+ - Flat: 3-digit, gap-free, starts at `001`. Existing ADRs in `docs/decisions/` set the precedent.
90
+ - A deleted ADR is **never** removed from history — supersede it. The lint surfaces broken supersession chains.
91
+
92
+ ## Relationship to `adr-create` skill
93
+
94
+ [`adr-create`](../../.agent-src.uncompressed/skills/adr-create/SKILL.md) accepts an optional `<area>` argument (added in step-11 Phase 4 Step 4):
95
+
96
+ - No `<area>` → flat surface, `docs/decisions/`.
97
+ - `<area>` matches inventory → per-area surface, `docs/adrs/<area>/`.
98
+ - `<area>` does **not** match inventory → skill refuses with a hint to update the inventory first.
99
+
100
+ The skill's template, numbering logic, and validation hooks are identical for both surfaces; only the target directory and number padding differ.
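The three cases above amount to a small path-resolution rule. A hedged sketch follows: `adr_target` is a hypothetical helper, and `KNOWN_AREAS` simply copies the reserved areas listed in this document, whereas the real inventory is the `AREAS` value in `scripts/audit_adr_coverage.py`.

```python
from pathlib import PurePosixPath

# Assumption: mirrors the reserved-areas table in this document.
KNOWN_AREAS = {"cost", "caveman", "schema", "router", "smoke", "memory"}


def adr_target(area, next_number, slug):
    """Resolve the file path for a new ADR on either surface."""
    if area is None:
        # Flat surface: 3-digit padding, ADR- prefix in the filename.
        return PurePosixPath("docs/decisions") / f"ADR-{next_number:03d}-{slug}.md"
    if area not in KNOWN_AREAS:
        raise ValueError(f"unknown area {area!r}: add it to the inventory first")
    # Per-area surface: 4-digit padding, no prefix.
    return PurePosixPath("docs/adrs") / area / f"{next_number:04d}-{slug}.md"
```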
101
+
102
+ ## References
103
+
104
+ - [`docs/adrs/cost/0001-hard-stop-hook.md`](../adrs/cost/0001-hard-stop-hook.md) — first per-area ADR (bootstrap).
105
+ - [`docs/decisions/INDEX.md`](../decisions/INDEX.md) — flat surface index.
106
+ - [`scripts/audit_adr_coverage.py`](../../scripts/audit_adr_coverage.py) — coverage gate.
107
+ - [`scripts/adr/regenerate_index.py`](../../scripts/adr/regenerate_index.py) — index regenerator (works on both surfaces; pass `--dir`).
108
+ - `step-11-ruflo-parity` Phase 4 — origin.
@@ -0,0 +1,97 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Benchmark Corpus Spec — step-4 Phase 1
7
+
8
+ Parser-visible contract for the golden corpus consumed by
9
+ [`scripts/bench_runner.py`](../../scripts/bench_runner.py) and the
10
+ upcoming `scripts/lint_bench_corpus.py`. Defines composition, schema,
11
+ and validation invariants.
12
+
13
+ ## Path decision
14
+
15
+ Roadmap `step-4-measurement-and-benchmark.md`
16
+ Phase 1 Step 2 names `bench/corpus.yaml`. The existing benchmark
17
+ infrastructure (runner + non-dev corpus + `task bench`) lives under
18
+ `tests/eval/` and `scripts/bench_runner.py` hardcodes that directory.
19
+ **Canonical location:** `tests/eval/corpus-<id>.yaml`. The `bench/`
20
+ directory is reserved for **reports + pricing** (Phase 2 deliverables).
21
+ Migration to `bench/corpus.yaml` is a no-op rename if downstream Phase
22
+ 2 work proves the consolidation is worth the diff cost.
23
+
24
+ ## Composition (25 prompts)
25
+
26
+ | Bucket | Count | Purpose |
27
+ |---|---|---|
28
+ | **Routing-canonical** | 10 | One prompt per major skill cluster — exact-match scoring |
29
+ | **Ambiguous** | 8 | Multiple plausible skills — set-intersection ≥ 0.7 scoring |
30
+ | **Destructive / security carve-out** | 5 | Triggers a safety floor — selection must surface the floor skill |
31
+ | **Long-context** | 2 | ≥ 4 k input tokens — exercises retrieval under context pressure |
32
+
33
+ The 10 routing-canonical prompts MUST cover the kernel + tier-1 skill
34
+ clusters used by the dev profile (`developer.yml`). The 8 ambiguous
35
+ prompts MUST each declare ≥ 2 acceptable skills in `expected_skills`.
36
+ The 5 destructive / security prompts MUST declare an
37
+ `expected_carve_outs` value (e.g. `security-sensitive-stop`,
38
+ `non-destructive-by-default`).
39
+
40
+ ## Schema
41
+
42
+ ```yaml
43
+ version: 1 # corpus format version (int)
44
+ corpus_id: <id> # short kebab-case identifier
45
+ selection_accuracy_target: 0.60 # 0.0–1.0; runner exits non-zero below
46
+ prompts:
47
+ - id: <bucket>-<NN> # e.g. canonical-01, ambiguous-03
48
+ category: <bucket> # canonical | ambiguous | destructive | long-context
49
+ user_type_candidates: [<slug>, ...] # optional; informational
50
+ language: en # en | de — per language-and-tone
51
+ prompt: "<text>" # the agent-facing prompt
52
+ expected_skills: [<slug>, ...] # ≥ 1 entry; non-empty
53
+ expected_carve_outs: [<slug>, ...] # required when category == destructive
54
+ rubric: # optional structural assertion
55
+ must_include: ["<phrase>", ...] # all phrases must appear in output
56
+ must_not_include: ["<phrase>", ...]
57
+ length_words: { min: 0, max: 0 }
58
+ quality_assertion: "<regex>" # optional regex over agent output
59
+ ```
60
+
61
+ ### Invariants (lint-bench gate)
62
+
63
+ | Drift | `reason` | Example |
64
+ |---|---|---|
65
+ | Missing top-level `version` / `corpus_id` / `prompts` | `missing_top_level` | — |
66
+ | `version` not in `{1}` | `unsupported_version` | `version: 2` |
67
+ | `selection_accuracy_target` outside `[0.0, 1.0]` | `target_out_of_range` | `1.5` |
68
+ | Duplicate `id` across prompts | `duplicate_id` | two `canonical-01` |
69
+ | `id` does not match `^[a-z][a-z0-9-]*-\d{2}$` | `bad_id_format` | `Canonical_1` |
70
+ | `category` not in `{canonical, ambiguous, destructive, long-context}` | `bad_category` | `category: misc` |
71
+ | `language` not in `{en, de}` | `bad_language` | `language: fr` |
72
+ | `expected_skills` empty / missing | `empty_expected` | `expected_skills: []` |
73
+ | `expected_skills` references an unknown skill slug | `unknown_skill` | `expected_skills: [imaginary]` |
74
+ | `category == destructive` without `expected_carve_outs` | `missing_carve_out` | — |
75
+ | Prompt text empty / whitespace-only | `empty_prompt` | — |
76
+
77
+ The linter MUST honour `--quiet` per the script-output convention
77
+ and MUST emit one violation per line in non-quiet mode.
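The invariant table above can be read as a sequence of per-corpus and per-prompt checks. The following is a minimal sketch only, not the contents of `scripts/lint_bench_corpus.py`: the function name `lint_corpus`, the `known_skills` argument, and the `(prompt_id, reason)` return shape are assumptions made for illustration. It also assumes that a missing `language` counts as a `bad_language` violation, which the table does not state explicitly.

```python
import re

# Values copied from the invariant table; everything else here is hypothetical.
ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]*-\d{2}$")
CATEGORIES = {"canonical", "ambiguous", "destructive", "long-context"}
LANGUAGES = {"en", "de"}
SUPPORTED_VERSIONS = {1}


def lint_corpus(corpus, known_skills):
    """Return a list of (prompt_id, reason) tuples; empty means no drift."""
    violations = []
    if any(key not in corpus for key in ("version", "corpus_id", "prompts")):
        violations.append(("<top-level>", "missing_top_level"))
    if "version" in corpus and corpus["version"] not in SUPPORTED_VERSIONS:
        violations.append(("<top-level>", "unsupported_version"))
    target = corpus.get("selection_accuracy_target")
    if target is not None and not 0.0 <= target <= 1.0:
        violations.append(("<top-level>", "target_out_of_range"))

    seen_ids = set()
    for prompt in corpus.get("prompts") or []:
        pid = str(prompt.get("id", ""))
        if pid in seen_ids:
            violations.append((pid, "duplicate_id"))
        seen_ids.add(pid)
        if not ID_PATTERN.match(pid):
            violations.append((pid, "bad_id_format"))
        if prompt.get("category") not in CATEGORIES:
            violations.append((pid, "bad_category"))
        # Assumption: a missing language is treated like an unsupported one.
        if prompt.get("language") not in LANGUAGES:
            violations.append((pid, "bad_language"))
        skills = prompt.get("expected_skills") or []
        if not skills:
            violations.append((pid, "empty_expected"))
        elif any(skill not in known_skills for skill in skills):
            violations.append((pid, "unknown_skill"))
        if prompt.get("category") == "destructive" and not prompt.get("expected_carve_outs"):
            violations.append((pid, "missing_carve_out"))
        if not str(prompt.get("prompt") or "").strip():
            violations.append((pid, "empty_prompt"))
    return violations
```

Returning reason codes rather than printing keeps the check reusable: a caller can print one violation per line in normal mode and stay silent under `--quiet` while still exiting non-zero.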
79
+
80
+ ## Composition gates (25-prompt-complete state)
81
+
82
+ Once `corpus-dev.yaml` reaches the 25-prompt target, the linter
83
+ additionally enforces the per-bucket counts above. Until then, the
84
+ linter only enforces per-prompt invariants — partial corpora are
85
+ valid during Phase 1 build-out.
86
+
87
+ The composition gate is opt-in via `--require-full` to keep the
88
+ reduced 10-prompt suite (Phase 1 Step 4) usable during development
89
+ without tripping CI.
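As a rough illustration of the opt-in composition gate, the sketch below counts prompts per `category` and compares them with the bucket sizes from the table (10 / 8 / 5 / 2). The function name `check_composition`, the `require_full` keyword, and the message format are hypothetical; only the counts and the category slugs come from this spec.

```python
from collections import Counter

# Per-bucket targets from the composition table (25 prompts in total).
REQUIRED_COUNTS = {
    "canonical": 10,
    "ambiguous": 8,
    "destructive": 5,
    "long-context": 2,
}


def check_composition(prompts, require_full=False):
    """Return one message per bucket whose count is off; [] when the gate is off."""
    if not require_full:
        # Partial corpora are valid while the gate is not requested.
        return []
    counts = Counter(p.get("category") for p in prompts)
    return [
        f"{bucket}: expected {want}, found {counts.get(bucket, 0)}"
        for bucket, want in REQUIRED_COUNTS.items()
        if counts.get(bucket, 0) != want
    ]
```

With `require_full=False` a reduced suite passes untouched, which matches the intent of keeping the 10-prompt suite usable during development.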
90
+
91
+ ## Cross-references
92
+
93
+ - Runner — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
94
+ - Linter — `scripts/lint_bench_corpus.py` (Phase 1 Step 3)
95
+ - Existing non-dev corpus — [`tests/eval/corpus-non-dev.yaml`](../../tests/eval/corpus-non-dev.yaml)
96
+ - Language gate — [`language-and-tone`](../../.agent-src.uncompressed/rules/language-and-tone.md)
97
+ - Report schema — `docs/contracts/benchmark-report-schema.md` (Phase 2 Step 4)
@@ -0,0 +1,111 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Benchmark Report Schema — Phase 2 Step 4
7
+
8
+ Parser-visible contract for the JSON + Markdown reports emitted by
9
+ [`scripts/bench_run.py`](../../scripts/bench_run.py). Every `task bench`
10
+ run writes one `bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
11
+
12
+ ## File layout
13
+
14
+ ```
15
+ bench/
16
+ ├── pricing.yaml # per-1M model rates + sourced_on dates
17
+ └── reports/
18
+ ├── 2026-05-16T10-30-00Z-dev.json # machine-readable
19
+ ├── 2026-05-16T10-30-00Z-dev.md # human-readable
20
+ └── ...
21
+ ```
22
+
23
+ Filename format: `<ts>-<corpus_id>.{json,md}`, where `<ts>` is the UTC
24
+ ISO-8601 timestamp with `:` replaced by `-`. Filenames sort lexicographically.
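The filename rule can be sketched with the standard library alone. The helper name `report_basename` is an assumption for illustration and is not taken from `scripts/bench_run.py`; the output format follows the example filenames in the file layout above.

```python
from datetime import datetime, timezone


def report_basename(generated_at: datetime, corpus_id: str) -> str:
    """Build '<ts>-<corpus_id>' with a UTC timestamp that uses '-' instead of ':'."""
    ts = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{ts}-{corpus_id}"


# Example: the same basename is used for the .json and the .md report.
stamp = datetime(2026, 5, 16, 10, 30, 0, tzinfo=timezone.utc)
basename = report_basename(stamp, "dev")
json_name = basename + ".json"  # "2026-05-16T10-30-00Z-dev.json"
```

Because every field is zero-padded and ordered from year down to second, a plain string sort of the filenames is also a chronological sort.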
25
+
26
+ ## JSON schema (v1)
27
+
28
+ ```yaml
29
+ schema_version: 1
30
+ generated_at: <ISO-8601 UTC>
31
+ corpus:
32
+ id: <corpus_id>
33
+ path: tests/eval/corpus-<id>.yaml
34
+ prompt_count: <int>
35
+ runner:
36
+ bench_run_version: <semver>
37
+ baseline_collector: scripts/bench_runner.py # selection-accuracy floor
38
+ baseline_collector_sha: <git-sha-or-mtime>
39
+ selection:
40
+ top_k: 3
41
+ prompts_hit: <int>
42
+ prompts_total: <int>
43
+ selection_accuracy: <float 0.0-1.0> # hits / total
44
+ target: <float> # from corpus
45
+ passed: <bool> # accuracy >= target
46
+ per_prompt: # one entry per corpus prompt
47
+ - id: canonical-01
48
+ expected_skills: [...]
49
+ top_k_ranked: [...]
50
+ hit: <bool>
51
+ cost:
52
+ source: agents/cost-tracking/sessions.jsonl # or "unavailable"
53
+ sessions_scanned: <int>
54
+ totals:
55
+ input_tokens: <int>
56
+ output_tokens: <int>
57
+ cache_read_input_tokens: <int>
58
+ cache_creation_input_tokens: <int>
59
+ total_cost_usd: <float>
60
+ per_tier: # haiku / sonnet / opus / unknown
61
+ sonnet: { messages: <int>, cost_usd: <float> }
62
+ ...
63
+ pricing_sourced_on: <ISO date from bench/pricing.yaml>
64
+ quality:
65
+ source: <path-or-"not_collected">
66
+ prompts_with_assertion: <int>
67
+ prompts_passing: <int>
68
+ quality_score: <float 0.0-1.0> # passing / total OR 0.0 if not_collected
69
+ per_prompt:
70
+ - id: canonical-01
71
+ assertion: <regex-string>
72
+ assertion_kind: rubric.must_include | quality_assertion
73
+ passed: <bool | "not_collected">
74
+ verdict:
75
+ selection: pass | fail
76
+ quality: pass | fail | not_collected
77
+ overall: pass | fail | partial # partial = quality not_collected
78
+ ```
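The `selection` and `verdict` blocks can be derived from the per-prompt `hit` flags. This is a sketch under stated assumptions: the function names are hypothetical, how a single `hit` is scored is left to the runner, and the rule that `not_collected` quality always yields `partial` (even when selection fails) is inferred from the "partial = quality not_collected" comment rather than spelled out in the schema.

```python
def summarize_selection(hits, target):
    """Compute the aggregate selection fields from a list of per-prompt hit flags."""
    total = len(hits)
    prompts_hit = sum(1 for hit in hits if hit)
    accuracy = prompts_hit / total if total else 0.0
    return {
        "prompts_hit": prompts_hit,
        "prompts_total": total,
        "selection_accuracy": accuracy,  # hits / total
        "target": target,                # from the corpus
        "passed": accuracy >= target,
    }


def overall_verdict(selection_passed, quality):
    """quality is one of 'pass', 'fail', 'not_collected'."""
    if quality == "not_collected":
        # Assumption: missing quality data makes the run partial in every case.
        return "partial"
    if selection_passed and quality == "pass":
        return "pass"
    return "fail"
```

Keeping the verdict separate from the accuracy calculation makes the `partial` case explicit, so a run without agent outputs is never reported as a pass.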
79
+
80
+ ## Markdown shape
81
+
82
+ Headers in order:
83
+
84
+ 1. `# Benchmark Report — <corpus_id> · <generated_at>`
85
+ 2. `## Headline` — three-line summary (selection · cost · quality).
86
+ 3. `## Selection accuracy` — table per prompt with hit/miss + expected/got.
87
+ 4. `## Cost capture` — per-tier table + total; "unavailable" block if no
88
+ session jsonl was found.
89
+ 5. `## Quality probe` — per-prompt assertion pass/fail; `not_collected`
90
+ block when no agent-output path was passed.
91
+ 6. `## Notes` — pointers to `pricing.yaml`, the corpus path, and the
92
+ versioned filename for citation.
93
+
94
+ ## Invariants
95
+
96
+ - **No silent drops.** Missing cost source → emit `source: unavailable`
97
+ and `total_cost_usd: 0.0` with a marker; never omit the section.
98
+ - **Quality stub honesty.** When agent outputs are not provided, set
99
+ `quality.source: not_collected` and `verdict.overall: partial`. Score
100
+ stays `0.0`; never inflate by assuming pass.
101
+ - **Pricing dated.** Every cost row reads `sourced_on` from
102
+ `bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
103
+ Markdown footer.
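Two of the invariants above lend themselves to a short sketch: the placeholder cost block emitted when no session source exists, and the 90-day staleness check on `sourced_on`. The helper names are hypothetical, and the `note` key is an assumed form for the "marker", whose actual shape this contract does not define.

```python
from datetime import date

MAX_PRICING_AGE_DAYS = 90  # threshold stated in the "Pricing dated" invariant


def unavailable_cost_block():
    """Cost section to emit when no session jsonl was found; never omit the section."""
    return {
        "source": "unavailable",
        "sessions_scanned": 0,
        "totals": {"total_cost_usd": 0.0},
        "note": "cost source unavailable",  # hypothetical marker field
    }


def pricing_is_stale(sourced_on: date, today: date) -> bool:
    """True when the price is strictly more than 90 days old."""
    return (today - sourced_on).days > MAX_PRICING_AGE_DAYS
```

The comparison is strict, so a price sourced exactly 90 days ago does not trigger the footer warning.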
104
+
105
+ ## Cross-references
106
+
107
+ - Runner — [`scripts/bench_run.py`](../../scripts/bench_run.py)
108
+ - Baseline collector — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
109
+ - Corpus contract — [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md)
110
+ - Pricing source — [`bench/pricing.yaml`](../../bench/pricing.yaml)
111
+ - Cost session reader (live sessions) — [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs)
@@ -297,4 +297,5 @@ A command that fails either floor drops to **Tier-1** at the next minor release;
297
297
  - [`docs/migrations/commands-1.15.0.md`](../migrations/commands-1.15.0.md) — user-facing migration notes.
298
298
  - [`docs/contracts/STABILITY.md`](STABILITY.md) — `beta` level rules apply.
299
299
  - [`docs/contracts/command-surface-tiers.md`](command-surface-tiers.md) — what each tier means and what `--help` surfaces.
300
+ - [`docs/contracts/command-taxonomy.md`](command-taxonomy.md) — profile axis (discoverability) layered on top of this verb axis (invocation).
300
301
  - [`.agent-src.uncompressed/contexts/contracts/artifact-engagement-flow.md`](../../.agent-src.uncompressed/contexts/contracts/artifact-engagement-flow.md) — sibling telemetry surface; same privacy floor and four-layer enforcement model.