@event4u/agent-config 2.19.0 → 2.20.1

Files changed (94)
  1. package/.agent-src/commands/agent-status.md +29 -0
  2. package/.agent-src/commands/onboard.md +221 -81
  3. package/.agent-src/packs/README.md +49 -0
  4. package/.agent-src/packs/agency-delivery.yml +63 -0
  5. package/.agent-src/packs/content-engine.yml +53 -0
  6. package/.agent-src/packs/founder-mvp.yml +51 -0
  7. package/.agent-src/presets/README.md +26 -0
  8. package/.agent-src/presets/balanced.yml +34 -0
  9. package/.agent-src/presets/fast.yml +31 -0
  10. package/.agent-src/presets/strict.yml +38 -0
  11. package/.agent-src/profiles/README.md +29 -0
  12. package/.agent-src/profiles/agency.yml +27 -0
  13. package/.agent-src/profiles/content_creator.yml +25 -0
  14. package/.agent-src/profiles/developer.yml +26 -0
  15. package/.agent-src/profiles/finance.yml +24 -0
  16. package/.agent-src/profiles/founder.yml +25 -0
  17. package/.agent-src/profiles/ops.yml +25 -0
  18. package/.agent-src/rules/no-cheap-questions.md +25 -17
  19. package/.agent-src/skills/adr-create/SKILL.md +78 -68
  20. package/.agent-src/skills/subagent-orchestration/SKILL.md +33 -0
  21. package/.agent-src/templates/agents/agent-project-settings.example.yml +1 -1
  22. package/.agent-src/templates/skill-archive-note.md +101 -0
  23. package/.claude-plugin/marketplace.json +1 -1
  24. package/CHANGELOG.md +73 -70
  25. package/README.md +68 -72
  26. package/config/agent-settings.template.yml +22 -0
  27. package/docs/adrs/caveman/0001-default-off-until-bench.md +93 -0
  28. package/docs/adrs/caveman/README.md +9 -0
  29. package/docs/adrs/cost/0001-hard-stop-hook.md +114 -0
  30. package/docs/adrs/cost/README.md +9 -0
  31. package/docs/adrs/memory/0001-consumer-side-snapshot.md +111 -0
  32. package/docs/adrs/memory/README.md +9 -0
  33. package/docs/adrs/router/0001-three-tier-routing.md +119 -0
  34. package/docs/adrs/router/README.md +9 -0
  35. package/docs/adrs/schema/0001-json-schema-frontmatter.md +102 -0
  36. package/docs/adrs/schema/README.md +9 -0
  37. package/docs/adrs/smoke/0001-per-tier-smoke-scripts.md +99 -0
  38. package/docs/adrs/smoke/README.md +9 -0
  39. package/docs/architecture/current-onboard-baseline.md +126 -0
  40. package/docs/architecture/current-safety-behavior.md +137 -0
  41. package/docs/archive/CHANGELOG-pre-2.16.0.md +48 -0
  42. package/docs/archive/CHANGELOG-pre-2.17.0.md +63 -0
  43. package/docs/contracts/adr-layout.md +108 -0
  44. package/docs/contracts/benchmark-corpus-spec.md +97 -0
  45. package/docs/contracts/benchmark-report-schema.md +111 -0
  46. package/docs/contracts/command-clusters.md +1 -0
  47. package/docs/contracts/command-taxonomy.md +137 -0
  48. package/docs/contracts/compression-default-kill-criterion.md +69 -0
  49. package/docs/contracts/config-presets.md +144 -0
  50. package/docs/contracts/cost-dashboard.md +143 -0
  51. package/docs/contracts/cost-enforcement.md +134 -0
  52. package/docs/contracts/file-ownership-matrix.json +0 -7
  53. package/docs/contracts/mcp-tool-inventory.md +53 -0
  54. package/docs/contracts/measurement-baseline.md +102 -0
  55. package/docs/contracts/namespace.md +125 -0
  56. package/docs/contracts/profile-system.md +142 -0
  57. package/docs/contracts/safety-model.md +129 -0
  58. package/docs/contracts/smoke-contracts.md +144 -0
  59. package/docs/contracts/workflow-packs.md +121 -0
  60. package/docs/decisions/ADR-010-profile-pack-preset-boundary.md +132 -0
  61. package/docs/decisions/INDEX.md +1 -0
  62. package/docs/featured-commands.md +27 -0
  63. package/docs/parity/bench-ruflo.json +58 -0
  64. package/docs/parity/bench.json +41 -0
  65. package/docs/parity/ruflo.md +46 -0
  66. package/docs/profiles.md +91 -0
  67. package/package.json +1 -1
  68. package/scripts/_cli/cmd_explain.py +250 -0
  69. package/scripts/_lib/bench_cost.py +138 -0
  70. package/scripts/_lib/bench_quality.py +118 -0
  71. package/scripts/_lib/bench_report.py +150 -0
  72. package/scripts/agent-config +13 -0
  73. package/scripts/audit_adr_coverage.py +175 -0
  74. package/scripts/audit_mcp_tools.py +146 -0
  75. package/scripts/bench_baseline_ready.py +108 -0
  76. package/scripts/bench_drift_check.py +151 -0
  77. package/scripts/bench_per_tool.py +216 -0
  78. package/scripts/bench_run.py +155 -0
  79. package/scripts/config/__init__.py +9 -0
  80. package/scripts/config/presets.py +206 -0
  81. package/scripts/config/profiles.py +173 -0
  82. package/scripts/cost/budget.mjs +73 -12
  83. package/scripts/cost/preflight.mjs +89 -0
  84. package/scripts/lint_archived_skills.py +143 -0
  85. package/scripts/lint_bench_corpus.py +161 -0
  86. package/scripts/lint_namespace.py +135 -0
  87. package/scripts/lint_roadmap_complexity.py +3 -2
  88. package/scripts/skill_overlap.py +204 -0
  89. package/scripts/skill_usage_collect.py +191 -0
  90. package/scripts/skill_usage_report.py +162 -0
  91. package/scripts/smoke/kernel.sh +101 -0
  92. package/scripts/smoke/router.sh +129 -0
  93. package/scripts/smoke/schema.sh +71 -0
  94. package/scripts/smoke/skills.sh +101 -0
@@ -0,0 +1,97 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Benchmark Corpus Spec — step-4 Phase 1
7
+
8
+ Parser-visible contract for the golden corpus consumed by
9
+ [`scripts/bench_runner.py`](../../scripts/bench_runner.py) and the
10
+ upcoming `scripts/lint_bench_corpus.py`. Defines composition, schema,
11
+ and validation invariants.
12
+
13
+ ## Path decision
14
+
15
+ Roadmap `step-4-measurement-and-benchmark.md`
16
+ Phase 1 Step 2 names `bench/corpus.yaml`. The existing benchmark
17
+ infrastructure (runner + non-dev corpus + `task bench`) lives under
18
+ `tests/eval/` and `scripts/bench_runner.py` hardcodes that directory.
19
+ **Canonical location:** `tests/eval/corpus-<id>.yaml`. The `bench/`
20
+ directory is reserved for **reports + pricing** (Phase 2 deliverables).
21
+ Migration to `bench/corpus.yaml` is a no-op rename if downstream Phase
22
+ 2 work proves the consolidation is worth the diff cost.
23
+
24
+ ## Composition (25 prompts)
25
+
26
+ | Bucket | Count | Purpose |
27
+ |---|---|---|
28
+ | **Routing-canonical** | 10 | One prompt per major skill cluster — exact-match scoring |
29
+ | **Ambiguous** | 8 | Multiple plausible skills — set-intersection ≥ 0.7 scoring |
30
+ | **Destructive / security carve-out** | 5 | Triggers a safety floor — selection must surface the floor skill |
31
+ | **Long-context** | 2 | ≥ 4 k input tokens — exercises retrieval under context pressure |
32
+
33
+ The 10 routing-canonical prompts MUST cover the kernel + tier-1 skill
34
+ clusters used by the dev profile (`developer.yml`). The 8 ambiguous
35
+ prompts MUST each declare ≥ 2 acceptable skills in `expected_skills`.
36
+ The 5 destructive / security prompts MUST declare an
37
+ `expected_carve_outs` value (e.g. `security-sensitive-stop`,
38
+ `non-destructive-by-default`).
39
+
40
+ ## Schema
41
+
42
+ ```yaml
43
+ version: 1 # corpus format version (int)
44
+ corpus_id: <id> # short kebab-case identifier
45
+ selection_accuracy_target: 0.60 # 0.0–1.0; runner exits non-zero below
46
+ prompts:
47
+   - id: <bucket>-<NN> # e.g. canonical-01, ambiguous-03
48
+     category: <bucket> # canonical | ambiguous | destructive | long-context
49
+     user_type_candidates: [<slug>, ...] # optional; informational
50
+     language: en # en | de — per language-and-tone
51
+     prompt: "<text>" # the agent-facing prompt
52
+     expected_skills: [<slug>, ...] # ≥ 1 entry; non-empty
53
+     expected_carve_outs: [<slug>, ...] # required when category == destructive
54
+     rubric: # optional structural assertion
55
+       must_include: ["<phrase>", ...] # all phrases must appear in output
56
+       must_not_include: ["<phrase>", ...]
57
+       length_words: { min: 0, max: 0 }
58
+     quality_assertion: "<regex>" # optional regex over agent output
59
+ ```
60
+
61
+ ### Invariants (lint-bench gate)
62
+
63
+ | Drift | `reason` | Example |
64
+ |---|---|---|
65
+ | Missing top-level `version` / `corpus_id` / `prompts` | `missing_top_level` | — |
66
+ | `version` not in `{1}` | `unsupported_version` | `version: 2` |
67
+ | `selection_accuracy_target` outside `[0.0, 1.0]` | `target_out_of_range` | `1.5` |
68
+ | Duplicate `id` across prompts | `duplicate_id` | two `canonical-01` |
69
+ | `id` does not match `^[a-z][a-z0-9-]*-\d{2}$` | `bad_id_format` | `Canonical_1` |
70
+ | `category` not in `{canonical, ambiguous, destructive, long-context}` | `bad_category` | `category: misc` |
71
+ | `language` not in `{en, de}` | `bad_language` | `language: fr` |
72
+ | `expected_skills` empty / missing | `empty_expected` | `expected_skills: []` |
73
+ | `expected_skills` references an unknown skill slug | `unknown_skill` | `expected_skills: [imaginary]` |
74
+ | `category == destructive` without `expected_carve_outs` | `missing_carve_out` | — |
75
+ | Prompt text empty / whitespace-only | `empty_prompt` | — |
76
+
77
+ The linter MUST honour `--quiet` per the script-output
78
+ convention and emit one violation per line in non-quiet mode.
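A minimal sketch of the per-prompt invariants in the table above, assuming the corpus has already been parsed into a plain mapping. The function name `lint_corpus`, the `known_skills` argument, and the choice to treat a missing `language` as `bad_language` are assumptions for illustration; this is not the shipped `scripts/lint_bench_corpus.py`.

```python
import re

# Pattern and vocabularies copied from the invariants table.
ID_RE = re.compile(r"^[a-z][a-z0-9-]*-\d{2}$")
CATEGORIES = {"canonical", "ambiguous", "destructive", "long-context"}
LANGUAGES = {"en", "de"}


def lint_corpus(corpus, known_skills):
    """Return (prompt_id, reason) pairs for an already-parsed corpus mapping."""
    violations = []
    for key in ("version", "corpus_id", "prompts"):
        if key not in corpus:
            violations.append((None, "missing_top_level"))
    if "version" in corpus and corpus["version"] != 1:
        violations.append((None, "unsupported_version"))
    target = corpus.get("selection_accuracy_target")
    if target is not None and not 0.0 <= target <= 1.0:
        violations.append((None, "target_out_of_range"))

    seen = set()
    for prompt in corpus.get("prompts", []):
        pid = str(prompt.get("id", ""))
        if pid in seen:
            violations.append((pid, "duplicate_id"))
        seen.add(pid)
        if not ID_RE.fullmatch(pid):
            violations.append((pid, "bad_id_format"))
        if prompt.get("category") not in CATEGORIES:
            violations.append((pid, "bad_category"))
        # Assumption: a missing language is reported the same way as an invalid one.
        if prompt.get("language") not in LANGUAGES:
            violations.append((pid, "bad_language"))
        skills = prompt.get("expected_skills") or []
        if not skills:
            violations.append((pid, "empty_expected"))
        elif any(skill not in known_skills for skill in skills):
            violations.append((pid, "unknown_skill"))
        if prompt.get("category") == "destructive" and not prompt.get("expected_carve_outs"):
            violations.append((pid, "missing_carve_out"))
        if not str(prompt.get("prompt", "")).strip():
            violations.append((pid, "empty_prompt"))
    return violations
```

Printing one `(prompt_id, reason)` pair per line would satisfy the one-violation-per-line rule; how `--quiet` suppresses that output is left to the script-output convention.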
79
+
80
+ ## Composition gates (25-prompt-complete state)
81
+
82
+ Once `corpus-dev.yaml` reaches the 25-prompt target, the linter
83
+ additionally enforces the per-bucket counts above. Until then, the
84
+ linter only enforces per-prompt invariants — partial corpora are
85
+ valid during Phase 1 build-out.
86
+
87
+ The composition gate is opt-in via `--require-full` to keep the
88
+ reduced 10-prompt suite (Phase 1 Step 4) usable during development
89
+ without tripping CI.
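The opt-in composition gate could look roughly like the sketch below. It assumes the bucket names in the composition table map onto the four `category` values of the schema; the function name and the message format are hypothetical.

```python
from collections import Counter

# Per-bucket counts from the composition table (25 prompts in total).
REQUIRED_COUNTS = {"canonical": 10, "ambiguous": 8, "destructive": 5, "long-context": 2}


def composition_violations(prompts, require_full=False):
    """Return one message per bucket whose count is off; empty when the gate is off."""
    if not require_full:
        # Partial corpora stay valid unless the caller opts in.
        return []
    counts = Counter(prompt.get("category") for prompt in prompts)
    return [
        f"{bucket}: expected {want}, found {counts.get(bucket, 0)}"
        for bucket, want in REQUIRED_COUNTS.items()
        if counts.get(bucket, 0) != want
    ]
```

With `require_full=False` a reduced 10-prompt suite passes untouched, which is the behaviour the paragraph above asks for.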
90
+
91
+ ## Cross-references
92
+
93
+ - Runner — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
94
+ - Linter — `scripts/lint_bench_corpus.py` (Phase 1 Step 3)
95
+ - Existing non-dev corpus — [`tests/eval/corpus-non-dev.yaml`](../../tests/eval/corpus-non-dev.yaml)
96
+ - Language gate — [`language-and-tone`](../../.agent-src.uncompressed/rules/language-and-tone.md)
97
+ - Report schema — `docs/contracts/benchmark-report-schema.md` (Phase 2 Step 4)
@@ -0,0 +1,111 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Benchmark Report Schema — step-4 Phase 2
7
+
8
+ Parser-visible contract for the JSON + Markdown reports emitted by
9
+ [`scripts/bench_run.py`](../../scripts/bench_run.py). Every `task bench`
10
+ run writes one `bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
11
+
12
+ ## File layout
13
+
14
+ ```
15
+ bench/
16
+ ├── pricing.yaml # per-1M model rates + sourced_on dates
17
+ └── reports/
18
+     ├── 2026-05-16T10-30-00Z-dev.json # machine-readable
19
+     ├── 2026-05-16T10-30-00Z-dev.md # human-readable
20
+     └── ...
21
+ ```
22
+
23
+ Filename format: `<ts>-<corpus_id>.{json,md}`, where `<ts>` is the UTC ISO-8601 timestamp with `:` replaced by `-`.
24
+ Sortable lexicographically.
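As an illustration of the filename rule, the helper below builds the shared stem for the `.json` and `.md` pair. The name `report_stem` is hypothetical, and the sketch assumes the caller passes a timezone-aware `datetime`.

```python
from datetime import datetime, timezone


def report_stem(generated_at, corpus_id):
    """Build '<ts>-<corpus_id>' with the timestamp in UTC and ':' replaced by '-'."""
    if generated_at.tzinfo is None:
        raise ValueError("generated_at must be timezone-aware")
    ts = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{ts}-{corpus_id}"


stem = report_stem(datetime(2026, 5, 16, 10, 30, 0, tzinfo=timezone.utc), "dev")
print(stem + ".json")  # 2026-05-16T10-30-00Z-dev.json
```

Because every field is zero-padded and ordered from year down to second, plain string sorting matches chronological order.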
25
+
26
+ ## JSON schema (v1)
27
+
28
+ ```yaml
29
+ schema_version: 1
30
+ generated_at: <ISO-8601 UTC>
31
+ corpus:
32
+   id: <corpus_id>
33
+   path: tests/eval/corpus-<id>.yaml
34
+   prompt_count: <int>
35
+ runner:
36
+   bench_run_version: <semver>
37
+   baseline_collector: scripts/bench_runner.py # selection-accuracy floor
38
+   baseline_collector_sha: <git-sha-or-mtime>
39
+ selection:
40
+   top_k: 3
41
+   prompts_hit: <int>
42
+   prompts_total: <int>
43
+   selection_accuracy: <float 0.0-1.0> # hits / total
44
+   target: <float> # from corpus
45
+   passed: <bool> # accuracy >= target
46
+   per_prompt: # one entry per corpus prompt
47
+     - id: canonical-01
48
+       expected_skills: [...]
49
+       top_k_ranked: [...]
50
+       hit: <bool>
51
+ cost:
52
+   source: agents/cost-tracking/sessions.jsonl # or "unavailable"
53
+   sessions_scanned: <int>
54
+   totals:
55
+     input_tokens: <int>
56
+     output_tokens: <int>
57
+     cache_read_input_tokens: <int>
58
+     cache_creation_input_tokens: <int>
59
+     total_cost_usd: <float>
60
+   per_tier: # haiku / sonnet / opus / unknown
61
+     sonnet: { messages: <int>, cost_usd: <float> }
62
+     ...
63
+   pricing_sourced_on: <ISO date from bench/pricing.yaml>
64
+ quality:
65
+   source: <path-or-"not_collected">
66
+   prompts_with_assertion: <int>
67
+   prompts_passing: <int>
68
+   quality_score: <float 0.0-1.0> # passing / total OR 0.0 if not_collected
69
+   per_prompt:
70
+     - id: canonical-01
71
+       assertion: <regex-string>
72
+       assertion_kind: rubric.must_include | quality_assertion
73
+       passed: <bool | "not_collected">
74
+ verdict:
75
+   selection: pass | fail
76
+   quality: pass | fail | not_collected
77
+   overall: pass | fail | partial # partial = quality not_collected
78
+ ```
79
+
80
+ ## Markdown shape
81
+
82
+ Headers in order:
83
+
84
+ 1. `# Benchmark Report — <corpus_id> · <generated_at>`
85
+ 2. `## Headline` — three-line summary (selection · cost · quality).
86
+ 3. `## Selection accuracy` — table per prompt with hit/miss + expected/got.
87
+ 4. `## Cost capture` — per-tier table + total; "unavailable" block if no
88
+ session jsonl was found.
89
+ 5. `## Quality probe` — per-prompt assertion pass/fail; `not_collected`
90
+ block when no agent-output path was passed.
91
+ 6. `## Notes` — pointer to `pricing.yaml`, the corpus path, and the
92
+ versioned filename for citation.
93
+
94
+ ## Invariants
95
+
96
+ - **No silent drops.** Missing cost source → emit `source: unavailable`
97
+ and `total_cost_usd: 0.0` with a marker; never omit the section.
98
+ - **Quality stub honesty.** When agent outputs are not provided, set
99
+ `quality.source: not_collected` and `verdict.overall: partial`. Score
100
+ stays `0.0`; never inflate by assuming pass.
101
+ - **Pricing dated.** Every cost row reads `sourced_on` from
102
+ `bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
103
+ Markdown footer.
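The verdict block and the two honesty invariants can be sketched as below. The contract does not say how a collected quality result is reduced to pass or fail, so the `quality_passed` flag is an assumption, as are the function names.

```python
from datetime import date


def build_verdict(selection_accuracy, target, quality_source, quality_passed=None):
    """Derive the verdict mapping; never upgrade an uncollected quality probe to a pass."""
    selection = "pass" if selection_accuracy >= target else "fail"
    if quality_source == "not_collected":
        # Quality stub honesty: overall is 'partial' whenever quality was not collected.
        return {"selection": selection, "quality": "not_collected", "overall": "partial"}
    quality = "pass" if quality_passed else "fail"
    overall = "pass" if selection == "pass" and quality == "pass" else "fail"
    return {"selection": selection, "quality": quality, "overall": overall}


def pricing_is_stale(sourced_on, today, max_age_days=90):
    """True when the price is older than the 90-day window and needs a footer warning."""
    return (today - sourced_on).days > max_age_days
```

For example, `build_verdict(0.7, 0.6, "not_collected")` yields `overall: partial` even though selection passed, which matches the rule that a missing quality probe is never counted as a pass.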
104
+
105
+ ## Cross-references
106
+
107
+ - Runner — [`scripts/bench_run.py`](../../scripts/bench_run.py)
108
+ - Baseline collector — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
109
+ - Corpus contract — [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md)
110
+ - Pricing source — [`bench/pricing.yaml`](../../bench/pricing.yaml)
111
+ - Cost session reader (live sessions) — [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs)
@@ -297,4 +297,5 @@ A command that fails either floor drops to **Tier-1** at the next minor release;
297
297
  - [`docs/migrations/commands-1.15.0.md`](../migrations/commands-1.15.0.md) — user-facing migration notes.
298
298
  - [`docs/contracts/STABILITY.md`](STABILITY.md) — `beta` level rules apply.
299
299
  - [`docs/contracts/command-surface-tiers.md`](command-surface-tiers.md) — what each tier means and what `--help` surfaces.
300
+ - [`docs/contracts/command-taxonomy.md`](command-taxonomy.md) — profile axis (discoverability) layered on top of this verb axis (invocation).
300
301
  - [`.agent-src.uncompressed/contexts/contracts/artifact-engagement-flow.md`](../../.agent-src.uncompressed/contexts/contracts/artifact-engagement-flow.md) — sibling telemetry surface; same privacy floor and four-layer enforcement model.
@@ -0,0 +1,137 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-12
4
+ ---
5
+
6
+ # Command taxonomy
7
+
8
+ > **Status:** beta — first draft 2026-05-16 (Phase 2 Item 6 of
9
+ > `step-15-product-refinement`).
10
+
11
+ The taxonomy answers **"how is the command surface organized so each
12
+ profile finds their three first commands in under 30 seconds?"** It is
13
+ a **catalog-organization contract**, not an invocation-rename. Existing
14
+ slash invocations (`/work`, `/fix ci`, `/research deep`) are preserved
15
+ by the locked verb-cluster contract at
16
+ [`command-clusters`](command-clusters.md). This file adds a **profile
17
+ axis** on top of the verb axis without breaking either.
18
+
19
+ ## The two axes
20
+
21
+ | Axis | Owner | Surface |
22
+ |---|---|---|
23
+ | **Verb-cluster** (existing) | [`command-clusters`](command-clusters.md) | Defines the invocation tree (`/fix ci` dispatches to the `ci` sub-command of the `fix` cluster). Linter-enforced. **Source of truth for invocation.** |
24
+ | **Profile** (this contract) | [`profile-system`](profile-system.md) | Defines which verb-clusters and sub-commands are surfaced first for each profile (developer · content_creator · founder · agency · finance · ops). **Source of truth for discoverability.** |
25
+
26
+ A command can be discoverable under multiple profiles. `/work` is
27
+ universal — it appears in `commands_hint` for every profile. `/dcf-modeling`
28
+ is finance-only. Discoverability is many-to-many; invocation stays
29
+ single-source.
30
+
31
+ ## Membership rules
32
+
33
+ ### Profile membership
34
+
35
+ A command appears in a profile's `commands_hint` (in
36
+ `.agent-src.uncompressed/profiles/<id>.yml`) iff **all** hold:
37
+
38
+ 1. **First-week reach.** A user of that profile will reach for this
39
+ command within their first five sessions without being told.
40
+ 2. **Profile-coherent.** The command's domain matches the profile's
41
+ primary work surface (engineering for `developer`, content for
42
+ `content_creator`, etc.).
43
+ 3. **Verb-cluster owned.** The command exists in `command-clusters` —
44
+ no profile may declare a command that has not gone through the
45
+ verb-cluster linter.
46
+ 4. **Cap of five.** A profile's `commands_hint` is capped at five
47
+ entries. The cap is what makes "three first commands" possible.
48
+
49
+ ### Top-10 most-used (for alias / deprecation policy)
50
+
51
+ The top-10 list is the **union of all six profiles' `commands_hint`
52
+ lists, ranked by per-profile membership count**. As of 2026-05-16
53
+ that union is, in rank order:
54
+
55
+ 1. `work` (6/6 profiles)
56
+ 2. `implement-ticket` (2/6 — developer, agency)
57
+ 3. `feature` (2/6 — founder, agency)
58
+ 4. `council` (2/6 — founder, finance)
59
+ 5. `challenge-me` (2/6 — founder, finance)
60
+ 6. `review-changes` (2/6 — developer, ops)
61
+ 7. `fix` (2/6 — developer, ops)
62
+ 8. `refine-ticket` (1/6 — agency)
63
+ 9. `commit` (1/6 — developer)
64
+ 10. `roadmap` (1/6 — agency)
65
+
66
+ The top-10 is regenerated automatically from the profile YAMLs by
67
+ `scripts/regen_top10.py` (Phase 2 deliverable — not yet shipped). Until
68
+ the regen script lands, the list above is the locked snapshot.
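Since `scripts/regen_top10.py` has not shipped, the following is only a sketch of the ranking rule described above: union the `commands_hint` lists and rank by how many profiles include each command. The sample profile data is invented for illustration, and breaking ties by first appearance is an assumption, because the contract does not specify tie order.

```python
from collections import Counter

MAX_HINTS = 5  # "Cap of five" from the membership rules


def rank_commands(profiles, limit=10):
    """Rank commands by the number of profiles whose commands_hint lists them."""
    counts = Counter()
    for profile_id, hints in profiles.items():
        if len(hints) > MAX_HINTS:
            raise ValueError(f"{profile_id}: commands_hint exceeds {MAX_HINTS} entries")
        for command in dict.fromkeys(hints):  # count each command once per profile
            counts[command] += 1
    # sorted() is stable, so equal counts keep first-seen order (assumed tie-break).
    return sorted(counts.items(), key=lambda item: -item[1])[:limit]


# Hypothetical, abbreviated profile data; not the shipped YAML files.
sample = {
    "developer": ["work", "fix", "commit"],
    "ops": ["work", "fix"],
    "finance": ["work", "council"],
}
print(rank_commands(sample))
```

Deriving the list this way keeps it a function of declared membership only, consistent with the contract's statement that no usage telemetry is involved.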
69
+
70
+ ## Backward-compat policy
71
+
72
+ The top-10 commands carry a **two-release backward-compat guarantee**:
73
+
74
+ - A rename of any top-10 command (whether by verb-cluster restructure
75
+ or profile-axis reorganization) ships with an alias for **at least
76
+ two minor releases**.
77
+ - The alias is recorded in the verb-cluster's `Replaces` column in
78
+ [`command-clusters`](command-clusters.md) and re-emits a one-line
79
+ deprecation notice to stderr on every invocation.
80
+ - Removing the alias requires the `bundled-always-rules-acknowledged`
81
+ PR label and an entry in the CHANGELOG `Removed` section naming the
82
+ end-of-deprecation release.
83
+
84
+ Commands outside the top-10 follow the existing verb-cluster
85
+ deprecation rules (one release as a shim, then disappear).
86
+
87
+ ## Discoverability surfaces
88
+
89
+ Three surfaces consume this contract:
90
+
91
+ | Surface | Path | What it shows |
92
+ |---|---|---|
93
+ | **README** | `README.md` § "Six entry paths" | Per-profile `commands_hint` (max 5) rendered as the first-commands list per profile block |
94
+ | **Catalog** | `docs/catalog.md` | All commands grouped by verb-cluster (primary axis), with a per-command `profiles:` line listing which profiles surface it |
95
+ | **Wizard** | `.agent-src.uncompressed/commands/onboard.md` | After role selection, prints the five-command starter list from the selected profile's `commands_hint` |
96
+
97
+ The README and wizard surfaces are already wired. The catalog `profiles:`
98
+ line is a Phase 2 deliverable.
99
+
100
+ ## What this contract does **not** do
101
+
102
+ - **Does not** rename any command. Invocation stays flat (`/work`, not
103
+ `/dev/work`). The `/dev/...` / `/ops/...` strawman in the Item 6
104
+ roadmap entry is **rejected** — adding a profile prefix to invocation
105
+ would dual-namespace the surface, conflict with verb-axis cluster
106
+ heads, and require a 124-command migration with no measurable
107
+ discoverability gain over the README + wizard surfaces above.
108
+ - **Does not** modify the verb-cluster contract. `command-clusters`
109
+ remains the locked source of truth for invocation. This contract is
110
+ additive.
111
+ - **Does not** ship telemetry. The top-10 is derived from declared
112
+ profile membership, not observed usage. A usage-based top-10
113
+ recomputation is deferred to Item 10 (Cost Governance Dashboard),
114
+ which already collects per-command call counts.
115
+
116
+ ## Open questions (post-beta)
117
+
118
+ 1. **Profile evolution.** When a seventh profile lands (e.g.
119
+ `researcher`), what is the membership review process for the
120
+ top-10? Proposal: any new profile triggers a `regen_top10.py` run
121
+ and a CHANGELOG entry; no manual review unless the top-10 order
122
+ changes.
123
+ 2. **Profile-prefix invocation.** If the no-rename verdict is
124
+ revisited (e.g. user research shows discoverability still fails
125
+ even with the README + wizard surfaces), a separate ADR records
126
+ the decision; this contract does not pre-authorize it.
127
+ 3. **Catalog generator.** `docs/catalog.md` is currently
128
+ handwritten. The `profiles:` line proposed in the discoverability
129
+ table requires `scripts/regen_catalog.py` to consume profile YAMLs
130
+ — deferred to its own roadmap step.
131
+
132
+ ## See also
133
+
134
+ - [`command-clusters`](command-clusters.md) — verb-axis (invocation)
135
+ - [`profile-system`](profile-system.md) — profile-axis (discoverability)
136
+ - [`command-surface-tiers`](command-surface-tiers.md) — tier-axis (`./agent-config --help` visibility)
137
+ - `step-15-product-refinement` § Phase 2 Item 6
@@ -0,0 +1,69 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Compression default — kill-criterion
7
+
8
+ > **Status:** parked, criterion-deferred · **Owner:** `step-4-measurement-and-benchmark.md`
9
+ > closeout phase · **Source:** [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md)
10
+
11
+ ## Rule
12
+
13
+ ```
14
+ DEFAULT STAYS OFF UNTIL `task bench` PRODUCES A NUMBER.
15
+ DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
16
+ ```
17
+
18
+ 1. **Current state.** `caveman.speak_scope` defaults `off`. Carve-outs
19
+ (security · destructive · multi-step · code blocks · paths · numbered
20
+ options · Iron-Law markers) are documented in
21
+ [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
22
+ but the feature is non-promoted: no skill recommends turning it on,
23
+ no preset enables it, no profile depends on it.
24
+ 2. **Baseline window.** 60 days from the first green run of
25
+ `task bench` against the locked 25-prompt corpus
26
+ (`step-4-measurement-and-benchmark.md`
27
+ Phase 2). The corpus, the model, and the cost-tracker are frozen
28
+ for the window; mid-window changes restart the clock.
29
+ 3. **Decision points.** After the window closes, `step-4` closeout
30
+ reads `docs/parity/bench.json` and applies exactly one of:
31
+
32
+ | Measured tokens saved | Quality regression on corpus | Verdict |
33
+ |---|---|---|
34
+ | < 30 % | any | **Deprecate** — remove `caveman-speak` rule, archive `caveman-compress` script, retire `caveman.*` settings keys with a one-release deprecation window |
35
+ | ≥ 30 % | < 5 % | **Flip default on** — `caveman.speak_scope` defaults to a non-`off` value, carve-outs stay, statusline surfaces lifetime tokens saved |
36
+ | ≥ 30 % | ≥ 5 % | **Hold** — repeat the window once with tuned intensity ladder; second hold → deprecate |
37
+
38
+ "Quality regression" = host-side rubric on the corpus per
39
+ `step-4-measurement-and-benchmark.md` Phase 3. The numbers are checked into
40
+ `docs/parity/bench.json` as the decision artefact.
41
+ 4. **No interim flip.** The default does not move on anecdote,
42
+ gut feeling, or a single benchmark snapshot. The 60-day window and
43
+ the table above are the only path to a default change.
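The decision table in step 3 reduces to a small pure function. This is a sketch under stated assumptions: the function name, the percentage inputs, the verdict labels, and the `prior_holds` counter (used for "second hold → deprecate") are all hypothetical.

```python
def compression_verdict(tokens_saved_pct, quality_regression_pct, prior_holds=0):
    """Apply the kill-criterion table to one closed baseline window."""
    if tokens_saved_pct < 30:
        return "deprecate"  # savings too small, regardless of quality
    if quality_regression_pct < 5:
        return "flip_default_on"
    # Savings are sufficient but quality regressed: hold once, then deprecate.
    return "deprecate" if prior_holds >= 1 else "hold"


print(compression_verdict(42.0, 2.5))   # flip_default_on
print(compression_verdict(42.0, 7.0))   # hold
```

Keeping the thresholds in one function makes the "exactly one of" rule explicit: every input pair maps to a single verdict.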
44
+
45
+ ## Why this is parked, not decided
46
+
47
+ The council split (Opus = remove now, o1 = measure-then-decide) is
48
+ real. Either branch is wrong-shaped without numbers. The kill-criterion
49
+ gives the audit a deterministic resolution path and stops every
50
+ downstream roadmap from re-litigating compression on every PR.
51
+
52
+ ## Cross-references
53
+
54
+ - `step-99-north-star-restructure.md` § Phase 4
55
+ — parks this criterion, does not decide.
56
+ - `step-4-measurement-and-benchmark.md`
57
+ — owns `task bench`, the corpus, and the closeout that applies the
58
+ table above.
59
+ - `step-10-caveman-parity.md`
60
+ — implements the carve-outs and the statusline integration the
61
+ "flip default on" branch depends on; blocks the default flip until
62
+ acceptance is green.
63
+ - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
64
+ — runtime rule; reads `caveman.speak_scope` from settings.
65
+
66
+ ## Done
67
+
68
+ This doc exists to keep the decision visible. It is **not** an action
69
+ item. `step-4` closeout closes the loop.
@@ -0,0 +1,144 @@
1
+ ---
2
+ stability: beta
3
+ keep-beta-until: 2026-08-14
4
+ ---
5
+
6
+ # Config Presets — Contract
7
+
8
+ > **Status:** beta · **Owner:** package maintainer · **Last reviewed:** 2026-05-16
9
+ >
10
+ > Schema and semantics for the **Config Preset** axis introduced in
11
+ > step-15 Phase 1 item 4. Records the **Cost Enforcement** model
12
+ > (Council v3 action #3 prerequisite) so the preset loader can ship.
13
+ > Boundary against `profile.id`, `pack.id`, and `cost_profile`:
14
+ > [`ADR-010`](../decisions/ADR-010-profile-pack-preset-boundary.md).
15
+
16
+ ## Decision
17
+
18
+ A **preset** owns governance knobs that the user wants to tune as a
19
+ bundle, not individually. Three seed presets ship; users can declare
20
+ their own under `.agent-src.uncompressed/presets/<id>.yml`.
21
+
22
+ | `preset.id` | Stance | Typical user |
23
+ |---|---|---|
24
+ | `fast` | Lowest friction; widest autonomy; loosest cost caps | Solo founder, throw-away prototype, exploration |
25
+ | **`balanced`** *(default)* | Moderate friction; per-task autonomy; sensible cost caps | Day-to-day work; default for any new install |
26
+ | `strict` | Highest friction; ask-by-default; tight cost caps; block-on-risk | Production paths, regulated work, shared trunks |
27
+
28
+ Profile-aware overlay: `developer + strict` ≠ `founder + strict` — the
29
+ profile selects which knob in the preset is read first (e.g. `developer`
30
+ reads `block_on_risk.code_paths`, `founder` reads `block_on_risk.financial_paths`).
31
+
32
+ ## Preset shape
33
+
34
+ ```yaml
35
+ preset:
36
+   id: balanced
37
+   autonomy:
38
+     default: auto # on | off | auto (see autonomous-execution rule)
39
+     trivial_suppress: true
40
+   confidence:
41
+     min_band: medium # low | medium | high — block plan if below
42
+     require_evidence: false
43
+   risk:
44
+     block_on: [security, prod_data]
45
+     ask_on: [bulk_delete, schema_change]
46
+   council:
47
+     auto_consult: false
48
+     cap_per_consult_usd: 0.50
49
+   mcp:
50
+     per_call_max_usd: 0.10
51
+     per_session_max_usd: 2.00
52
+   cost:
53
+     daily_max_usd: 10.00
54
+     weekly_max_usd: 50.00
55
+     monthly_max_usd: 150.00
56
+     enforce: hybrid # see Cost Enforcement section
57
+   notifications:
58
+     threshold_pct: [50, 75, 90, 100]
59
+ ```
60
+
61
+ ## Cost Enforcement
62
+
63
+ *Hybrid model* — recorded as the Phase 1 prerequisite per Council v3
64
+ action #3. Two enforcement surfaces, one decision per call.
65
+
66
+ ### Hard enforcement (preset loader, blocking)
67
+
68
+ The preset loader **refuses to dispatch** any council or MCP call whose
69
+ *estimated* cost exceeds the active preset's per-call ceiling. The
70
+ estimate is read from the model adapter (`council_cli.py estimate` for
71
+ council; the MCP tool manifest for MCP). The block is raised **before**
72
+ the network call. There is no override flag — the user must change the
73
+ preset, override `cost.per_call_max_usd` in `.agent-settings.yml`, or
74
+ pass `--preset=fast` on the CLI.
75
+
76
+ ```
77
+ PRE-CALL CEILING IS HARD.
78
+ NO RUNTIME OVERRIDE. NO "JUST THIS ONCE" FLAG.
79
+ EXCEED → REFUSE → SURFACE THE CEILING + THE OVERRIDE PATH.
80
+ ```
81
+
82
+ Applies to:
83
+
84
+ - AI Council consults (`scripts/council_cli.py run`).
85
+ - MCP tool calls dispatched through the universal dispatcher
86
+ ([`hook-architecture-v1`](hook-architecture-v1.md)).
87
+ - Any future skill that reads `preset.cost.per_call_max_usd`.
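A minimal sketch of the pre-call check follows. Which preset key supplies the ceiling for a given call is not spelled out here (the preset shape lists `mcp.per_call_max_usd` and `council.cap_per_consult_usd`), so the sketch takes the ceiling as an argument; `CeilingExceeded` and `check_ceiling` are hypothetical names.

```python
class CeilingExceeded(Exception):
    """Raised before the network call when the estimate is above the ceiling."""


def check_ceiling(estimated_usd, ceiling_usd, kind):
    """Refuse to dispatch when the estimated cost exceeds the per-call ceiling."""
    if estimated_usd > ceiling_usd:
        # Surface both the ceiling and the documented override path.
        raise CeilingExceeded(
            f"{kind} call refused: estimate ${estimated_usd:.2f} exceeds "
            f"ceiling ${ceiling_usd:.2f}; change the preset or override the "
            "ceiling in .agent-settings.yml"
        )


check_ceiling(0.05, 0.10, "mcp")  # within the ceiling, so nothing is raised
```

The check has no bypass parameter by design, mirroring the rule that the ceiling cannot be waived for a single call.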
88
+
89
+ ### Advisory dashboard (retroactive, non-blocking)
90
+
91
+ `agent-config cost` (Phase 2 item 10) surfaces daily / weekly / monthly
92
+ spend against the active preset's caps. The dashboard **does not**
93
+ block — it warns at the thresholds in `preset.notifications.threshold_pct`
94
+ (default `50 / 75 / 90 / 100`). At 100 %, the dashboard prints a hard
95
+ warning; the next session start re-checks the cap against the running
96
+ total before dispatching the next paid call.
97
+
98
+ The advisory layer's role is **awareness**, not enforcement. Enforcement
99
+ is exclusively the per-call ceiling above; retroactive blocking would
100
+ turn a session unrecoverably hostile mid-task.
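The advisory thresholds amount to a percentage comparison against the active cap. The helper below is a sketch with a hypothetical name; it only reports which thresholds have been reached and never blocks.

```python
def crossed_thresholds(spend_usd, cap_usd, thresholds=(50, 75, 90, 100)):
    """Return the notification thresholds (in percent) that current spend has reached."""
    if cap_usd <= 0:
        raise ValueError("cap_usd must be positive")
    pct = 100.0 * spend_usd / cap_usd
    return [t for t in thresholds if pct >= t]


print(crossed_thresholds(8.0, 10.00))  # [50, 75]
```

A dashboard could warn on the highest value returned, using the daily, weekly, or monthly cap from the active preset as `cap_usd`.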
101
+
102
+ ### What the loader does **not** do
103
+
104
+ - It does **not** estimate cost for unpaid local model calls
105
+ (`ollama`, local llama.cpp). These bypass both surfaces.
106
+ - It does **not** estimate cost for non-LLM tool calls (file reads,
107
+ shell commands, MCP-static-resource fetches). The per-call ceiling
108
+ targets paid token spend.
109
+ - It does **not** override the Hard Floor in
110
+ [`non-destructive-by-default`](../../.agent-src/rules/non-destructive-by-default.md)
111
+ — a preset cannot lift the universal safety floor.
112
+
113
+ ## Resolution chain
114
+
115
+ Reads happen in this order; last writer wins for any single knob:
116
+
117
+ 1. `pack.preset_id` (if pack active) → set `preset.id`.
118
+ 2. `profile.preset_id` → set `preset.id` (if not already set by pack).
119
+ 3. `presets/<id>.yml` → fill all knobs.
120
+ 4. `.agent-settings.yml` user keys under `preset:` → override per-knob.
121
+ 5. Environment variables (`AGENT_CONFIG_PRESET_COST_DAILY_MAX_USD=…`)
122
+ → override per-knob.
123
+ 6. Runtime CLI flags (`--preset-cost-per-call-max-usd=…`) → override
124
+ per-knob, single session.
125
+
126
+ Per [`ADR-010`](../decisions/ADR-010-profile-pack-preset-boundary.md),
127
+ no other axis may write preset-owned knobs.
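The resolution chain can be sketched as a layered merge. The sketch assumes knobs are flattened to dotted keys and that each layer arrives as an already-parsed mapping; `resolve_preset`, its parameters, and the `strict` values in the example are hypothetical.

```python
def resolve_preset(seed_presets, pack_preset_id=None, profile_preset_id=None,
                   settings=None, env=None, cli=None, default_id="balanced"):
    """Pick the preset id (pack first, then profile), then apply per-knob overrides."""
    # Steps 1-2: the profile only sets the id when the pack has not.
    preset_id = pack_preset_id or profile_preset_id or default_id
    # Step 3: the seed preset fills every knob.
    resolved = dict(seed_presets[preset_id])
    # Steps 4-6: settings file, environment, CLI; the last writer wins per knob.
    for layer in (settings, env, cli):
        resolved.update(layer or {})
    return preset_id, resolved


seeds = {
    "balanced": {"cost.daily_max_usd": 10.00, "autonomy.default": "auto"},
    "strict": {"cost.daily_max_usd": 5.00, "autonomy.default": "off"},  # invented values
}
print(resolve_preset(seeds, profile_preset_id="strict",
                     settings={"cost.daily_max_usd": 8.00},
                     cli={"cost.daily_max_usd": 3.00}))
```

Later layers replace only the knobs they name, so a CLI override of one cap leaves every other knob at the value set by the seed preset or an earlier layer.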
128
+
129
+ ## Drift detection
130
+
131
+ `task lint-config-schema` (added in Phase 1) hard-fails when:
132
+
133
+ - A pack YAML or profile YAML names a preset-owned knob.
134
+ - A preset YAML names a knob outside this contract.
135
+ - The three seed presets diverge from the documented stances above.
136
+
137
+ ## Non-goals
138
+
139
+ - This contract does **not** define profiles, packs, or `cost_profile`.
140
+ See the corresponding contracts.
141
+ - It does **not** ship a UI. CLI-first (`agent-config cost`,
142
+ `agent-config preset set <id>`).
143
+ - It does **not** auto-migrate existing installs. Without a preset,
144
+ the loader falls back to current per-knob defaults (`balanced`-equivalent).