@event4u/agent-config 2.20.1 → 2.21.0

@@ -57,6 +57,22 @@ Extract from latest record:
  Pricing source: [`bench/pricing.yaml`](../../bench/pricing.yaml). Reader
  implementation: [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs).
 
+ ### 3b. Read caveman delta + per-conversation cost lens
+
+ Run two read-only Python helpers (stdlib-only, no-op safe if JSONL missing):
+
+ - `python3 scripts/caveman_stats.py --format json` — per-session +
+ per-conversation + lifetime caveman delta. Honors suspended
+ multiplier (see [`docs/contracts/caveman-telemetry.md`](../docs/contracts/caveman-telemetry.md)) — delta reads `0` while suspended; display version + ACTIVE/SUSPENDED state regardless.
+ - `python3 scripts/cost_by_conversation.py --format json` — per-conversation
+ total cost + model breakdown for current conversation, sourced
+ from same `agents/cost-tracking/sessions.jsonl` ledger.
+
+ Surface in dashboard as one line:
+ `[caveman: {lifetime.delta_tokens:+,} tok lifetime · {current_conv.delta_tokens:+,} this conv · multiplier v{multiplier_version} {ACTIVE|SUSPENDED}] · [conv cost: ${current_conv.total_cost_usd:.4f}]`.
+
+ If both JSONLs missing or empty, omit line silently.
+
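The dashboard line described in section 3b can be rendered roughly as follows. This is a minimal sketch and not the package's code: the function name `render_dashboard_line` and the flat dictionary keys are hypothetical, and the real JSON shape emitted by `caveman_stats.py` and `cost_by_conversation.py` is not shown in this diff.

```python
def render_dashboard_line(caveman, cost):
    """Return the one-line dashboard widget, or None when both inputs are empty."""
    if not caveman and not cost:
        return None  # both ledgers missing or empty: omit the line silently

    # Hypothetical flat keys; the helpers' real output schema is an assumption.
    caveman = caveman or {}
    cost = cost or {}
    lifetime = caveman.get("lifetime_delta_tokens", 0)
    this_conv = caveman.get("conv_delta_tokens", 0)
    version = caveman.get("multiplier_version", "?")
    state = "ACTIVE" if caveman.get("multiplier_active") else "SUSPENDED"
    total = cost.get("total_cost_usd", 0.0)

    # "+," prints an explicit sign and thousands separators, as in the template.
    return (
        f"[caveman: {lifetime:+,} tok lifetime · {this_conv:+,} this conv"
        f" · multiplier v{version} {state}] · [conv cost: ${total:.4f}]"
    )
```

The `:+,` and `:.4f` format specs are taken directly from the template string in the diff; everything else is illustrative.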
  ### 4. Calculate freshness thresholds
 
  - **Message threshold**: Next multiple of 25 ≥ current count
@@ -56,6 +56,8 @@ Post-rewrite validator runs on every reply when `speak_scope != off`:
  The rule documents the algorithm; agents apply it inline before
  sending. The mechanism is the rule, not a hidden script.
 
+ Optional CI-side regression lock: [`scripts/validate_caveman_carveouts.py`](../../scripts/validate_caveman_carveouts.py) takes pre/post reply pair and asserts byte-identical preservation across all seven carve-out categories — runtime mechanism stays algorithmic; script is offline check.
+
  ## Caveman grammar
 
  - Drop articles (`the`, `a`, `an`).
@@ -0,0 +1,119 @@
+ ---
+ name: compress-memory
+ description: "Use when shrinking always-loaded memory files (AGENTS.md, CLAUDE.md, .cursorrules) via caveman grammar — refuses sensitive paths, round-trips via .original.md backup."
+ source: package
+ domain: process
+ execution:
+ type: assisted
+ handler: internal
+ allowed_tools: [Bash]
+ ---
+
+ # compress-memory
+
+ > **Experimental.** Output-side caveman dialect did not meet kill-criterion in [`bench/reports/caveman-v1.md`](../../../bench/reports/caveman-v1.md) (`vs_terse` median −9.27 %). Input-side memory compression is orthogonal use case: savings target always-loaded memory budget, not reply stream. Treat ship-criterion as **per-target measurement**, not v1 verdict.
+
+ ## When to use
+
+ Use when:
+
+ - Always-loaded memory file (`AGENTS.md`, `CLAUDE.md`, `.cursorrules`, `GEMINI.md`, `.windsurfrules`) close to or above host tool's char budget and maintainer wants to recover input-token headroom.
+ - Consumer-shipped `templates/AGENTS.md` failing `agents-md-thin-root` cap and pointer-extraction options exhausted.
+ - Maintainer asks to "compress this memory file" or "shrink AGENTS.md" or names input-side caveman.
+
+ ## Do NOT
+
+ - Compress reply, commit message, PR body, ticket summary, or any deliverable written *for* human reader — those are carve-outs in [`caveman-speak § Carve-outs`](../../rules/caveman-speak.md) and stay verbatim.
+ - Compress path matching sensitive-file denylist (`.env*`, `.netrc`, `credentials*`, `secrets*`, `id_rsa*`, `*.pem|key|p12|pfx|crt|cer|jks`, `.ssh/*`) — script refuses with `SensitivePathError` and so should you.
+ - Compress generated file (`.agent-src/`, `.augment/`, `.claude/`, `.cursor/`, `.clinerules/`, `.windsurfrules`) — edit source in `.agent-src.uncompressed/` and regenerate via package's sync + generate-tools scripts (`scripts/compress.sh --sync` + `scripts/compress.py --generate-tools`).
+ - Hand-edit compressed memory file in place — run `--decompress` first; next compress pass refuses on body-hash drift (`CompressionRefused`).
+ - Commit compressed file without committing matching `.original.md` backup — round-trip breaks otherwise.
+
+ ## Procedure
+
+ 1. **Analyse target first.** Before any write, **inspect** target with `view` or `wc -l` to confirm it is always-loaded memory file (`AGENTS.md`, `CLAUDE.md`, `.cursorrules`, `GEMINI.md`, `.windsurfrules`), not generated, and has prose paragraphs to compress (pointer-only Thin-Root file may net near-zero). Skip rest of procedure if any check fails.
+ 2. **Check denylist gate.** Run `python3 scripts/compress_memory.py <path> --check` — exit 0 = safe; exit 2 = denylist hit, stop and surface refusal.
+ 3. **Record baseline.** `wc -c <path>` — capture pre-compression char count for commit message.
+ 4. **Compress.** `python3 scripts/compress_memory.py <path>`. Script writes `<path>.original.md` (verbatim backup) and rewrites `<path>` with `original_sha256:` + `compressed_at:` frontmatter.
+ 5. **Inspect diff.** Eyeball every Iron-Law fence, numbered-options block, code fence, backtick span, `❌`/`⚠️`/`✅` line, and frontmatter pair — all must be byte-identical. Body prose may have lost articles (`the`/`a`/`an`) and auxiliaries (`is`/`are`/`was`/`be`/`that`/`which`).
+ 6. **Validate idempotency.** Re-run `python3 scripts/compress_memory.py <path>` — clean re-run is no-op (body hash matches). Non-zero exit = stop, escalate.
+ 7. **Commit both files together.** `<path>` and `<path>.original.md` ship as pair. Backup is rollback path; never commit one without other.
+ 8. **Rollback path.** If readability fails review at step 5: `python3 scripts/compress_memory.py <path> --decompress` restores backup and deletes `.original.md`.
+
+ ## Output format
+
+ Maintainer-facing report after invoking script MUST contain, in this order:
+
+ 1. **Diff line** — pre/post `wc -c` as single line (`AGENTS.md: 2,891 → 2,453 chars (−15.1 %)`).
+ 2. **Backup path** — full path of `.original.md` backup so maintainer can verify it landed on disk.
+ 3. **Carve-out check** — one line confirming seven carve-out classes round-tripped (`carve-outs: 7 classes preserved · idempotent re-run: clean`).
+ 4. **Exit-code surface** — on failure, surface verbatim exit code and exception name (`SensitivePathError → exit 2`, `CompressionRefused → exit 3`, `FileNotFoundError → exit 4`); do not paraphrase.
+
+ Do **not** narrate algorithm, grammar rules, or carve-out theory — rule and this skill document contract; output reports result.
+
+ ## Carve-outs — byte-for-byte preserved
+
+ Mirrors seven carve-out classes in [`caveman-speak`](../../rules/caveman-speak.md). Compression engine in [`scripts/compress_memory.py`](../../../scripts/compress_memory.py) preserves:
+
+ 1. **Triple-backtick fences** — any language, any depth.
+ 2. **Numbered-options lines** — `^>?\s*\d+\.\s` plus `**Recommendation:**` / `**Empfehlung:**` label.
+ 3. **Backtick spans** — file paths, command names, identifiers inside body prose.
+ 4. **Status / error markers** — lines starting with `❌`, `⚠️`, `✅`.
+ 5. **Iron-Law ALL-CAPS lines** — `^[A-Z][A-Z0-9 ,.\-_/']{3,}$`.
+ 6. **Frontmatter blocks** — `---` fence pairs at head of file.
+ 7. **Mode markers** per [`role-mode-adherence`](../../rules/role-mode-adherence.md).
+
+ Mangling any of these breaks Iron-Law surface host tool reads. Unit tests in `tests/test_compress_memory.py` lock each carve-out class as regression case.
+
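The line-level carve-outs listed above can be illustrated with a small classifier. This is a hedged sketch, not the engine in `scripts/compress_memory.py`: the function name `protected_lines` is hypothetical, fence nesting is not handled, and backtick spans, frontmatter blocks, label lines, and mode markers are left out. Only the two regular expressions and the three status markers come from the skill text.

```python
import re

# Patterns quoted from the carve-out list in the skill text.
NUMBERED_OPTION = re.compile(r"^>?\s*\d+\.\s")
IRON_LAW = re.compile(r"^[A-Z][A-Z0-9 ,.\-_/']{3,}$")
STATUS_MARKERS = ("❌", "⚠️", "✅")
FENCE = "`" * 3  # triple backtick, built this way to keep the example readable


def protected_lines(text):
    """Yield (line, is_protected) pairs; protected lines must survive byte-for-byte."""
    in_fence = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # assumption: fences are not nested
            yield line, True
        elif in_fence:
            yield line, True
        else:
            yield line, bool(
                NUMBERED_OPTION.match(line)
                or IRON_LAW.match(line)
                or stripped.startswith(STATUS_MARKERS)
            )
```

A compressor built on this would rewrite only the lines flagged `False`, which is one way to keep the protected classes byte-identical by construction.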
+ ## Idempotency contract — Step 9 guard
+
+ Script is **idempotent on clean re-runs**: running it twice on same target is no-op because body hash matches recompressed hash. Script **refuses** on **body drift**:
+
+ | State | Outcome |
+ |---|---|
+ | No frontmatter SHA marker | Compress + write backup + inject SHA. |
+ | SHA marker present, body re-compresses to same hash | No-op (return target unchanged). |
+ | SHA marker present, body hash diverged | **Refuse** with `CompressionRefused` exit 3. |
+
+ If you need to edit compressed memory file, run `--decompress` first, edit restored `.original.md` content, then re-run compressor. Never hand-edit compressed body — next CI run will either silently corrupt your edit (if it happens to re-compress to same shape) or hard-fail next compress pass.
+
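The three-row state table above maps onto a small decision function. The sketch below is illustrative only and rests on a loud assumption: that the recorded SHA can be compared against the hash of the re-compressed body. The diff does not say exactly which bytes the script hashes, and `decide` and the toy `drop_articles` compressor are hypothetical names.

```python
import hashlib


class CompressionRefused(Exception):
    """Stand-in for the script's exit-3 error (the real class lives in the package)."""


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def decide(recorded_sha, body, compress):
    """Return 'compress' or 'noop', or raise CompressionRefused on body drift."""
    if recorded_sha is None:
        return "compress"  # no marker: compress, write backup, inject SHA
    if sha256_hex(compress(body)) == recorded_sha:
        return "noop"  # body re-compresses to the recorded hash
    raise CompressionRefused("body hash diverged from frontmatter SHA")


def drop_articles(text):
    """Toy compressor used only for this example."""
    return " ".join(w for w in text.split(" ") if w.lower() not in {"the", "a", "an"})
```

With an idempotent compressor, an untouched compressed body always lands in the no-op row, while any hand edit that changes the re-compressed bytes lands in the refuse row.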
+ ## Sensitive-path gate
+
+ Every read path passes through [`scripts/validate_safe_paths.py`](../../../scripts/validate_safe_paths.py) `assert_safe()` before bytes leave disk. Gate is security floor for Phase 2 (input-side compression) per `step-16-caveman-substance.md` Phase 0; rollback of gate is rollback of this skill.
+
+ CLI exit codes:
+
+ - `0` — compress / decompress / check succeeded.
+ - `2` — `SensitivePathError` (path matched denylist).
+ - `3` — `CompressionRefused` (body hash diverged from frontmatter SHA).
+ - `4` — `FileNotFoundError` (no `.original.md` backup to restore).
+
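A denylist gate of the kind described above could look roughly like this. It is a sketch under stated assumptions, not the contents of `scripts/validate_safe_paths.py`: the patterns are copied from the skill's "Do NOT" list, but how the real `assert_safe()` matches them (basename versus full path, case sensitivity) is not shown in this diff, and `check_exit_code` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the skill text; the matching strategy is an assumption.
NAME_PATTERNS = (".env*", ".netrc", "credentials*", "secrets*", "id_rsa*")
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12", ".pfx", ".crt", ".cer", ".jks"}


class SensitivePathError(Exception):
    """Stand-in for the package's exception of the same name."""


def assert_safe(path):
    """Raise SensitivePathError when the path matches the denylist."""
    p = PurePosixPath(path)
    in_ssh_dir = ".ssh" in p.parts[:-1]  # covers the `.ssh/*` entry
    name_hit = any(fnmatchcase(p.name, pat) for pat in NAME_PATTERNS)
    suffix_hit = p.suffix.lower() in SENSITIVE_SUFFIXES
    if in_ssh_dir or name_hit or suffix_hit:
        raise SensitivePathError(path)


def check_exit_code(path):
    """Map the gate result onto the documented exit codes 0 and 2."""
    try:
        assert_safe(path)
    except SensitivePathError:
        return 2
    return 0
```

A filename-based gate like this only catches names it knows about, so a secret stored under a project-specific name would still pass.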
+ ## Gotchas
+
+ - **Body-hash drift after manual edit** — hand-editing compressed body breaks `original_sha256:` invariant. Next compress pass refuses with `CompressionRefused` (exit 3). Recovery: `--decompress`, edit restored body, re-compress.
+ - **`.original.md` backup missing on `--decompress`** — exit 4 (`FileNotFoundError`). Either someone deleted backup or `--decompress` already ran. Restore from git history; never regenerate backup by hand (regenerated content would not be byte-identical).
+ - **Denylist false negative** — sensitive-looking filename outside denylist surface (project-specific naming) will still pass `assert_safe()`. Denylist necessary but not sufficient; maintainer responsible for never feeding secrets to compressor.
+ - **Frontmatter ordering with existing keys** — if target already has frontmatter, compressor preserves existing keys, drops any prior `original_sha256:` / `compressed_at:` entries, and appends new pair. Other agents reading file should treat SHA + timestamp pair as canonical compression marker, not file size.
+ - **Negative savings on pointer-heavy files** — `templates/AGENTS.md` already following Thin-Root (≥ 40 % pointers, ≥ 60-char *why*-clauses) has little prose left to drop; compression may net near-zero or even add bytes via frontmatter. Run [`agents-md-thin-root`](../agents-md-thin-root/SKILL.md) first to maximise pointer share, then measure whether this skill still pays.
+ - **Generated-tree drift** — compressing `.agent-src.uncompressed/templates/AGENTS.md` does NOT propagate to `.augment/`, `.claude/`, etc. until package's sync + generate-tools scripts run (`scripts/compress.sh --sync` + `scripts/compress.py --generate-tools`). Always regenerate after compressing templated file.
+
+ ## Measurement — when to compress
+
+ No published `caveman-v2` baseline for input-side savings yet (Step 11 of `step-16-caveman-substance.md` ships that). Until then, maintainer judges per-target whether compression pays its readability cost. Suggested workflow:
+
+ 1. `wc -c <path>` before — record baseline char count.
+ 2. `python3 scripts/compress_memory.py <path>` — compress + back up.
+ 3. `wc -c <path>` after — record post-compression char count.
+ 4. Eyeball diff: does prose stay legible? Are all Iron-Law fences intact?
+ 5. If yes → commit both `<path>` and `<path>.original.md`. If no → `--decompress`.
+
+ Future `caveman-v2.md` will tabulate realised input-token saving against `agents-md-thin-root` 40 % pointer-ratio constraint so maintainer has numerical floor.
+
+ ## Cross-references
+
+ - [`caveman-speak`](../../rules/caveman-speak.md) — runtime rule script mirrors for input-side targets; `caveman.speak_scope` does **not** gate this script (input-side runs regardless).
+ - [`scripts/validate_safe_paths.py`](../../../scripts/validate_safe_paths.py) — Phase 0 gate; ported from upstream Caveman `63a91ec`.
+ - [`scripts/compress_memory.py`](../../../scripts/compress_memory.py) — implementation.
+ - [`tests/test_compress_memory.py`](../../../tests/test_compress_memory.py) — regression locks for each carve-out + idempotency + denylist.
+ - [`docs/contracts/compression-default-kill-criterion.md`](../../../docs/contracts/compression-default-kill-criterion.md) — v1 verdict (output-side; informs but does not gate this skill).
+ - [`agents-md-thin-root`](../agents-md-thin-root/SKILL.md) — caps consumer-shipped `templates/AGENTS.md`; this skill is one tool to land under cap.
@@ -39,7 +39,7 @@ schema_version: 1
  # CI guard: a release bump of `package.json` must update this value
  # in lockstep — see scripts/check_template_pin_drift.py (road-to-
  # portable-runtime-and-update-check P3.3).
- agent_config_version: "2.20.0"
+ agent_config_version: "2.20.1"
 
  # --- Project identity ---
  project:
@@ -6,7 +6,7 @@
  },
  "metadata": {
  "description": "Shared agent configuration \u2014 skills for AI coding tools (Claude Code, Augment, Cursor, Cline, Windsurf, Gemini CLI).",
- "version": "2.20.1",
+ "version": "2.21.0",
  "keywords": [
  "agent-config",
  "skills",
@@ -99,6 +99,7 @@
  "./.claude/skills/competitive-positioning",
  "./.claude/skills/composer-packages",
  "./.claude/skills/compress",
+ "./.claude/skills/compress-memory",
  "./.claude/skills/content-funnel-design",
  "./.claude/skills/context",
  "./.claude/skills/context-authoring",
package/CHANGELOG.md CHANGED
@@ -702,6 +702,41 @@ our recommendation order, not its support status.
  > that forces a new era split (`# Era: 2.18.x`, etc.) — see
  > [`docs/contracts/CHANGELOG-conventions.md § Era splits`](docs/contracts/CHANGELOG-conventions.md).
 
+ ## [2.21.0](https://github.com/event4u-app/agent-config/compare/2.20.1...2.21.0) (2026-05-17)
+
+ ### Features
+
+ * **telemetry:** caveman stats + per-conversation cost lens ([13300cc](https://github.com/event4u-app/agent-config/commit/13300cc2d709ec2cce58520621cf560fbd6414c3))
+ * **memory:** input-side compression for always-loaded files ([abfd5b1](https://github.com/event4u-app/agent-config/commit/abfd5b120f2effd2abd68adea45c8b15f315dfec))
+ * **bench:** add caveman v1 benchmark with terse-control arm ([1e37062](https://github.com/event4u-app/agent-config/commit/1e37062cada6f9be5bfa0dfe4083753ade87f2f2))
+ * **security:** add safe-paths denylist and caveman carve-outs validators ([249114d](https://github.com/event4u-app/agent-config/commit/249114d900a9d6960aee7bbeda5c28f85be718ad))
+
+ ### Bug Fixes
+
+ * **caveman-speak:** bullet-format prose lines to satisfy structural-density lock ([5c8006d](https://github.com/event4u-app/agent-config/commit/5c8006d8bd70fea93671361160e6b7c4399302c6))
+ * **refs:** inline roadmap council citations + mark contract council-refs as ADR trace ([5a11951](https://github.com/event4u-app/agent-config/commit/5a11951ebff87ba95465c0a6b9b59fd9a4d4cee2))
+ * **contracts:** drop roadmap reference from compression-default-kill-criterion ([f2b2124](https://github.com/event4u-app/agent-config/commit/f2b212495744dae1904b256aa11d47e230e7b534))
+ * **contracts:** add stability frontmatter to caveman-telemetry + cost-summary-schema ([c7efa54](https://github.com/event4u-app/agent-config/commit/c7efa54cc3587f42f3a21ca18b783b5504c56e04))
+ * **portability:** apply task-invocation fix in .agent-src/ projection ([be87c2b](https://github.com/event4u-app/agent-config/commit/be87c2be1637570a61b7a8863216288f3828609c))
+ * **portability:** swap task invocations for script paths in compress-memory skill ([886f9f4](https://github.com/event4u-app/agent-config/commit/886f9f4a615fa2351b9261a62fd49173f8e87c2f))
+ * **roadmap:** clarify agent-status is a command not a skill in step-16 ([96df39d](https://github.com/event4u-app/agent-config/commit/96df39de7a7562df946c37fea8bc42927851071f))
+ * **template:** bump agent_config_version pin to 2.20.1 ([c275864](https://github.com/event4u-app/agent-config/commit/c2758641718077baaaaefea1191760d397f6b47e))
+
+ ### Documentation
+
+ * **readme:** compact banner and badge row to stay under 750-line lint budget ([2f411a7](https://github.com/event4u-app/agent-config/commit/2f411a78993260d3bd1f5fb819779cad2b19ed07))
+ * **readme:** add hero banner and migrate count display to shields.io badges ([980fe1a](https://github.com/event4u-app/agent-config/commit/980fe1ac1c529e3c57c967db148f2b162240ff27))
+ * **caveman:** v1 kill-criterion verdict + Suspended state ([ca1751e](https://github.com/event4u-app/agent-config/commit/ca1751e8957783fa4e40ddfb89172702619f12bc))
+
+ ### Chores
+
+ * **ownership:** regenerate ownership matrix ([1331bce](https://github.com/event4u-app/agent-config/commit/1331bce63d223adb971a931babe5e163b5a8aa12))
+ * **index:** regenerate agents/index.md + docs/catalog.md for compress-memory ([6d16e7f](https://github.com/event4u-app/agent-config/commit/6d16e7fb3ca0a2fbe76a18337a94e98607262d06))
+ * **sync:** bump skill count 210 -> 211 (compress-memory) ([bb361ed](https://github.com/event4u-app/agent-config/commit/bb361edb4e5e65cbc4e7eb41382f88f1909e5583))
+ * **roadmaps:** close step-16 caveman-substance + archive-phantom-scan ([9388f9b](https://github.com/event4u-app/agent-config/commit/9388f9be662da17b98eb4000f16ff8fcf376e626))
+
+ Tests: 4559 (+24 since 2.20.1)
+
  ## [2.20.1](https://github.com/event4u-app/agent-config/compare/2.20.0...2.20.1) (2026-05-16)
 
  ### Bug Fixes
package/README.md CHANGED
@@ -1,5 +1,9 @@
+ <p align="center"><a href="https://event4u.app"><img alt="event4u Agent Config" src=".github/assets/banner.png"></a></p>
+
  # Agent Config — Universal AI Agent OS
 
+ [![Skills](https://img.shields.io/badge/Skills-211-1f6feb?style=flat-square)](.augment/skills/) [![Rules](https://img.shields.io/badge/Rules-79-d73a49?style=flat-square)](.augment/rules/) [![Commands](https://img.shields.io/badge/Commands-124-2da44e?style=flat-square)](.augment/commands/) [![Guidelines](https://img.shields.io/badge/Guidelines-72-8957e5?style=flat-square)](docs/guidelines/) [![Personas](https://img.shields.io/badge/Personas-22-bf8700?style=flat-square)](docs/personas.md) [![Advisors](https://img.shields.io/badge/Advisors-5-fb8500?style=flat-square)](docs/profiles.md) [![AI Tools](https://img.shields.io/badge/AI%20Tools-8-1abc9c?style=flat-square)](docs/architecture.md) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](LICENSE)
+
  > **A deterministic orchestration contract for AI agents — audited skills, governance rules, replayable state — usable by developers, founders, and creators alike.**
 
  Give your AI agents an audit-disciplined execution layer: **210 skills**, **79 governance rules**, **124 commands**, and a replayable state machine that turns any host agent (Claude Code, Augment, Cursor, Copilot, Windsurf) into a reliable team member.
@@ -27,10 +31,6 @@ schema: [`docs/contracts/profile-system.md`](docs/contracts/profile-system.md).
  Beyond software: [`user-types/`](.agent-src.uncompressed/user-types/)
  (galabau · metalworking · truck — see [Who this is for](#who-this-is-for)).
 
- <p align="center">
- <strong>210 Skills</strong> · <strong>79 Rules</strong> · <strong>124 Commands</strong> · <strong>72 Guidelines</strong> · <strong>22 Personas</strong> · <strong>5 Advisors</strong> · <strong>8 AI Tools</strong>
- </p>
-
  <p align="center">
  <a href="CHANGELOG.md">CHANGELOG</a> ·
  <a href="https://github.com/event4u-app/agent-config/releases/latest">Latest release</a> ·
@@ -577,7 +577,7 @@ slash-commands) &nbsp; 📌 = informational marker only (no auto-discovery
  or manual wiring required)
 
  > **What this means in practice:** Claude Code gets the full project-scoped
- > package (rules + 210 skills + 124 native commands); Augment Code gets the
+ > package (rules + 211 skills + 124 native commands); Augment Code gets the
  > same content but only from a single global install at `~/.augment/`.
  > Cursor, Cline, Windsurf, Gemini CLI, GitHub Copilot, Roo Code, Codex CLI,
  > and Continue.dev only get the **rules** natively; skills and commands are
@@ -141,7 +141,7 @@ note, package-internal path-swap, description budget, and the
 
  | Layer | Count | Purpose |
  |---|---|---|
- | **Skills** | 210 | On-demand expertise — stack analysis (Laravel · Symfony · Zend / Laminas · Next.js · React · Node), testing, Docker, API design, security, observability, … |
+ | **Skills** | 211 | On-demand expertise — stack analysis (Laravel · Symfony · Zend / Laminas · Next.js · React · Node), testing, Docker, API design, security, observability, … |
  | **Rules** | 79 | Always-active constraints — coding standards, scope control, verification, language-and-tone, agent-authority |
  | **Commands** | 124 | Slash-command workflows — `/commit`, `/create-pr`, `/fix ci`, `/optimize skills`, `/feature plan`, `/work`, `/implement-ticket`, `/compress`, … |
  | **Guidelines** | 72 | Reference material cited by skills — PHP patterns, Eloquent, Playwright, agent-infra, … |
@@ -0,0 +1,74 @@
+ ---
+ stability: beta
+ keep-beta-until: 2026-08-14
+ ---
+
+ # Benchmark cadence
+
+ > **Status:** active · **Owner:** `step-16-caveman-substance.md` Phase 1 ·
+ > **Sources:** [`benchmark-corpus-spec.md`](contracts/benchmark-corpus-spec.md) ·
+ > [`benchmark-report-schema.md`](contracts/benchmark-report-schema.md)
+
+ Where the package's benchmark runs live, when they run, and what counts as
+ a publishable report. Mirrors the Ruflo `docs/benchmarks/runs/<ISO>.json`
+ discipline (upstream `5b71c7a`).
+
+ ## Corpora
+
+ | Corpus | Path | Purpose |
+ |---|---|---|
+ | `dev` | `tests/eval/corpus-dev.yaml` | router / engine selection |
+ | `caveman` | `bench/corpora/caveman/prompts.yaml` | compression dialect (`vs_raw` + `vs_terse`) |
+
+ ## Reports — naming and trail
+
+ - **Canonical pointer:** `bench/reports/<corpus>-v<N>.{json,md}` — always
+ reflects the latest published run for that corpus version.
+ - **Timestamped trail:** `bench/reports/<ISO-Zulu>-<corpus>-v<N>.{json,md}`
+ — every committed run keeps an immutable history copy alongside.
+
+ Both are produced in one `scripts/bench_run.py` invocation; do not commit
+ one without the other.
+
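The naming scheme above yields four files per run. The sketch below shows one way to derive them; it is not taken from `scripts/bench_run.py`. The helper name `report_paths` is hypothetical, and the concrete `<ISO-Zulu>` layout used here (`%Y-%m-%dT%H%M%SZ`) is an assumption, since the diff does not spell out the timestamp format.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

REPORT_DIR = PurePosixPath("bench/reports")


def report_paths(corpus, version, when):
    """Return canonical and timestamped report paths for one benchmark run."""
    # Assumed timestamp layout; the real <ISO-Zulu> format may differ.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    stem = f"{corpus}-v{version}"
    paths = []
    for prefix in ("", f"{stamp}-"):  # canonical pointer first, then trail copy
        for ext in ("json", "md"):
            paths.append(str(REPORT_DIR / f"{prefix}{stem}.{ext}"))
    return paths
```

Generating both sets from a single call mirrors the rule that the canonical pointer and the timestamped trail are committed together.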
+ ## Cadence
+
+ | Trigger | Required corpus | Required artefact |
+ |---|---|---|
+ | Pre-release bake (any `vX.Y.0`) | `dev` + `caveman` | both reports refreshed |
+ | Edit to `.agent-src.uncompressed/rules/caveman-speak.md` | `caveman` | report refreshed in same PR |
+ | Edit to `scripts/bench_run.py` `--caveman` arm | `caveman` | report refreshed in same PR |
+ | Edit to `bench/corpora/caveman/prompts.yaml` | `caveman` | report refreshed, version bumped (`caveman-vN+1`) |
+ | Edit to `scripts/_lib/bench_caveman*.py` | `caveman` | report refreshed in same PR |
+
+ A PR that touches any of the cadence triggers without refreshing the
+ corresponding report is rejected by reviewer convention (no CI gate yet
+ — the trigger surface is too small to warrant one).
+
+ ## Cost envelope (`caveman` corpus)
+
+ 10 prompts × 3 arms (`compressed` · `terse-control` · `uncompressed`) = 30
+ Anthropic calls per run. Observed envelope on `claude-sonnet-4-5` (v1,
+ 2026-05-16): **$0.0805 actual** · 0 errors · realised carve-out share
+ 30.67 %.
+
+ ## Commands
+
+ ```bash
+ task bench -- --caveman # full run
+ task bench -- --caveman --caveman-max-prompts 1 # 1-prompt smoke
+ task bench -- --caveman --caveman-dry-run --no-write # offline shape
+ ```
+
+ Cost-touched runs require an `ANTHROPIC_API_KEY` at
+ `~/.event4u/agent-config/anthropic.key` (mode 600).
+
+ ## Cross-references
+
+ - [`benchmark-corpus-spec.md`](contracts/benchmark-corpus-spec.md) —
+ per-prompt schema.
+ - [`benchmark-report-schema.md`](contracts/benchmark-report-schema.md) —
+ per-report JSON / Markdown contract.
+ - [`compression-default-kill-criterion.md`](contracts/compression-default-kill-criterion.md)
+ — how a published `caveman-v<N>` report is read against the kill table.
+ - `agents/roadmaps/step-16-caveman-substance.md` Phase 1 — where the
+ caveman corpus was authored.
package/docs/catalog.md CHANGED
@@ -1,13 +1,13 @@
  # agent-config — Public Catalog
 
- Consumer-facing catalog of all **482 public artefacts** shipped by
+ Consumer-facing catalog of all **483 public artefacts** shipped by
  this package. Internal package-maintenance rules and deprecation shims
  are excluded.
 
  > **Regenerate:** `python3 scripts/generate_index.py`
  > Auto-generated — do not edit manually.
 
- ## Skills (210)
+ ## Skills (211)
 
  | kind | name | extra | description |
  |---|---|---|---|
@@ -43,6 +43,7 @@ are excluded.
  | skill | [`competitive-moat-analysis`](../.agent-src/skills/competitive-moat-analysis/SKILL.md) | | Use when mapping competitors, naming defensibility, and finding white-space — moat reasoning, where-to-play, where-not-to-play. Triggers on 'who are we competing with', 'what's our moat'. |
  | skill | [`competitive-positioning`](../.agent-src/skills/competitive-positioning/SKILL.md) | | Use when comparing this package to a peer / competitor — ours-vs-theirs verdict table, axis selection, adoption queue. Triggers on 'how do we compare to X', 'should we adopt their pattern'. |
  | skill | [`composer-packages`](../.agent-src/skills/composer-packages/SKILL.md) | | Use when building or maintaining a Composer library — versioning, Laravel integration, autoloading, publishing to private registries — even when the user says 'release a new version'. |
+ | skill | [`compress-memory`](../.agent-src/skills/compress-memory/SKILL.md) | | Use when shrinking always-loaded memory files (AGENTS.md, CLAUDE.md, .cursorrules) via caveman grammar — refuses sensitive paths, round-trips via .original.md backup. |
  | skill | [`content-funnel-design`](../.agent-src/skills/content-funnel-design/SKILL.md) | | Use when mapping funnel-stage to content shape — conversion-pathway, content-as-system, leverage-point selection. Triggers on 'design our content funnel', 'why does mid-funnel leak'. |
  | skill | [`context-authoring`](../.agent-src/skills/context-authoring/SKILL.md) | | Use when filling in knowledge-layer context files — auth-model, tenant-boundaries, data-sensitivity, deployment-order, observability — interactive walkthrough that turns templates into reviewer fuel. |
  | skill | [`context-document`](../.agent-src/skills/context-document/SKILL.md) | | Use when the user says "create context", "document this area", or wants a structured snapshot of a codebase area for agent orientation. |
@@ -0,0 +1,83 @@
+ ---
+ stability: beta
+ keep-beta-until: 2026-08-15
+ ---
+
+ # caveman telemetry — multiplier contract
+
+ > **Status:** suspended (kill-criterion not met in `caveman-v1`).
+ > Telemetry surface records `caveman_delta_tokens = 0` until a v2 bench
+ > proves a positive multiplier on the load-bearing `vs_terse` arm.
+
+ ## Constant
+
+ | Key | Value | Provenance |
+ |---|---|---|
+ | `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
+ | `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
+ | `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
+ | `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
+ | `caveman_multiplier_active` | `false` | **Suspended** — kill-criterion not met (`vs_terse` median −9.27 %) |
+
+ The **active** flag gates whether the multiplier is applied to runtime
+ telemetry. While `false`, `scripts/caveman_stats.py` reports
+ `caveman_delta_tokens = 0` regardless of `speak_scope` setting.
+
+ ## How the multiplier is interpreted
+
+ `caveman_estimated_uncompressed_tokens = caveman_compressed_tokens × M`,
+ where `M = caveman_multiplier_value`.
+
+ `caveman_delta_tokens = caveman_estimated_uncompressed_tokens − caveman_compressed_tokens`.
+
+ - `M > 1.0` → caveman compresses; `delta` is **positive** (saving).
+ - `M = 1.0` → break-even; no delta surfaced.
+ - `M < 1.0` → caveman costs more than the terse baseline; `delta` is
+ **negative**. Surfacing a negative saving is misleading for the
+ user (looks like a bug), so the contract is to **suspend the
+ multiplier** and record `delta = 0` until a v2 bench lifts `M`
+ above `1.0` on the load-bearing arm.
+
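The two formulas and the suspension rule above combine into a few lines of arithmetic. This is a minimal sketch, assuming the delta is reported unrounded and that the active flag is passed in explicitly; the function name is hypothetical and the actual logic in `scripts/caveman_stats.py` is not shown in this diff.

```python
def caveman_delta_tokens(compressed_tokens, multiplier, active):
    """Estimate tokens saved (positive) or lost (negative) for one record."""
    if not active:
        return 0  # suspended multiplier: contract says record delta = 0
    estimated_uncompressed = compressed_tokens * multiplier
    return estimated_uncompressed - compressed_tokens


# v1 constants from the contract table.
V1_MULTIPLIER = 0.9155
V1_ACTIVE = False
```

With the v1 constants the function returns `0` because the multiplier is suspended. If the same multiplier were active, the delta would be negative, which is exactly the case the contract avoids surfacing.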
+ ## Why suspended after v1
+
+ The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
+ 2026-05-16) found:
+
+ - Median savings vs raw uncompressed: **+23.51 %** (inflated by the
+ carve-out-tax-free pure-prose prompts).
+ - Median savings vs terse-control: **−9.27 %** (load-bearing).
+ - Carve-out-heavy prompts (path-list −108 %, mode-marker −123 %)
+ drag the median negative.
+
+ The terse-control arm is the kill-criterion baseline per
+ [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md).
+ Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
+ `vs_terse` median to ≥ 0 %, the multiplier stays suspended.
+
+ ## How to lift the suspension
+
+ 1. Run an extended bench against a broader corpus (Phase 3+ work).
+ 2. If `median(savings_vs_terse) ≥ 0` (and ideally ≥ 30 % to flip the
+ rule default), recompute `caveman_multiplier_value`.
+ 3. Update this contract: bump `caveman_multiplier_version` to `v2`,
+ set `caveman_multiplier_active = true`, cite the new bench file.
+ 4. The change is reversible — drop back to `v1` if a regression
+ appears.
+
+ ## Consumers
+
+ - [`scripts/caveman_stats.py`](../../scripts/caveman_stats.py) — reads
+ this constant, computes per-session / per-conversation / lifetime
+ deltas from `agents/cost-tracking/sessions.jsonl`.
+ - [`scripts/cost_summary.py`](../../scripts/cost_summary.py) — emits
+ the stable JSON contract for inter-tool consumption per
+ [`cost-summary-schema.md`](cost-summary-schema.md).
+ - `agent-status` skill — surfaces the per-session delta in the
+ status report under the `[caveman: …]` widget.
+
+ ## See also
+
+ - [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
+ - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
+ - [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
+ - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
@@ -5,14 +5,16 @@ keep-beta-until: 2026-08-14
 
  # Compression default — kill-criterion
 
- > **Status:** parked, criterion-deferred · **Owner:** `step-4-measurement-and-benchmark.md`
- > closeout phase · **Source:** [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md)
+ > **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
+ > Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
+ > [`council-synthesis.md` § 7](../../agents/audit-2026-05-14-north-star/council-synthesis.md) ·
+ > [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
 
  ## Rule
 
  ```
- DEFAULT STAYS OFF UNTIL `task bench` PRODUCES A NUMBER.
- DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
+ DEFAULT STAYS OFF UNTIL `task bench -- --caveman` PRODUCES A POSITIVE vs_terse MEDIAN.
+ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
  ```
 
  1. **Current state.** `caveman.speak_scope` defaults `off`. Carve-outs
@@ -21,49 +23,94 @@ DECISION OWNED BY step-4 CLOSEOUT, NOT BY THIS DOC OR BY step-99.
  [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
  but the feature is non-promoted: no skill recommends turning it on,
  no preset enables it, no profile depends on it.
- 2. **Baseline window.** 60 days from the first green run of
- `task bench` against the locked 25-prompt corpus
- (`step-4-measurement-and-benchmark.md`
- Phase 2). The corpus, the model, and the cost-tracker are frozen
- for the window; mid-window changes restart the clock.
- 3. **Decision points.** After the window closes, `step-4` closeout
- reads `docs/parity/bench.json` and applies exactly one of:
-
- | Measured tokens saved | Quality regression on corpus | Verdict |
26
+ 2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
27
+ measures three arms (`compressed` · `terse-control` ·
28
+ `uncompressed`) and reports two savings columns:
29
+ - `vs_raw`: median savings against the uncompressed arm.
30
+ - `vs_terse` (**load-bearing**): median savings against the
31
+ `Answer concisely.` terse-control arm. `vs_raw` is inflated by the
32
+ carve-out-tax-free pure-prose case and is **not** the gate metric.
33
+ 3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
34
+ and apply exactly one of:
35
+
36
+ | Measured `vs_terse` median | Quality regression on corpus | Verdict |
33
37
  |---|---|---|
34
- | < 30 % | any | **Deprecate**: remove `caveman-speak` rule, archive `caveman-compress` script, retire `caveman.*` settings keys with a one-release deprecation window |
35
- | ≥ 30 % | < 5 % | **Flip default on**: `caveman.speak_scope` defaults to a non-`off` value, carve-outs stay, statusline surfaces lifetime tokens saved |
36
- | ≥ 30 % | ≥ 5 % | **Hold** — repeat the window once with tuned intensity ladder; second hold → deprecate |
38
+ | < 0 % | any | **Criterion not met: defer.** Keep default `off`. No telemetry multiplier. Next move owned by the corpus-widening / methodology-revision step that produces `caveman-v<N+1>`. |
39
+ | ≥ 0 % and < 30 % | any | **Hold.** Keep default `off`. Authorised follow-up: widen corpus or tune carve-out share; no default flip. |
40
+ | ≥ 30 % | < 5 % | **Flip default on** — `caveman.speak_scope` defaults to a non-`off` value (separate roadmap), carve-outs stay, statusline surfaces lifetime tokens saved. |
41
+ | ≥ 30 % | ≥ 5 % | **Hold** — repeat the window once with tuned intensity ladder; second hold → deprecate. |
37
42
 
38
43
  "Quality regression" = host-side rubric on the corpus per
39
- `step-4-measurement-and-benchmark.md` Phase 3. Numbers checked into
40
- `docs/parity/bench.json` as the decision artefact.
44
+ `benchmark-report-schema.md`. Numbers checked into the published
45
+ `caveman-v<N>.json` as the decision artefact.
41
46
  4. **No interim flip.** The default does not move on anecdote,
42
- gut feeling, or a single benchmark snapshot. The 60-day window and
43
- the table above are the only path to a default change.
47
+ gut feeling, or a single positive prompt. Only a published
48
+ `caveman-v<N>` report with a `vs_terse` median in the "Flip" row
49
+ above authorises a default change, under a follow-up roadmap.
50
+
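The decision table can be read as a pure function of two numbers. The sketch below is illustrative only: the function name and the verdict strings are hypothetical labels, and treating a median of exactly 0 % as belonging to the second row is an assumption.

```python
def kill_criterion_verdict(vs_terse_median_pct, quality_regression_pct):
    """Map a bench result onto one row of the decision table.

    Both arguments are percentages. Verdict strings are hypothetical labels.
    """
    if vs_terse_median_pct < 0.0:
        # Row 1: quality regression is irrelevant ("any").
        return "criterion-not-met-defer"
    if vs_terse_median_pct < 30.0:
        # Row 2 (assumed to cover 0 % up to, not including, 30 %).
        return "hold"
    if quality_regression_pct < 5.0:
        # Row 3: enough savings, acceptable quality.
        return "flip-default-on"
    # Row 4: enough savings, but quality regressed by 5 % or more.
    return "hold-repeat-window"
```

For example, `kill_criterion_verdict(-9.27, 0.0)` lands in row 1 regardless of the quality figure passed in, since that column reads "any" for a negative median.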
51
+ ## v1 verdict (2026-05-16)
52
+
53
+ [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
54
+ landed with 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
55
+
56
+ | Metric | Median | p10 | p90 |
57
+ |---|---:|---:|---:|
58
+ | `vs_raw` savings | +23.51 % | −18.29 % | +52.53 % |
59
+ | **`vs_terse` savings** | **−9.27 %** | **−109.85 %** | +51.32 % |
60
+ | Realised carve-out share (compressed arm) | 30.67 % | — | — |
61
+
62
+ Per row 1 of the table, the v1 verdict is **criterion not met — defer**.
63
+ Default stays `off`; no telemetry multiplier ships; no rule retirement
64
+ in this roadmap. Wins exist only on pure-prose prompts (caveman-09
65
+ +50.5 %, caveman-10 +58.4 %); carve-out-heavy prompts drag the median
66
+ negative (caveman-04 path-list −108 %, caveman-06 mode-marker −123 %).
67
+
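To see how a savings figure can go below −100 %, here is a minimal sketch. It assumes savings are computed as the relative token reduction against a baseline arm; that formula is an assumption, not taken from the bench code, and the token counts in the usage note are invented for illustration rather than taken from `caveman-v1`.

```python
from statistics import median


def savings_pct(compressed_tokens, baseline_tokens):
    """Relative token savings of the compressed arm, in percent.

    Assumed formula: 100 * (baseline - compressed) / baseline.
    A negative result means the compressed arm used MORE tokens.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - compressed_tokens) / baseline_tokens


def median_savings(pairs):
    """Median savings over (compressed_tokens, baseline_tokens) pairs."""
    return median(savings_pct(c, b) for c, b in pairs)
```

Under this formula, a hypothetical prompt where the compressed reply costs 208 tokens against a 100-token terse baseline scores −108 %, while one that halves a 400-token baseline scores +50 %. A few carve-out-heavy prompts of the first kind are enough to pull a median below zero.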
68
+ ### Council split (recorded, not decisive)
69
+
70
+ Council run [`caveman-v1-kc-verdict.json`](../../agents/council-responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
71
+ (2 members · 1 round · $0.0514 actual) split:
72
+
73
+ - **`claude-sonnet-4-5`** → Decision A.1 (deprecate now) + Decision B.3
74
+ (suspend telemetry). Reasoning: the roadmap pinned `vs_terse` as
75
+ load-bearing; the data falsified it; retreating to `vs_raw` is
76
+ post-hoc rationalisation.
77
+ - **`gpt-4o`** → Decision A.3 (hold + re-bench with widened corpus +
78
+ revised terse-control prompt) + Decision B.2 (per-category
79
+ multipliers, suppress negatives). Reasoning: 10 prompts is a
80
+ razor-thin sample; the terse-control prompt may under-compress; the
81
+ carve-out validator (Phase 4) is not yet shipped, so we are
82
+ measuring a half-implemented feature.
83
+
84
+ **Synthesis (criterion-not-met + defer).** Both members agreed `vs_terse`
85
+ is the right gate. Neither's strongest path is taken in full inside
86
+ step-16: deprecation is reserved for a follow-up roadmap once v2 confirms
87
+ v1; re-bench is reserved for a follow-up roadmap with the methodology
88
+ revision the council requested. Step-16 ships the infrastructure (corpus,
89
+ bench arm, validator), records the v1 verdict, suspends the telemetry
90
+ multiplier, and hands the deprecate-vs-rebench call to the v2 roadmap.
44
91
 
45
92
  ## Why this is parked, not decided
46
93
 
47
- The council split (Opus = remove now, o1 = measure-then-decide) is
48
- real. Either branch is wrong-shaped without numbers. The kill-criterion
49
- gives the audit a deterministic resolution path and stops every
50
- downstream roadmap from re-litigating compression on every PR.
94
+ The 2026-05-14 council split (Opus = remove now, o1 = measure-then-decide)
95
+ predated v1 numbers. The 2026-05-16 council split (Sonnet = deprecate now,
96
+ GPT-4o = re-bench) is informed by v1 but disagrees on which methodological
97
+ weakness is decisive. The kill table above gives every future bench run a
98
+ deterministic resolution path and stops every downstream roadmap from
99
+ re-litigating compression on every PR.
51
100
 
52
101
  ## Cross-references
53
102
 
54
- - ``step-99-north-star-restructure.md` § Phase 4`
55
- parks this criterion, does not decide.
56
- - `step-4-measurement-and-benchmark.md`
57
- owns `task bench`, the corpus, and the closeout that applies the
58
- table above.
59
- - `step-10-caveman-parity.md`
60
- — implements the carve-outs and the statusline integration the
61
- "flip default on" branch depends on; blocks the default flip until
62
- acceptance is green.
103
+ - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md):
104
+ v1 measurement; canonical baseline this doc cites.
105
+ - [`docs/benchmarks.md`](../benchmarks.md):
106
+ cadence + when the next bench run is mandatory.
107
+ - [`caveman-telemetry`](caveman-telemetry.md):
108
+ multiplier contract; records the suspended state v2 must lift.
63
109
  - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
64
110
  — runtime rule; reads `caveman.speak_scope` from settings.
65
111
 
66
112
  ## Done
67
113
 
68
- This doc exists to keep the decision visible. It is **not** an action
69
- item. `step-4` closeout closes the loop.
114
+ This doc reflects the v1 verdict. It is **not** an action item. The next
115
+ bench closeout (against `caveman-v2` once a widened corpus or revised
116
+ methodology is shipped) closes the loop.