@event4u/agent-config 3.2.0 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. package/.agent-src/commands/agent-status.md +1 -1
  2. package/.agent-src/skills/compress-memory/SKILL.md +1 -1
  3. package/.claude-plugin/marketplace.json +1 -1
  4. package/AGENTS.md +5 -4
  5. package/CHANGELOG.md +24 -0
  6. package/dist/discovery/deprecation-report.md +1 -1
  7. package/dist/discovery/discovery-manifest.json +4 -4
  8. package/dist/discovery/discovery-manifest.json.sha256 +1 -1
  9. package/dist/discovery/discovery-manifest.summary.md +1 -1
  10. package/dist/discovery/orphan-report.md +1 -1
  11. package/dist/discovery/packs.json +2 -2
  12. package/dist/discovery/trust-report.md +1 -1
  13. package/dist/discovery/workspaces.json +2 -2
  14. package/dist/mcp/registry-manifest.json +1 -1
  15. package/docs/benchmarks.md +4 -4
  16. package/docs/contracts/CHANGELOG-conventions.md +1 -1
  17. package/docs/contracts/adr-mcp-runtime.md +1 -1
  18. package/docs/contracts/benchmark-corpus-spec.md +3 -3
  19. package/docs/contracts/benchmark-report-schema.md +5 -5
  20. package/docs/contracts/caveman-telemetry.md +4 -4
  21. package/docs/contracts/compression-default-kill-criterion.md +5 -5
  22. package/docs/contracts/cost-enforcement.md +1 -1
  23. package/docs/contracts/mcp-beta-criteria.md +1 -1
  24. package/docs/contracts/mcp-cloud-scope.md +4 -4
  25. package/docs/contracts/mcp-registry-manifest.schema.json +1 -1
  26. package/docs/contracts/mcp-tool-inventory.md +1 -1
  27. package/docs/contracts/mcp-tool-stub-envelope.md +1 -1
  28. package/docs/contracts/measurement-baseline.md +6 -6
  29. package/docs/decisions/ADR-027-changelog-machine-vs-manual.md +129 -0
  30. package/docs/decisions/ADR-028-root-layout.md +147 -0
  31. package/docs/decisions/ADR-029-multi-workspace-deferred.md +122 -0
  32. package/docs/decisions/INDEX.md +8 -0
  33. package/docs/mcp-server.md +1 -1
  34. package/docs/parity/bench-ruflo.json +3 -3
  35. package/docs/parity/ruflo.md +1 -1
  36. package/docs/setup/mcp-client-config.md +1 -1
  37. package/docs/setup/mcp-cloud-endpoints.md +1 -1
  38. package/docs/setup/mcp-cloud-setup.md +2 -2
  39. package/docs/setup/mcp-r2-bootstrap.md +1 -1
  40. package/package.json +1 -1
  41. package/scripts/__pycache__/validate_frontmatter.cpython-312.pyc +0 -0
  42. package/scripts/_lib/__pycache__/__init__.cpython-312.pyc +0 -0
  43. package/scripts/_lib/__pycache__/agent_src.cpython-312.pyc +0 -0
  44. package/scripts/_lib/bench_caveman.py +2 -2
  45. package/scripts/_lib/bench_caveman_report.py +1 -1
  46. package/scripts/_lib/bench_cost.py +2 -2
  47. package/scripts/_lib/bench_report.py +2 -2
  48. package/scripts/audit_mcp_tools.py +1 -1
  49. package/scripts/bench_baseline_ready.py +3 -3
  50. package/scripts/bench_compress_memory.py +4 -4
  51. package/scripts/bench_drift_check.py +2 -2
  52. package/scripts/bench_per_tool.py +2 -2
  53. package/scripts/bench_run.py +4 -4
  54. package/scripts/build_mcp_registry_manifest.py +2 -2
  55. package/scripts/mcp_server/__init__.py +1 -1
  56. package/scripts/mcp_server/catalog.py +1 -1
  57. package/scripts/mcp_server/consumer_tool_catalog.json +1 -1
  58. package/scripts/mcp_server/tools.py +1 -1
  59. package/scripts/pack_mcp_content.py +6 -6
  60. package/scripts/skill_trigger_eval.py +2 -2
package/.agent-src/commands/agent-status.md CHANGED
@@ -66,7 +66,7 @@ Extract from latest record:
  - `by_model[]` — per-tier (haiku / sonnet / opus) input / output / cache split
  - `budget.tier` — `under` / `50` / `75` / `90` / `100` (from `node scripts/cost/budget.mjs check`)
 
- Pricing source: [`bench/pricing.yaml`](../../bench/pricing.yaml). Reader
+ Pricing source: [`internal/bench/pricing.yaml`](../../bench/pricing.yaml). Reader
  implementation: [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs).
 
  ### 3b. Read caveman delta + per-conversation cost lens
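The `budget.tier` labels look like spend thresholds. One plausible reading, sketched below in Python, is that a tier names the highest percentage of the configured budget that spend has reached, and `under` means below 50 %. This is an assumption: the actual logic lives in `scripts/cost/budget.mjs`, and `budget_tier` and its arguments are hypothetical names.

```python
TIER_THRESHOLDS = (50, 75, 90, 100)  # percent of budget (assumed meaning)


def budget_tier(spent_usd: float, budget_usd: float) -> str:
    """Hypothetical spend -> tier mapping; not the logic of budget.mjs.

    Assumes each tier names the highest percentage of the budget that
    spend has reached, and that `under` means below the 50 % mark.
    """
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    percent = 100.0 * spent_usd / budget_usd
    tier = "under"
    for threshold in TIER_THRESHOLDS:
        if percent >= threshold:
            tier = str(threshold)
    return tier
```

Under this reading, spend above the budget still reports `100`. The real evaluator may handle overruns differently.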
package/.agent-src/skills/compress-memory/SKILL.md CHANGED
@@ -23,7 +23,7 @@ install:
 
  # compress-memory
 
- > **Experimental.** Output-side caveman dialect did not meet kill-criterion in [`bench/reports/caveman-v1.md`](../../../bench/reports/caveman-v1.md) (`vs_terse` median −9.27 %). Input-side memory compression is orthogonal use case: savings target always-loaded memory budget, not reply stream. Treat ship-criterion as **per-target measurement**, not v1 verdict.
+ > **Experimental.** Output-side caveman dialect did not meet kill-criterion in [`internal/bench/reports/caveman-v1.md`](../../../bench/reports/caveman-v1.md) (`vs_terse` median −9.27 %). Input-side memory compression is orthogonal use case: savings target always-loaded memory budget, not reply stream. Treat ship-criterion as **per-target measurement**, not v1 verdict.
 
  ## When to use
 
package/.claude-plugin/marketplace.json CHANGED
@@ -6,7 +6,7 @@
  },
  "metadata": {
  "description": "Shared agent configuration \u2014 skills for AI coding tools (Claude Code, Augment, Cursor, Cline, Windsurf, Gemini CLI).",
- "version": "3.2.0",
+ "version": "3.3.0",
  "keywords": [
  "agent-config",
  "skills",
package/AGENTS.md CHANGED
@@ -4,7 +4,7 @@
 
  ## Source of truth
 
- Edit `packages/<pack>/.agent-src.uncompressed/` only. Generated trees (`.agent-src/`, `.augment/`, `.claude/`, `.cursor/`, `.clinerules/`, `.windsurfrules`) regenerate from `task sync` + `task generate-tools`; never hand-edit.
+ Edit `packages/<pack>/.agent-src.uncompressed/` only. Generated trees regenerate from `task sync` + `task generate-tools`; never hand-edit.
 
  ## Working on this repo
 
@@ -20,11 +20,12 @@ task ci # full pipeline — green before PR
  - **Package self-orientation** — identity, cognition map, layout, stack, key rules, telemetry: [`package-self-orientation`](docs/contracts/package-self-orientation.md).
  - **Kernel + Router** — 9 Iron-Law rules, tier-1/2 routing, cost profiles, per-rule caps: [`kernel-membership`](docs/contracts/kernel-membership.md) + [`rule-router`](docs/contracts/rule-router.md).
  - **Trust & Safety** — trust-level enum, HRR banner, safety floors, installer confirm: [`trust-and-safety`](docs/contracts/trust-and-safety.md) + [`ADR-018`](docs/decisions/ADR-018-trust-and-safety-layer.md).
- - **Content pipelines** — A→D source / Augment / multi-tool / Claude.ai-bundle projections, indexed at [`docs/architecture.md`](docs/architecture.md); sub-pipelines under [`docs/architecture/`](docs/architecture/).
- - **Editing this repo** — Iron-Law rules + Thin-Root contract govern every change: [`augment-source-of-truth`](.agent-src/rules/augment-source-of-truth.md) + [`agents-md-thin-root`](.agent-src/skills/agents-md-thin-root/SKILL.md).
+ - **Content pipelines** — A→D source / Augment / multi-tool / Claude.ai-bundle projections indexed at [`docs/architecture.md`](docs/architecture.md).
+ - **Editing this repo** — Iron-Law rules + Thin-Root contract govern every edit: [`augment-source-of-truth`](.agent-src/rules/augment-source-of-truth.md) + [`agents-md-thin-root`](.agent-src/skills/agents-md-thin-root/SKILL.md).
  - **Consumer story** — `npx` + `scripts/install.sh` opt-in flags, sandbox / offline install paths, verified-offline manifest: [`README.md`](README.md).
  - **Personas** — 11 review-lens cast (6 core · 5 specialist), `personas:` vs `/mode`: [`docs/personas.md`](docs/personas.md).
- - **Discovery** — workspaces / packs / dist manifest; contract [`ADR-013`](docs/decisions/ADR-013-discovery-frontmatter-contract.md), how-to [`customization § Workspaces & packs`](docs/customization.md#workspaces--packs-discovery).
+ - **Discovery** — workspaces / packs / dist manifest: [`ADR-013`](docs/decisions/ADR-013-discovery-frontmatter-contract.md) + [`customization`](docs/customization.md#workspaces--packs-discovery).
+ - **Root layout** — maintainer-only dirs (`bench`, `evals`, `workers`) live under [`internal/`](internal/README.md) per [`ADR-028`](docs/decisions/ADR-028-root-layout.md).
 
  ## Emergency triage — read this when nothing else is reachable
 
package/CHANGELOG.md CHANGED
@@ -738,6 +738,30 @@ our recommendation order, not its support status.
  > that forces a new era split (`# Era: 3.3.x`, etc.) — see
  > [`docs/contracts/CHANGELOG-conventions.md § Era splits`](docs/contracts/CHANGELOG-conventions.md).
 
+ ## [3.3.0](https://github.com/event4u-app/agent-config/compare/3.2.0...3.3.0) (2026-05-25)
+
+ ### Features
+
+ * **scripts:** point bench / eval / mcp tooling at internal/ (ADR-028 phase 1) ([c20b515](https://github.com/event4u-app/agent-config/commit/c20b515dfc80d03244e54af7b50e07d562647799))
+
+ ### Bug Fixes
+
+ * **ci:** repoint bench/ -> internal/bench/ in projected agent-status + compress-memory ([682040b](https://github.com/event4u-app/agent-config/commit/682040b230248bd054767ab10bd1e90ed3a1d0b7))
+
+ ### Documentation
+
+ * **roadmap:** archive road-to-root-layout-cleanup at 100% complete ([948f1a8](https://github.com/event4u-app/agent-config/commit/948f1a8cd9ec63fa831d5abab25aeb3657a0a7d4))
+ * **adr:** land Phase 2 audit evidence + ADR-029 multi-workspace deferral ([c4dabf1](https://github.com/event4u-app/agent-config/commit/c4dabf1f718fee0d4709d727d1e3eb781f8db62f))
+ * **adr:** land ADR-028 + root-layout-cleanup roadmap ([7b0fd28](https://github.com/event4u-app/agent-config/commit/7b0fd281288a87f9c218d9fa9d73f01a48a56626))
+ * repoint bench / mcp-worker paths to internal/ (ADR-028 phase 1) ([85278dd](https://github.com/event4u-app/agent-config/commit/85278dd831d44a187dcfe7684e895bac0ca97ae4))
+
+ ### Chores
+
+ * **root:** move bench/ and workers/ under internal/ (ADR-028 phase 1) ([2a1dd1e](https://github.com/event4u-app/agent-config/commit/2a1dd1e363f9aabb59b6eebf35f64a47609abebb))
+ * **roadmap:** close changelog-era-auto-split (Phase 4 → ADR-027) ([b10b326](https://github.com/event4u-app/agent-config/commit/b10b32623b42739f0eba12a52173b810169929b6))
+
+ Tests: 4938 (+9 since 3.2.0)
+
  ## [3.2.0](https://github.com/event4u-app/agent-config/compare/3.1.1...3.2.0) (2026-05-25)
 
  ### Features
package/dist/discovery/deprecation-report.md CHANGED
@@ -1,6 +1,6 @@
  # Discovery — Deprecation Report
 
- - Generated: `2026-05-25T09:38:22Z`
+ - Generated: `2026-05-25T12:26:32Z`
  - Deprecated artefacts: **0**
 
  _None. Tree is clean._
package/dist/discovery/discovery-manifest.json CHANGED
@@ -24,7 +24,7 @@
  },
  {
  "category": "command",
- "checksum": "sha256:c31aacf3434e5c983ba2b53a07febd6fa77acee43c53935d1a1c339b5a8daff1",
+ "checksum": "sha256:756535431a0d4d7049960e137c022cad35cb5221fc459fa98e146666a50649ce",
  "install": {
  "default": true,
  "removable": false
@@ -4906,7 +4906,7 @@
  },
  {
  "category": "skill",
- "checksum": "sha256:24843ebe0c69e9c62080169b4ae6487b37c68250a13fd099695377aa0eaf6373",
+ "checksum": "sha256:4ff26cf1af1b252755b72829247e39f85f574d7453133fc5d96c1aaf20f63d97",
  "install": {
  "default": true,
  "removable": false
@@ -9387,7 +9387,7 @@
  ]
  }
  ],
- "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
+ "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
  "documented_unassigned": [
  {
  "category": "template",
@@ -9500,7 +9500,7 @@
  "reason": "scaffold for new SKILL.md authoring"
  }
  ],
- "generated_at": "2026-05-25T09:38:22Z",
+ "generated_at": "2026-05-25T12:26:32Z",
  "packs": [
  {
  "artefact_count": 84,
package/dist/discovery/discovery-manifest.json.sha256 CHANGED
@@ -1 +1 @@
- 076a70a63686c4548465a7244ec34f6b251867115e7f3f9e978579329f2b1dda discovery-manifest.json
+ 6c11926a0bf485b24906979f2dc1bdc569985a46008247c567726daef64cf009 discovery-manifest.json
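The `.sha256` sidecar holds one `<hex digest> <file name>` line. That matches the layout `sha256sum` writes. The Python sketch below assumes that layout and recomputes the digest with `hashlib` for comparison. The helper name `verify_sidecar` is hypothetical.

```python
import hashlib


def verify_sidecar(sidecar_line: str, payload: bytes, expected_name: str) -> bool:
    """Check one `<hex digest> <file name>` line against the file's bytes.

    Assumes sha256sum's layout: digest, whitespace, an optional `*`
    binary-mode marker, then the file name.
    """
    digest, name = sidecar_line.split(maxsplit=1)
    if name.strip().lstrip("*") != expected_name:
        return False
    return hashlib.sha256(payload).hexdigest() == digest.lower()
```

For the 3.3.0 sidecar, `payload` would be the raw bytes of `discovery-manifest.json` and `expected_name` would be that file name. This digest is separate from the `checksum` field stored inside the manifest, which has a different value in the hunks above.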
package/dist/discovery/discovery-manifest.summary.md CHANGED
@@ -1,6 +1,6 @@
  # Discovery Manifest — Summary
 
- - Generated: `2026-05-25T09:38:22Z`
+ - Generated: `2026-05-25T12:26:32Z`
  - Scanner: `80c8c0b8b827`
  - Artefacts: **430**
  - Unassigned: **0**
package/dist/discovery/orphan-report.md CHANGED
@@ -1,6 +1,6 @@
  # Discovery — Orphan Report
 
- - Generated: `2026-05-25T09:38:22Z`
+ - Generated: `2026-05-25T12:26:32Z`
  - Orphan artefacts: **0**
 
  > An orphan is an artefact whose declared pack has no other members.
package/dist/discovery/packs.json CHANGED
@@ -1,6 +1,6 @@
  {
- "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
- "generated_at": "2026-05-25T09:38:22Z",
+ "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
+ "generated_at": "2026-05-25T12:26:32Z",
  "packs": [
  {
  "artefact_count": 84,
package/dist/discovery/trust-report.md CHANGED
@@ -1,6 +1,6 @@
  # Discovery — Trust Report
 
- - Generated: `2026-05-25T09:38:22Z`
+ - Generated: `2026-05-25T12:26:32Z`
  - Workspaces tracked: **8**
  - Human-review-required artefacts: **2**
 
package/dist/discovery/workspaces.json CHANGED
@@ -1,6 +1,6 @@
  {
- "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
- "generated_at": "2026-05-25T09:38:22Z",
+ "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
+ "generated_at": "2026-05-25T12:26:32Z",
  "scanner_version": "80c8c0b8b827",
  "workspaces": [
  {
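In this release, `packs.json` and `workspaces.json` carry the same `checksum` (`sha256:fd61…`) and `generated_at` (`2026-05-25T12:26:32Z`) as the top level of `discovery-manifest.json`. The Python sketch below assumes this mirroring is intended, which the diff suggests but does not state. It flags a derived file that was not regenerated alongside the manifest. The helper name is hypothetical.

```python
import json


def mirrors_manifest(manifest_text: str, *derived_texts: str) -> bool:
    """True if every derived file repeats the manifest's checksum and timestamp.

    Assumption drawn from the 3.3.0 diff: packs.json and workspaces.json copy
    the top-level `checksum` and `generated_at` of discovery-manifest.json.
    """
    manifest = json.loads(manifest_text)
    stamp = (manifest["checksum"], manifest["generated_at"])
    for text in derived_texts:
        derived = json.loads(text)
        if (derived.get("checksum"), derived.get("generated_at")) != stamp:
            return False
    return True
```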
package/dist/mcp/registry-manifest.json CHANGED
@@ -9,7 +9,7 @@
  "homepage": "https://github.com/event4u-app/agent-config#readme",
  "name": "@event4u/agent-config",
  "repository": "https://github.com/event4u-app/agent-config",
- "version": "3.2.0"
+ "version": "3.3.0"
  },
  "registries": [
  {
package/docs/benchmarks.md CHANGED
@@ -18,13 +18,13 @@ discipline (upstream `5b71c7a`).
  | Corpus | Path | Purpose |
  |---|---|---|
  | `dev` | `tests/eval/corpus-dev.yaml` | router / engine selection |
- | `caveman` | `bench/corpora/caveman/prompts.yaml` | compression dialect (`vs_raw` + `vs_terse`) |
+ | `caveman` | `internal/bench/corpora/caveman/prompts.yaml` | compression dialect (`vs_raw` + `vs_terse`) |
 
  ## Reports — naming and trail
 
- - **Canonical pointer:** `bench/reports/<corpus>-v<N>.{json,md}` — always
+ - **Canonical pointer:** `internal/bench/reports/<corpus>-v<N>.{json,md}` — always
  reflects the latest published run for that corpus version.
- - **Timestamped trail:** `bench/reports/<ISO-Zulu>-<corpus>-v<N>.{json,md}`
+ - **Timestamped trail:** `internal/bench/reports/<ISO-Zulu>-<corpus>-v<N>.{json,md}`
  — every committed run keeps an immutable history copy alongside.
 
  Both are produced in one `scripts/bench_run.py` invocation; do not commit
@@ -37,7 +37,7 @@ one without the other.
  | Pre-release bake (any `vX.Y.0`) | `dev` + `caveman` | both reports refreshed |
  | Edit to `.agent-src.uncompressed/rules/caveman-speak.md` | `caveman` | report refreshed in same PR |
  | Edit to `scripts/bench_run.py` `--caveman` arm | `caveman` | report refreshed in same PR |
- | Edit to `bench/corpora/caveman/prompts.yaml` | `caveman` | report refreshed, version bumped (`caveman-vN+1`) |
+ | Edit to `internal/bench/corpora/caveman/prompts.yaml` | `caveman` | report refreshed, version bumped (`caveman-vN+1`) |
  | Edit to `scripts/_lib/bench_caveman*.py` | `caveman` | report refreshed in same PR |
 
  A PR that touches any of the cadence triggers without refreshing the
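Each published run produces two pairs of files: a canonical pointer and a timestamped trail copy. The Python sketch below derives both pairs for one corpus version. It assumes the stamp format from the layout example in `benchmark-report-schema.md` (`2026-05-16T10-30-00Z`, colons replaced by hyphens). `report_paths` is a hypothetical helper, not a function in `scripts/bench_run.py`.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

REPORTS = PurePosixPath("internal/bench/reports")


def report_paths(corpus: str, version: int, run_at: datetime) -> dict[str, list[PurePosixPath]]:
    """Return canonical-pointer and timestamped-trail paths for one run.

    `run_at` should be timezone-aware; the stamp format is an assumption.
    """
    stamp = run_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    base = f"{corpus}-v{version}"
    return {
        # Rewritten on every publish for this corpus version.
        "canonical": [REPORTS / f"{base}.{ext}" for ext in ("json", "md")],
        # Immutable history copy kept alongside.
        "trail": [REPORTS / f"{stamp}-{base}.{ext}" for ext in ("json", "md")],
    }
```

Both pairs come from one call. This mirrors the rule that the canonical pointer and its trail copy are committed together.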
package/docs/contracts/CHANGELOG-conventions.md CHANGED
@@ -52,7 +52,7 @@ Tests: NNNN (+M since X.Y.(Z-1))
  ```
 
  The test-count line is enforced for any release that ships changes to
- `scripts/`, `workers/`, or `.agent-src/` content; it can be omitted for
+ `scripts/`, `internal/workers/`, or `.agent-src/` content; it can be omitted for
  pure-docs releases.
 
  ## What counts as breaking
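The test-count rule is mechanical, so CI could check it. The Python sketch below does two things. It parses the `Tests: NNNN (+M since X.Y.Z)` line; the 3.3.0 entry reads `Tests: 4938 (+9 since 3.2.0)`. It also decides from the changed paths whether a release must carry that line. Both helper names are hypothetical, and the regex accepts only the `+M` form shown in the convention.

```python
import re
from typing import Optional, Tuple

TESTS_LINE = re.compile(r"^Tests: (\d+) \(\+(\d+) since (\d+\.\d+\.\d+)\)$")
# Path prefixes that make the line mandatory, per the 3.3.0 wording.
ENFORCED_PREFIXES = ("scripts/", "internal/workers/", ".agent-src/")


def parse_tests_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (total, added, previous_version), or None if the line does not match."""
    match = TESTS_LINE.match(line.strip())
    if match is None:
        return None
    return int(match[1]), int(match[2]), match[3]


def needs_tests_line(changed_paths: list[str]) -> bool:
    """True when any changed path sits under an enforced prefix."""
    return any(path.startswith(ENFORCED_PREFIXES) for path in changed_paths)
```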
package/docs/contracts/adr-mcp-runtime.md CHANGED
@@ -17,7 +17,7 @@ per [`scripts/mcp_server/requirements.txt`](../../scripts/mcp_server/requirement
  **FastMCP** (the higher-level decorator wrapper) and the **MCP TypeScript SDK**
  are explicitly rejected for this surface.
 
- The hosted Cloudflare Worker bridge (`workers/mcp/`) is the only place a
+ The hosted Cloudflare Worker bridge (`internal/workers/mcp/`) is the only place a
  non-Python runtime is allowed, and it stays bound to the same wire contract
  (see [`mcp-cloud-scope.md`](mcp-cloud-scope.md)).
 
package/docs/contracts/benchmark-corpus-spec.md CHANGED
@@ -13,12 +13,12 @@ and validation invariants.
  ## Path decision
 
  Roadmap `step-4-measurement-and-benchmark.md`
- Phase 1 Step 2 names `bench/corpus.yaml`. The existing benchmark
+ Phase 1 Step 2 names `internal/bench/corpus.yaml`. The existing benchmark
  infrastructure (runner + non-dev corpus + `task bench`) lives under
  `tests/eval/` and `scripts/bench_runner.py` hardcodes that directory.
- **Canonical location:** `tests/eval/corpus-<id>.yaml`. The `bench/`
+ **Canonical location:** `tests/eval/corpus-<id>.yaml`. The `internal/bench/`
  directory is reserved for **reports + pricing** (Phase 2 deliverables).
- Migration to `bench/corpus.yaml` is a no-op rename if downstream Phase
+ Migration to `internal/bench/corpus.yaml` is a no-op rename if downstream Phase
  2 work proves the consolidation is worth the diff cost.
 
  ## Composition (25 prompts)
package/docs/contracts/benchmark-report-schema.md CHANGED
@@ -7,12 +7,12 @@ keep-beta-until: 2026-08-14
 
  Parser-visible contract for the JSON + Markdown reports emitted by
  [`scripts/bench_run.py`](../../scripts/bench_run.py). Every `task bench`
- run writes one `bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
+ run writes one `internal/bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
 
  ## File layout
 
  ```
- bench/
+ internal/bench/
  ├── pricing.yaml # per-1M model rates + sourced_on dates
  └── reports/
  ├── 2026-05-16T10-30-00Z-dev.json # machine-readable
@@ -60,7 +60,7 @@ cost:
  per_tier: # haiku / sonnet / opus / unknown
  sonnet: { messages: <int>, cost_usd: <float> }
  ...
- pricing_sourced_on: <ISO date from bench/pricing.yaml>
+ pricing_sourced_on: <ISO date from internal/bench/pricing.yaml>
  quality:
  source: <path-or-"not_collected">
  prompts_with_assertion: <int>
@@ -99,7 +99,7 @@ Headers in order:
  `quality.source: not_collected` and `verdict.overall: partial`. Score
  stays `0.0`; never inflate by assuming pass.
  - **Pricing dated.** Every cost row reads `sourced_on` from
- `bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
+ `internal/bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
  Markdown footer.
 
  ## Cross-references
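The staleness rule is date arithmetic on each `sourced_on` value. The Python sketch below assumes ISO `YYYY-MM-DD` dates. It returns a footer warning string or `None`. The helper name and the warning wording are invented for illustration, and a price exactly 90 days old is not stale under "> 90 days".

```python
from datetime import date
from typing import Optional

MAX_PRICE_AGE_DAYS = 90  # "> 90 days" counts as stale per the report schema


def stale_price_warning(model: str, sourced_on: str, today: date) -> Optional[str]:
    """Return a footer warning if the price row is older than 90 days, else None."""
    age = (today - date.fromisoformat(sourced_on)).days
    if age > MAX_PRICE_AGE_DAYS:
        return f"warning: price for {model} sourced {age} days ago ({sourced_on})"
    return None
```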
@@ -107,5 +107,5 @@ Headers in order:
  - Runner — [`scripts/bench_run.py`](../../scripts/bench_run.py)
  - Baseline collector — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
  - Corpus contract — [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md)
- - Pricing source — [`bench/pricing.yaml`](../../bench/pricing.yaml)
+ - Pricing source — [`internal/bench/pricing.yaml`](../../bench/pricing.yaml)
  - Cost session reader (live sessions) — [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs)
package/docs/contracts/caveman-telemetry.md CHANGED
@@ -13,7 +13,7 @@ keep-beta-until: 2026-08-15
 
  | Key | Value | Provenance |
  |---|---|---|
- | `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
+ | `caveman_multiplier_version` | `v1` | Tied to `internal/bench/reports/caveman-v1.{json,md}` |
  | `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
  | `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
  | `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
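The multiplier is the median of per-prompt token ratios, with p10 and p90 describing the spread. The Python sketch below uses the standard `statistics` module. The percentile method is an assumption: it uses `statistics.quantiles` with its default `exclusive` method, so p10 and p90 may differ slightly from the published values. `caveman_multiplier` is a hypothetical name.

```python
import statistics


def caveman_multiplier(terse_tokens: list[int], compressed_tokens: list[int]) -> dict[str, float]:
    """Median, p10 and p90 of terse_control_tokens / compressed_tokens per prompt."""
    if len(terse_tokens) != len(compressed_tokens) or len(terse_tokens) < 2:
        raise ValueError("need at least two paired prompts")
    ratios = [t / c for t, c in zip(terse_tokens, compressed_tokens)]
    # quantiles(n=10) returns nine cut points: p10, p20, ..., p90
    # (default "exclusive" method; the published method is not stated).
    deciles = statistics.quantiles(ratios, n=10)
    return {"value": statistics.median(ratios), "p10": deciles[0], "p90": deciles[-1]}
```

A ratio below 1.0 means the compressed reply used more tokens than the terse control for that prompt. The v1 median of `0.9155` is in that range.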
@@ -40,7 +40,7 @@ where `M = caveman_multiplier_value`.
 
  ## Why suspended after v1
 
- The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
+ The `caveman-v1` bench (`internal/bench/reports/caveman-v1.md`, 30 calls,
  2026-05-16) found:
 
  - Median savings vs raw uncompressed: **+23.51 %** (inflated by the
@@ -78,6 +78,6 @@ Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
  ## See also
 
  - [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
- - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
- - [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
+ - [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
+ - [`internal/bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
  - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
package/docs/contracts/compression-default-kill-criterion.md CHANGED
@@ -6,7 +6,7 @@ keep-beta-until: 2026-08-14
  # Compression default — kill-criterion
 
  > **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
- > Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
+ > Phase 1 closeout · **Sources:** [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
  > [`council-synthesis.md` § 7](../../agents/evidence/audits/2026-05-14-north-star/council-synthesis.md) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict --> ·
  > [`caveman-v1-kc-verdict.json`](../../agents/runtime/council/responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
 
@@ -23,14 +23,14 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
  [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
  but the feature is non-promoted: no skill recommends turning it on,
  no preset enables it, no profile depends on it.
- 2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
+ 2. **Baselines.** Every published `internal/bench/reports/caveman-v<N>.{json,md}`
  measures three arms (`compressed` · `terse-control` ·
  `uncompressed`) and reports two savings columns:
  - `vs_raw` — median savings against the uncompressed arm.
  - `vs_terse` — **load-bearing** median savings against the
  `Answer concisely.` terse-control arm. `vs_raw` is inflated by the
  carve-out-tax-free pure-prose case and is **not** the gate metric.
- 3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
+ 3. **Decision table.** Read the latest `internal/bench/reports/caveman-v<N>.md`
  and apply exactly one of:
 
  | Measured `vs_terse` median | Quality regression on corpus | Verdict |
@@ -50,7 +50,7 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
 
  ## v1 verdict (2026-05-16)
 
- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+ [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
  landed 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
 
  | Metric | Median | p10 | p90 |
@@ -100,7 +100,7 @@ re-litigating compression on every PR.
 
  ## Cross-references
 
- - [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+ - [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
  — v1 measurement; canonical baseline this doc cites.
  - [`docs/benchmarks.md`](../benchmarks.md)
  — cadence + when the next bench run is mandatory.
package/docs/contracts/cost-enforcement.md CHANGED
@@ -131,4 +131,4 @@ suite is wired to `task test-cost-budget` per `step-11` Phase 2 Step 5.
  - `step-11-ruflo-parity` — Measurement & Governance Parity roadmap.
  - `docs/contracts/cost-dashboard.md` — companion dashboard contract.
  - `scripts/cost/budget.mjs` — evaluator implementation.
- - `bench/pricing.yaml` — per-model USD pricing table.
+ - `internal/bench/pricing.yaml` — per-model USD pricing table.
package/docs/contracts/mcp-beta-criteria.md CHANGED
@@ -7,7 +7,7 @@ mcp_scope: lite
 
  > **Status:** Active · governs the `experimental → beta` promotion for
  > the MCP surface (`scripts/mcp_server/` local stdio kernel + the
- > hosted `workers/mcp/` bridge). Owned by Phase 3 of the
+ > hosted `internal/workers/mcp/` bridge). Owned by Phase 3 of the
  > `road-to-surface-discipline` roadmap (see `agents/roadmaps/`).
  > Companion contract:
  > [`mcp-phase-1-scope.md`](mcp-phase-1-scope.md) (local) ·
package/docs/contracts/mcp-cloud-scope.md CHANGED
@@ -5,7 +5,7 @@ mcp_scope: lite
 
  # MCP Server — Cloud Scope (A0-cloud Hard Contract)
 
- > **Status:** Active · covers `workers/mcp/` (TypeScript Cloudflare
+ > **Status:** Active · covers `internal/workers/mcp/` (TypeScript Cloudflare
  > Worker bridge), MVP-1 surface. Extends — does **not** supersede —
  > [`mcp-phase-1-scope.md`](mcp-phase-1-scope.md), which retains
  > exclusive ownership of `scripts/mcp_server/` (local stdio).
@@ -15,7 +15,7 @@ mcp_scope: lite
  ## Purpose
 
  Locks the **execution-safety boundary** for the hosted MCP Worker. Any
- code under `workers/mcp/` must satisfy this contract verbatim. The
+ code under `internal/workers/mcp/` must satisfy this contract verbatim. The
  local stdio kernel and the hosted Worker are two distinct surfaces; a
  deviation in one is **not** authorized by a precedent in the other.
 
@@ -46,7 +46,7 @@ prose-only.
  - **What it never does:** execute Python scripts, shell out, spawn
  runtimes, touch consumer FS, write to R2, mutate consumer state,
  call upstream LLM APIs, or read `.agent-src.uncompressed/`.
- - **Owner code path:** `workers/mcp/` (TypeScript, Cloudflare Worker).
+ - **Owner code path:** `internal/workers/mcp/` (TypeScript, Cloudflare Worker).
  This contract is the normative spec.
  - **Auth model:** `public` (default) or `bearer-auth` (operator opt-in)
  per `## Auth surface`. HMAC and CF Access are declared but deferred.
@@ -194,7 +194,7 @@ runtime mode switch.
  returns HTTP `401` with a JSON-RPC error envelope (code `-32001`,
  message `"Unauthorized"`) and the RFC 6750
  `WWW-Authenticate: Bearer realm="agent-config-mcp"` header.
- Implementation: `workers/mcp/src/index.ts` § auth gate (the
+ Implementation: `internal/workers/mcp/src/index.ts` § auth gate (the
  `if (requiredToken) { … }` block).
  - **Liveness carve-out:** the `GET /` liveness probe is
  unauthenticated by design — health checks and `curl` smoke tests
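The bearer-auth gate can be expressed compactly as code. The real gate is TypeScript in `internal/workers/mcp/src/index.ts`; the Python below only illustrates the contract's rules and uses a hypothetical signature. It assumes three behaviours: public mode when no token is configured, the `GET /` liveness carve-out, and a `-32001` envelope with the RFC 6750 challenge on failure. Two further details are my own choices: `hmac.compare_digest` for a constant-time token comparison, and `"id": None` in the envelope, which is what JSON-RPC 2.0 uses when the request id cannot be determined.

```python
import hmac
from typing import Optional


def auth_gate(
    method: str, path: str, authorization: Optional[str], required_token: Optional[str]
) -> Optional[tuple[int, dict[str, str], dict]]:
    """Return None to let the request through, else (status, headers, body)."""
    if not required_token:
        return None  # `public` auth model: nothing to check
    if method == "GET" and path == "/":
        return None  # liveness carve-out stays unauthenticated
    scheme, _, presented = (authorization or "").partition(" ")
    # Auth schemes are case-insensitive; the token itself is compared exactly.
    if scheme.lower() == "bearer" and hmac.compare_digest(
        presented.encode(), required_token.encode()
    ):
        return None
    body = {"jsonrpc": "2.0", "error": {"code": -32001, "message": "Unauthorized"}, "id": None}
    headers = {"WWW-Authenticate": 'Bearer realm="agent-config-mcp"'}
    return 401, headers, body
```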
package/docs/contracts/mcp-registry-manifest.schema.json CHANGED
@@ -52,7 +52,7 @@
  "tools_count": {
  "type": "integer",
  "minimum": 0,
- "description": "Sourced from workers/mcp/content.json#/tool_catalog/tools. Hard-sourced — no fallback. R3 (discovery roadmap) is a hard prerequisite per the AI-Council external review."
+ "description": "Sourced from internal/workers/mcp/content.json#/tool_catalog/tools. Hard-sourced — no fallback. R3 (discovery roadmap) is a hard prerequisite per the AI-Council external review."
  },
  "install_hint_stdio": { "type": "string", "minLength": 1 }
  }
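The `#/tool_catalog/tools` suffix is a JSON Pointer in URI-fragment form (RFC 6901). The Python sketch below shows "hard-sourced, no fallback" as a pointer lookup that raises when the path is missing instead of defaulting to zero. It implements `~1`/`~0` unescaping but skips the percent-decoding that full fragment syntax allows. It also assumes `tools` is a list or an object. Both function names are hypothetical.

```python
import json


def resolve_pointer(document, fragment: str):
    """Resolve a `#/a/b` URI-fragment JSON Pointer; raises KeyError/IndexError if absent."""
    if not fragment.startswith("#"):
        raise ValueError("expected a URI fragment such as '#/tool_catalog/tools'")
    pointer = fragment[1:]
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("JSON Pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 order: decode ~1 to '/' first, then ~0 to '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def tools_count(content_json: str) -> int:
    """Count tools at #/tool_catalog/tools with no fallback value."""
    tools = resolve_pointer(json.loads(content_json), "#/tool_catalog/tools")
    if not isinstance(tools, (list, dict)):
        raise TypeError("tool_catalog.tools must be a list or object")
    return len(tools)
```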
package/docs/contracts/mcp-tool-inventory.md CHANGED
@@ -47,7 +47,7 @@ keep-beta-until: 2026-08-14
  ## Glossary
 
  - **Side-effect** — `ro` (read-only) · `fs-write` (filesystem write) · `shell` (spawns processes).
- - **Transports** — `stdio` (`scripts/mcp_server/`) · `worker` (`workers/mcp/`). A tool may live on both.
+ - **Transports** — `stdio` (`scripts/mcp_server/`) · `worker` (`internal/workers/mcp/`). A tool may live on both.
  - **Stub** — catalog-listed for discovery; returns the `not_implemented` envelope from
  [`mcp-tool-stub-envelope.md`](mcp-tool-stub-envelope.md) until promoted.
 
package/docs/contracts/mcp-tool-stub-envelope.md CHANGED
@@ -20,7 +20,7 @@ a 500.
  ## Source of truth
 
  `scripts/mcp_server/consumer_tool_catalog.json` (schema_version 1).
- Both the stdio server and the Cloud Worker bundle (`workers/mcp/`,
+ Both the stdio server and the Cloud Worker bundle (`internal/workers/mcp/`,
  packed by `scripts/pack_mcp_content.py`) read from this file. The
  manifest returned by `tools/list` is byte-identical apart from
  per-tool `implemented_on` metadata.
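You can test the "identical apart from `implemented_on`" claim by removing the one field allowed to differ and comparing what is left. The Python sketch below compares canonical JSON renderings (sorted keys, fixed separators). That is a slightly looser check than raw byte identity. The sketch assumes the `tools/list` result shape `{"tools": [...]}`, and the helper names are hypothetical.

```python
import json

ALLOWED_DIFF = "implemented_on"


def canonical_without(manifest: dict, key: str = ALLOWED_DIFF) -> str:
    """Render a tools/list manifest as canonical JSON with `key` removed from each tool."""
    stripped = dict(manifest)
    stripped["tools"] = [
        {k: v for k, v in tool.items() if k != key} for tool in manifest.get("tools", [])
    ]
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"))


def same_surface(stdio_manifest: dict, worker_manifest: dict) -> bool:
    """True if the two manifests match apart from per-tool `implemented_on`."""
    return canonical_without(stdio_manifest) == canonical_without(worker_manifest)
```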
package/docs/contracts/measurement-baseline.md CHANGED
@@ -24,7 +24,7 @@ Four axes, all numeric, all reproducible from the same input:
 
  Schemas: [`benchmark-report-schema.md`](benchmark-report-schema.md) ·
  [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md). Reports land at
- `bench/reports/<utc-stamp>-<corpus>[-projection].{json,md}` —
+ `internal/bench/reports/<utc-stamp>-<corpus>[-projection].{json,md}` —
  timestamped, never overwritten, content-addressed by run.
 
  ## Corpora — frozen for the soak window
@@ -67,10 +67,10 @@ NO ANECDOTE, NO INDIVIDUAL REPORT, NO ROADMAP-SIDE OVERRIDE.
  [`scripts/bench_baseline_ready.py`](../../scripts/bench_baseline_ready.py)
  returns exit 0 iff both:
 
- 1. **Wall-clock soak:** `today − bench/baseline-start.txt ≥ --min-days` (default 60)
- 2. **Report density:** `bench/reports/*-<corpus>.json` count ≥ `--min-reports` (default 30)
+ 1. **Wall-clock soak:** `today − internal/bench/baseline-start.txt ≥ --min-days` (default 60)
+ 2. **Report density:** `internal/bench/reports/*-<corpus>.json` count ≥ `--min-reports` (default 30)
 
- Soak start anchored at [`bench/baseline-start.txt`](../../bench/baseline-start.txt)
+ Soak start anchored at [`internal/bench/baseline-start.txt`](../../bench/baseline-start.txt)
  = **2026-05-16**. Earliest possible flip: **2026-07-15**, contingent
  on the 30-report floor.
 
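The two gates combine with a logical AND. The Python below sketches that rule; it is not the code in `scripts/bench_baseline_ready.py`. The function names are hypothetical, and `read_soak_start` assumes `baseline-start.txt` holds a single ISO date. The sketch also reproduces the earliest-flip arithmetic: 2026-05-16 plus 60 days is 2026-07-15.

```python
from datetime import date, timedelta
from pathlib import Path


def read_soak_start(path: Path) -> date:
    """Assumes the file holds a single ISO date such as 2026-05-16."""
    return date.fromisoformat(path.read_text().strip())


def count_reports(reports_dir: Path, corpus: str) -> int:
    """Count reports matching the density glob `*-<corpus>.json`."""
    return sum(1 for _ in reports_dir.glob(f"*-{corpus}.json"))


def baseline_ready(start: date, today: date, report_count: int,
                   min_days: int = 60, min_reports: int = 30) -> bool:
    """Both gates must hold: wall-clock soak AND report density."""
    soaked = (today - start).days >= min_days
    dense = report_count >= min_reports
    return soaked and dense


def earliest_flip(start: date, min_days: int = 60) -> date:
    """First day on which the soak gate alone can pass."""
    return start + timedelta(days=min_days)
```

A CLI wrapper would exit 0 when `baseline_ready(...)` is true and with some non-zero status otherwise, matching "returns exit 0 iff both".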
@@ -86,11 +86,11 @@ On baseline closure, the step-4 closeout writes the numeric verdict to
  [`docs/parity/bench.json`](../parity/bench.json) — frozen snapshot with
  the 30+ reports averaged, drift verdict, and the compression-default
  decision per the kill-criterion table. That file is the artefact every
- P2 roadmap reads — not the live `bench/reports/` directory.
+ P2 roadmap reads — not the live `internal/bench/reports/` directory.
 
  ## Carve-outs
 
- - **Pricing freshness:** [`bench/pricing.yaml`](../../bench/pricing.yaml) rows must carry `sourced_on: YYYY-MM-DD`. Stale prices = stale numbers = no trust (ruflo "measured-vs-claimed" pattern).
+ - **Pricing freshness:** [`internal/bench/pricing.yaml`](../../bench/pricing.yaml) rows must carry `sourced_on: YYYY-MM-DD`. Stale prices = stale numbers = no trust (ruflo "measured-vs-claimed" pattern).
  - **Subjective grading excluded:** quality scoring is mechanical via `quality_assertion`. No vibes.
  - **Cursor / Cline / Windsurf:** rules-only surfaces, no SKILL.md projection. `bench:projection` reports them as `not_applicable` — the gap is acknowledged, not silently dropped.
 
@@ -0,0 +1,129 @@
1
+ ---
2
+ adr: 027
3
+ status: accepted
4
+ date: 2026-05-25
5
+ decision: changelog-machine-vs-manual
6
+ supersedes: —
7
+ superseded_by: —
8
+ phase: v3.x · changelog-era-auto-split Phase 4
9
+ type: discovery-loop-closure
10
+ review_date: 2027-05-25
11
+ ---
12
+
13
+ # ADR-027 — CHANGELOG convention — confirm manual narrative + auto-split for 12 months
+
+ ## Status
+
+ **Accepted** · 2026-05-25. Closes Phase 4 of
+ [`road-to-changelog-era-auto-split.md`](../../agents/roadmaps/archive/road-to-changelog-era-auto-split.md).
+ Time-boxed: review on 2027-05-25 or earlier if a trigger below fires.
+
+ ## Context
+
+ The AI Council that produced the auto-split design surfaced one upstream
+ question: **is the manual changelog convention itself the right primitive
+ for a package that bills itself as a "Universal AI Agent OS"?** Phase 4
+ investigates without committing to a rewrite — either confirm
+ `docs/contracts/CHANGELOG-conventions.md` + the Tier 2/3 machinery stand
+ for 12 months, or draft a successor roadmap.
+
+ ### Step 1 — Consumer touchpoint audit
+
+ Every place a consumer reads the changelog and which fields they read:
+
+ | # | Surface | What the consumer sees | Field weight |
+ |---|---|---|---|
+ | 1 | **GitHub Release notes** | `plan.changelog_body` rendered via `scripts/release.py:818` | narrative + bullets + tests-delta |
+ | 2 | **npmjs.com package page** | auto-rendered `CHANGELOG.md` (top of file) | narrative paragraph dominates the fold |
+ | 3 | **packagist.org package page** | same — auto-rendered `CHANGELOG.md` | same |
+ | 4 | **`README.md` footer** | link only to `CHANGELOG.md` and `releases/latest` | navigation, no content |
+ | 5 | **`CHANGELOG.md` direct** | full structured entries | full Keep-a-Changelog shape |
+ | 6 | **`docs/archive/CHANGELOG-pre-*.md`** | historical eras behind the active-era pointer | follow-up reads only |
+ | 7 | **`agents/settings/contexts/adr-artifact-engagement.md` § L100** | guidance to write a deprecation note in `CHANGELOG.md` | governance / authoring, not consumer |
+
+ `docs/getting-started-by-role.md` — not in scope. No role-pack `FIRST_WIN.md` references the changelog.
+
+ **Reader insight.** Surfaces 1–3 are the high-traffic consumer surfaces.
+ Each renders the *top* of the changelog: the narrative paragraph + the
+ first bullet group + the tests-delta. The narrative paragraph carries
+ the framing every other layer (bullets, compare-link) loses.
+
+ ### Step 2 — Spike comparison against the audit
+
+ Three shapes, measured against the 3.2.0 release (38 commits, 141 lines
+ of narrative + bullets in the current convention):
+
+ | Metric | Current (manual narrative + auto-split) | (a) `release-please` fully generated | (b) Hybrid (manual paragraph + generated bullets) |
+ |---|---:|---:|---:|
+ | **Token budget per release** | ~2.8k (141 lines × ~80 chars) | ~1.1k (38 commits × ~120 chars) | ~2.6k (same as current; bullets generated, paragraph hand-written) |
+ | **Maintainer minutes per release** | 5–15 (write narrative; auto-split runs in `task release`) | 0 direct + ~30/month policing commit messages | 5–10 (paragraph only) |
+ | **Information density per line** | high — narrative compresses 5–10 bullets of context | low — every infrastructure commit becomes a line | high — same as current |
+ | **Parseability for downstream agents** | high — `### Features` / `### Fixes` / `### BREAKING` are semantic anchors | medium — same headings, but no narrative anchor to disambiguate scope churn | high |
+ | **Hands-off failure mode** | era over-cap → auto-split fires | commit-message drift pollutes the log → no recovery without rewriting commits | narrative drift → degrades to release-please equivalent |
+
+ The fully-generated shape **loses the narrative paragraph**, which is
+ the field that surfaces #1–3 weight most heavily. The hybrid shape is what the
+ current convention *already* allows — `release-please` is just the
+ extreme end of the auto-spectrum; the convention sits at a defensible
+ middle.
+
+ The operational cost that prompted the question (era over-cap blocking
+ the release PR) is solved by Tier 2 in the same roadmap. The discovery
+ question therefore reduces to: **is the narrative paragraph worth
+ ~2.5k tokens per release?** Audit says yes — it is the field consumers
+ read first and the field that differentiates the package from bot
+ output.
+
+ ## Decision
+
+ **Confirm `docs/contracts/CHANGELOG-conventions.md` + Tier 2/3
+ machinery for 12 months.** No successor roadmap.
+
+ The current convention:
+
+ - Keeps the **hand-written narrative paragraph** as the load-bearing
+ framing for consumer surfaces (npm, packagist, GitHub Releases).
+ - Lets `scripts/release.py` generate the **bullet list** under
+ Features / Fixes / Chores / Docs / BREAKING via the auto-split flow
+ plus the existing Keep-a-Changelog shape.
+ - Lets the **drift gate** (`tests/test_changelog_eras.py`) catch
+ non-release edits that grow the active era past 250 lines.
+
+ The "Universal AI Agent OS" framing **argues for**, not against, a
+ human-curated changelog — agents that consume `CHANGELOG.md` as a
+ ground-truth of "what changed" prefer structured, semantic, framed
+ entries over a flat commit dump.
+
+ ## Consequences
+
+ - `docs/contracts/CHANGELOG-conventions.md` stands as-is. The
+ Tier 3 Step 3 "Gate-vs-script contract" subsection is the canonical
+ reference for the gate / script split.
+ - No new tooling lands as a Phase 4 follow-up.
+ - Review on **2027-05-25** or earlier if any trigger fires:
+ 1. The narrative paragraph stops being written for two consecutive
+ releases (signal: convention is breaking down naturally).
+ 2. The active-era body grows past 250 lines from non-release edits
+ more than twice in a quarter (signal: humans are bypassing the
+ gate; auto-split is no longer the right primitive).
+ 3. A downstream consumer (npm/packagist/GitHub Release renderer)
+ changes how it slices the top-of-file (signal: the field weight
+ assumption above is invalidated).
+ 4. Council session re-opens the question with new evidence.
+
+ ## Alternatives considered
+
+ | Option | Why rejected |
+ |---|---|
+ | **`release-please` fully generated** | Loses the narrative paragraph; surfaces 1–3 lose their highest-weight field. |
+ | **Hybrid (manual paragraph + generated bullets, separate tool)** | Indistinguishable from current convention in output; adds a tool boundary without changing the artefact. |
+ | **Drop the changelog entirely, point to GitHub Releases** | npm / packagist auto-render `CHANGELOG.md` from the repo; deleting it degrades surfaces 2–3 to a generic placeholder. |
+
+ ## References
+
+ - [`docs/contracts/CHANGELOG-conventions.md`](../contracts/CHANGELOG-conventions.md) — convention being confirmed.
+ - [`agents/roadmaps/archive/road-to-changelog-era-auto-split.md`](../../agents/roadmaps/archive/road-to-changelog-era-auto-split.md) — closes Phase 4.
+ - [`agents/runtime/council/responses/changelog-era-split-2026-05-25.json`](../../agents/runtime/council/responses/changelog-era-split-2026-05-25.json) — the originating council synthesis.
+ - `scripts/_lib/changelog_eras.py` — shared cap + splitter (Tier 2 output).
+ - `scripts/release.py:818` — `plan.changelog_body` → GitHub Release notes wire.
+ - `tests/test_changelog_eras.py`, `tests/test_changelog_split.py` — gate + splitter coverage.
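The drift gate described in ADR-027's Decision section comes down to a line count over the active era, compared against the 250-line cap. The sketch below is hypothetical. It assumes the active era starts at the first Keep-a-Changelog `## [` release heading and runs to the end of `CHANGELOG.md`, which may not match how `scripts/_lib/changelog_eras.py` actually delimits eras. `ACTIVE_ERA_CAP`, `active_era_lines` and `over_cap` are illustrative names.

```python
import re

# Cap quoted in ADR-027; the real constant lives in scripts/_lib/changelog_eras.py
# and may be defined or measured differently there.
ACTIVE_ERA_CAP = 250

# Assumption: every release entry opens with a Keep-a-Changelog "## [x.y.z]"
# heading, and the active era runs from the first such heading to end of file.
_RELEASE_HEADING = re.compile(r"^## \[", re.MULTILINE)


def active_era_lines(changelog_text: str) -> int:
    """Count lines from the first release heading to the end of the text."""
    match = _RELEASE_HEADING.search(changelog_text)
    if match is None:
        return 0
    return len(changelog_text[match.start():].splitlines())


def over_cap(changelog_text: str, cap: int = ACTIVE_ERA_CAP) -> bool:
    """True when the active era has grown past the cap, i.e. the gate should fail."""
    return active_era_lines(changelog_text) > cap
```

A test in the spirit of `tests/test_changelog_eras.py` could read the repository's `CHANGELOG.md` and assert `not over_cap(text)`; the real test's structure is not shown in this diff.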