npm - @event4u/agent-config - Versions diffs - 3.2.0 → 3.3.0 - Mend

@event4u/agent-config 3.2.0 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (60)

package/.agent-src/commands/agent-status.md +1 -1
package/.agent-src/skills/compress-memory/SKILL.md +1 -1
package/.claude-plugin/marketplace.json +1 -1
package/AGENTS.md +5 -4
package/CHANGELOG.md +24 -0
package/dist/discovery/deprecation-report.md +1 -1
package/dist/discovery/discovery-manifest.json +4 -4
package/dist/discovery/discovery-manifest.json.sha256 +1 -1
package/dist/discovery/discovery-manifest.summary.md +1 -1
package/dist/discovery/orphan-report.md +1 -1
package/dist/discovery/packs.json +2 -2
package/dist/discovery/trust-report.md +1 -1
package/dist/discovery/workspaces.json +2 -2
package/dist/mcp/registry-manifest.json +1 -1
package/docs/benchmarks.md +4 -4
package/docs/contracts/CHANGELOG-conventions.md +1 -1
package/docs/contracts/adr-mcp-runtime.md +1 -1
package/docs/contracts/benchmark-corpus-spec.md +3 -3
package/docs/contracts/benchmark-report-schema.md +5 -5
package/docs/contracts/caveman-telemetry.md +4 -4
package/docs/contracts/compression-default-kill-criterion.md +5 -5
package/docs/contracts/cost-enforcement.md +1 -1
package/docs/contracts/mcp-beta-criteria.md +1 -1
package/docs/contracts/mcp-cloud-scope.md +4 -4
package/docs/contracts/mcp-registry-manifest.schema.json +1 -1
package/docs/contracts/mcp-tool-inventory.md +1 -1
package/docs/contracts/mcp-tool-stub-envelope.md +1 -1
package/docs/contracts/measurement-baseline.md +6 -6
package/docs/decisions/ADR-027-changelog-machine-vs-manual.md +129 -0
package/docs/decisions/ADR-028-root-layout.md +147 -0
package/docs/decisions/ADR-029-multi-workspace-deferred.md +122 -0
package/docs/decisions/INDEX.md +8 -0
package/docs/mcp-server.md +1 -1
package/docs/parity/bench-ruflo.json +3 -3
package/docs/parity/ruflo.md +1 -1
package/docs/setup/mcp-client-config.md +1 -1
package/docs/setup/mcp-cloud-endpoints.md +1 -1
package/docs/setup/mcp-cloud-setup.md +2 -2
package/docs/setup/mcp-r2-bootstrap.md +1 -1
package/package.json +1 -1
package/scripts/__pycache__/validate_frontmatter.cpython-312.pyc +0 -0
package/scripts/_lib/__pycache__/__init__.cpython-312.pyc +0 -0
package/scripts/_lib/__pycache__/agent_src.cpython-312.pyc +0 -0
package/scripts/_lib/bench_caveman.py +2 -2
package/scripts/_lib/bench_caveman_report.py +1 -1
package/scripts/_lib/bench_cost.py +2 -2
package/scripts/_lib/bench_report.py +2 -2
package/scripts/audit_mcp_tools.py +1 -1
package/scripts/bench_baseline_ready.py +3 -3
package/scripts/bench_compress_memory.py +4 -4
package/scripts/bench_drift_check.py +2 -2
package/scripts/bench_per_tool.py +2 -2
package/scripts/bench_run.py +4 -4
package/scripts/build_mcp_registry_manifest.py +2 -2
package/scripts/mcp_server/__init__.py +1 -1
package/scripts/mcp_server/catalog.py +1 -1
package/scripts/mcp_server/consumer_tool_catalog.json +1 -1
package/scripts/mcp_server/tools.py +1 -1
package/scripts/pack_mcp_content.py +6 -6
package/scripts/skill_trigger_eval.py +2 -2

package/.agent-src/commands/agent-status.md CHANGED

@@ -66,7 +66,7 @@ Extract from latest record:
 - `by_model[]` — per-tier (haiku / sonnet / opus) input / output / cache split
 - `budget.tier` — `under` / `50` / `75` / `90` / `100` (from `node scripts/cost/budget.mjs check`)
-Pricing source: [`bench/pricing.yaml`](../../bench/pricing.yaml). Reader
+Pricing source: [`internal/bench/pricing.yaml`](../../bench/pricing.yaml). Reader
 implementation: [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs).
 ### 3b. Read caveman delta + per-conversation cost lens

package/.agent-src/skills/compress-memory/SKILL.md CHANGED

@@ -23,7 +23,7 @@ install:
 # compress-memory
-> **Experimental.** Output-side caveman dialect did not meet kill-criterion in [`bench/reports/caveman-v1.md`](../../../bench/reports/caveman-v1.md) (`vs_terse` median −9.27 %). Input-side memory compression is orthogonal use case: savings target always-loaded memory budget, not reply stream. Treat ship-criterion as **per-target measurement**, not v1 verdict.
+> **Experimental.** Output-side caveman dialect did not meet kill-criterion in [`internal/bench/reports/caveman-v1.md`](../../../bench/reports/caveman-v1.md) (`vs_terse` median −9.27 %). Input-side memory compression is orthogonal use case: savings target always-loaded memory budget, not reply stream. Treat ship-criterion as **per-target measurement**, not v1 verdict.
 ## When to use

package/.claude-plugin/marketplace.json CHANGED

@@ -6,7 +6,7 @@
   },
   "metadata": {
     "description": "Shared agent configuration \u2014 skills for AI coding tools (Claude Code, Augment, Cursor, Cline, Windsurf, Gemini CLI).",
-    "version": "3.2.0",
+    "version": "3.3.0",
     "keywords": [
       "agent-config",
       "skills",

package/AGENTS.md CHANGED

@@ -4,7 +4,7 @@
 ## Source of truth
-Edit `packages/<pack>/.agent-src.uncompressed/` only. Generated trees (`.agent-src/`, `.augment/`, `.claude/`, `.cursor/`, `.clinerules/`, `.windsurfrules`) regenerate from `task sync` + `task generate-tools`; never hand-edit.
+Edit `packages/<pack>/.agent-src.uncompressed/` only. Generated trees regenerate from `task sync` + `task generate-tools`; never hand-edit.
 ## Working on this repo
@@ -20,11 +20,12 @@ task ci                # full pipeline — green before PR
 - **Package self-orientation** — identity, cognition map, layout, stack, key rules, telemetry: [`package-self-orientation`](docs/contracts/package-self-orientation.md).
 - **Kernel + Router** — 9 Iron-Law rules, tier-1/2 routing, cost profiles, per-rule caps: [`kernel-membership`](docs/contracts/kernel-membership.md) + [`rule-router`](docs/contracts/rule-router.md).
 - **Trust & Safety** — trust-level enum, HRR banner, safety floors, installer confirm: [`trust-and-safety`](docs/contracts/trust-and-safety.md) + [`ADR-018`](docs/decisions/ADR-018-trust-and-safety-layer.md).
-- **Content pipelines** — A→D source / Augment / multi-tool / Claude.ai-bundle projections, indexed at [`docs/architecture.md`](docs/architecture.md); sub-pipelines under [`docs/architecture/`](docs/architecture/).
-- **Editing this repo** — Iron-Law rules + Thin-Root contract govern every change: [`augment-source-of-truth`](.agent-src/rules/augment-source-of-truth.md) + [`agents-md-thin-root`](.agent-src/skills/agents-md-thin-root/SKILL.md).
+- **Content pipelines** — A→D source / Augment / multi-tool / Claude.ai-bundle projections indexed at [`docs/architecture.md`](docs/architecture.md).
+- **Editing this repo** — Iron-Law rules + Thin-Root contract govern every edit: [`augment-source-of-truth`](.agent-src/rules/augment-source-of-truth.md) + [`agents-md-thin-root`](.agent-src/skills/agents-md-thin-root/SKILL.md).
 - **Consumer story** — `npx` + `scripts/install.sh` opt-in flags, sandbox / offline install paths, verified-offline manifest: [`README.md`](README.md).
 - **Personas** — 11 review-lens cast (6 core · 5 specialist), `personas:` vs `/mode`: [`docs/personas.md`](docs/personas.md).
-- **Discovery** — workspaces / packs / dist manifest; contract [`ADR-013`](docs/decisions/ADR-013-discovery-frontmatter-contract.md), how-to [`customization § Workspaces & packs`](docs/customization.md#workspaces--packs-discovery).
+- **Discovery** — workspaces / packs / dist manifest: [`ADR-013`](docs/decisions/ADR-013-discovery-frontmatter-contract.md) + [`customization`](docs/customization.md#workspaces--packs-discovery).
+- **Root layout** — maintainer-only dirs (`bench`, `evals`, `workers`) live under [`internal/`](internal/README.md) per [`ADR-028`](docs/decisions/ADR-028-root-layout.md).
 ## Emergency triage — read this when nothing else is reachable

package/CHANGELOG.md CHANGED

@@ -738,6 +738,30 @@ our recommendation order, not its support status.
 > that forces a new era split (`# Era: 3.3.x`, etc.) — see
 > [`docs/contracts/CHANGELOG-conventions.md § Era splits`](docs/contracts/CHANGELOG-conventions.md).
+## [3.3.0](https://github.com/event4u-app/agent-config/compare/3.2.0...3.3.0) (2026-05-25)
+### Features
+* **scripts:** point bench / eval / mcp tooling at internal/ (ADR-028 phase 1) ([c20b515](https://github.com/event4u-app/agent-config/commit/c20b515dfc80d03244e54af7b50e07d562647799))
+### Bug Fixes
+* **ci:** repoint bench/ -> internal/bench/ in projected agent-status + compress-memory ([682040b](https://github.com/event4u-app/agent-config/commit/682040b230248bd054767ab10bd1e90ed3a1d0b7))
+### Documentation
+* **roadmap:** archive road-to-root-layout-cleanup at 100% complete ([948f1a8](https://github.com/event4u-app/agent-config/commit/948f1a8cd9ec63fa831d5abab25aeb3657a0a7d4))
+* **adr:** land Phase 2 audit evidence + ADR-029 multi-workspace deferral ([c4dabf1](https://github.com/event4u-app/agent-config/commit/c4dabf1f718fee0d4709d727d1e3eb781f8db62f))
+* **adr:** land ADR-028 + root-layout-cleanup roadmap ([7b0fd28](https://github.com/event4u-app/agent-config/commit/7b0fd281288a87f9c218d9fa9d73f01a48a56626))
+* repoint bench / mcp-worker paths to internal/ (ADR-028 phase 1) ([85278dd](https://github.com/event4u-app/agent-config/commit/85278dd831d44a187dcfe7684e895bac0ca97ae4))
+### Chores
+* **root:** move bench/ and workers/ under internal/ (ADR-028 phase 1) ([2a1dd1e](https://github.com/event4u-app/agent-config/commit/2a1dd1e363f9aabb59b6eebf35f64a47609abebb))
+* **roadmap:** close changelog-era-auto-split (Phase 4 → ADR-027) ([b10b326](https://github.com/event4u-app/agent-config/commit/b10b32623b42739f0eba12a52173b810169929b6))
+Tests: 4938 (+9 since 3.2.0)
 ## [3.2.0](https://github.com/event4u-app/agent-config/compare/3.1.1...3.2.0) (2026-05-25)
 ### Features

package/dist/discovery/deprecation-report.md CHANGED

@@ -1,6 +1,6 @@
 # Discovery — Deprecation Report
-- Generated: `2026-05-25T09:38:22Z`
+- Generated: `2026-05-25T12:26:32Z`
 - Deprecated artefacts: **0**
 _None. Tree is clean._

package/dist/discovery/discovery-manifest.json CHANGED

@@ -24,7 +24,7 @@
     },
     {
       "category": "command",
-      "checksum": "sha256:c31aacf3434e5c983ba2b53a07febd6fa77acee43c53935d1a1c339b5a8daff1",
+      "checksum": "sha256:756535431a0d4d7049960e137c022cad35cb5221fc459fa98e146666a50649ce",
       "install": {
         "default": true,
         "removable": false
@@ -4906,7 +4906,7 @@
     },
     {
       "category": "skill",
-      "checksum": "sha256:24843ebe0c69e9c62080169b4ae6487b37c68250a13fd099695377aa0eaf6373",
+      "checksum": "sha256:4ff26cf1af1b252755b72829247e39f85f574d7453133fc5d96c1aaf20f63d97",
       "install": {
         "default": true,
         "removable": false
@@ -9387,7 +9387,7 @@
       ]
     }
   ],
-  "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
+  "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
   "documented_unassigned": [
     {
       "category": "template",
@@ -9500,7 +9500,7 @@
       "reason": "scaffold for new SKILL.md authoring"
     }
   ],
-  "generated_at": "2026-05-25T09:38:22Z",
+  "generated_at": "2026-05-25T12:26:32Z",
   "packs": [
     {
       "artefact_count": 84,

package/dist/discovery/discovery-manifest.json.sha256 CHANGED

@@ -1 +1 @@
-076a70a63686c4548465a7244ec34f6b251867115e7f3f9e978579329f2b1dda discovery-manifest.json
+6c11926a0bf485b24906979f2dc1bdc569985a46008247c567726daef64cf009 discovery-manifest.json
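
Reviewer note: the sidecar above pairs a hex digest with a file name, so a consumer could re-check the shipped manifest against it. The sketch below is a minimal illustration, assuming the digest is a plain SHA-256 of the file's bytes and that the sidecar sits next to the file it names; `parse_sidecar` and `verify` are hypothetical helper names, not part of the package.

```python
import hashlib
from pathlib import Path


def parse_sidecar(text: str) -> tuple[str, str]:
    """Split a '<hex digest> <filename>' line into its two fields."""
    digest, filename = text.strip().split(maxsplit=1)
    # sha256sum-style files may prefix the name with '*' (binary mode).
    return digest.lower(), filename.lstrip("*")


def verify(sidecar: Path) -> bool:
    """Return True when the named file hashes to the recorded digest."""
    expected, filename = parse_sidecar(sidecar.read_text(encoding="utf-8"))
    # Assumption: the target file lives in the same directory as the sidecar.
    actual = hashlib.sha256((sidecar.parent / filename).read_bytes()).hexdigest()
    return actual == expected
```

Calling `verify(Path("dist/discovery/discovery-manifest.json.sha256"))` from the package root would then report whether the manifest matches its recorded digest.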

package/dist/discovery/discovery-manifest.summary.md CHANGED

@@ -1,6 +1,6 @@
 # Discovery Manifest — Summary
-- Generated: `2026-05-25T09:38:22Z`
+- Generated: `2026-05-25T12:26:32Z`
 - Scanner: `80c8c0b8b827`
 - Artefacts: **430**
 - Unassigned: **0**

package/dist/discovery/orphan-report.md CHANGED

@@ -1,6 +1,6 @@
 # Discovery — Orphan Report
-- Generated: `2026-05-25T09:38:22Z`
+- Generated: `2026-05-25T12:26:32Z`
 - Orphan artefacts: **0**
 > An orphan is an artefact whose declared pack has no other members.

package/dist/discovery/packs.json CHANGED

@@ -1,6 +1,6 @@
 {
-  "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
-  "generated_at": "2026-05-25T09:38:22Z",
+  "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
+  "generated_at": "2026-05-25T12:26:32Z",
   "packs": [
     {
       "artefact_count": 84,

package/dist/discovery/trust-report.md CHANGED

@@ -1,6 +1,6 @@
 # Discovery — Trust Report
-- Generated: `2026-05-25T09:38:22Z`
+- Generated: `2026-05-25T12:26:32Z`
 - Workspaces tracked: **8**
 - Human-review-required artefacts: **2**

package/dist/discovery/workspaces.json CHANGED

@@ -1,6 +1,6 @@
 {
-  "checksum": "sha256:a47138f513cdd4d1d024f20d235b6f77f881b380254c13f71613d383a9eed62d",
-  "generated_at": "2026-05-25T09:38:22Z",
+  "checksum": "sha256:fd6102dc4515859fe55d3e85c8ad839c19bb5d8b1e579a5e8defcb137ecd3993",
+  "generated_at": "2026-05-25T12:26:32Z",
   "scanner_version": "80c8c0b8b827",
   "workspaces": [
     {

package/dist/mcp/registry-manifest.json CHANGED

@@ -9,7 +9,7 @@
     "homepage": "https://github.com/event4u-app/agent-config#readme",
     "name": "@event4u/agent-config",
     "repository": "https://github.com/event4u-app/agent-config",
-    "version": "3.2.0"
+    "version": "3.3.0"
   },
   "registries": [
     {

package/docs/benchmarks.md CHANGED

@@ -18,13 +18,13 @@ discipline (upstream `5b71c7a`).
 | Corpus | Path | Purpose |
 |---|---|---|
 | `dev` | `tests/eval/corpus-dev.yaml` | router / engine selection |
-| `caveman` | `bench/corpora/caveman/prompts.yaml` | compression dialect (`vs_raw` + `vs_terse`) |
+| `caveman` | `internal/bench/corpora/caveman/prompts.yaml` | compression dialect (`vs_raw` + `vs_terse`) |
 ## Reports — naming and trail
-- **Canonical pointer:** `bench/reports/<corpus>-v<N>.{json,md}` — always
+- **Canonical pointer:** `internal/bench/reports/<corpus>-v<N>.{json,md}` — always
   reflects the latest published run for that corpus version.
-- **Timestamped trail:** `bench/reports/<ISO-Zulu>-<corpus>-v<N>.{json,md}`
+- **Timestamped trail:** `internal/bench/reports/<ISO-Zulu>-<corpus>-v<N>.{json,md}`
   — every committed run keeps an immutable history copy alongside.
 Both are produced in one `scripts/bench_run.py` invocation; do not commit
@@ -37,7 +37,7 @@ one without the other.
 | Pre-release bake (any `vX.Y.0`) | `dev` + `caveman` | both reports refreshed |
 | Edit to `.agent-src.uncompressed/rules/caveman-speak.md` | `caveman` | report refreshed in same PR |
 | Edit to `scripts/bench_run.py` `--caveman` arm | `caveman` | report refreshed in same PR |
-| Edit to `bench/corpora/caveman/prompts.yaml` | `caveman` | report refreshed, version bumped (`caveman-vN+1`) |
+| Edit to `internal/bench/corpora/caveman/prompts.yaml` | `caveman` | report refreshed, version bumped (`caveman-vN+1`) |
 | Edit to `scripts/_lib/bench_caveman*.py` | `caveman` | report refreshed in same PR |
 A PR that touches any of the cadence triggers without refreshing the

package/docs/contracts/CHANGELOG-conventions.md CHANGED

@@ -52,7 +52,7 @@ Tests: NNNN (+M since X.Y.(Z-1))
 ```
 The test-count line is enforced for any release that ships changes to
-`scripts/`, `workers/`, or `.agent-src/` content; it can be omitted for
+`scripts/`, `internal/workers/`, or `.agent-src/` content; it can be omitted for
 pure-docs releases.
 ## What counts as breaking

package/docs/contracts/adr-mcp-runtime.md CHANGED

@@ -17,7 +17,7 @@ per [`scripts/mcp_server/requirements.txt`](../../scripts/mcp_server/requirement
 **FastMCP** (the higher-level decorator wrapper) and the **MCP TypeScript SDK**
 are explicitly rejected for this surface.
-The hosted Cloudflare Worker bridge (`workers/mcp/`) is the only place a
+The hosted Cloudflare Worker bridge (`internal/workers/mcp/`) is the only place a
 non-Python runtime is allowed, and it stays bound to the same wire contract
 (see [`mcp-cloud-scope.md`](mcp-cloud-scope.md)).

package/docs/contracts/benchmark-corpus-spec.md CHANGED

@@ -13,12 +13,12 @@ and validation invariants.
 ## Path decision
 Roadmap `step-4-measurement-and-benchmark.md`
-Phase 1 Step 2 names `bench/corpus.yaml`. The existing benchmark
+Phase 1 Step 2 names `internal/bench/corpus.yaml`. The existing benchmark
 infrastructure (runner + non-dev corpus + `task bench`) lives under
 `tests/eval/` and `scripts/bench_runner.py` hardcodes that directory.
-**Canonical location:** `tests/eval/corpus-<id>.yaml`. The `bench/`
+**Canonical location:** `tests/eval/corpus-<id>.yaml`. The `internal/bench/`
 directory is reserved for **reports + pricing** (Phase 2 deliverables).
-Migration to `bench/corpus.yaml` is a no-op rename if downstream Phase
+Migration to `internal/bench/corpus.yaml` is a no-op rename if downstream Phase
 2 work proves the consolidation is worth the diff cost.
 ## Composition (25 prompts)

package/docs/contracts/benchmark-report-schema.md CHANGED

@@ -7,12 +7,12 @@ keep-beta-until: 2026-08-14
 Parser-visible contract for the JSON + Markdown reports emitted by
 [`scripts/bench_run.py`](../../scripts/bench_run.py). Every `task bench`
-run writes one `bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
+run writes one `internal/bench/reports/<ts>-<corpus_id>.json` + matching `.md`.
 ## File layout
 ```
-bench/
+internal/bench/
 ├── pricing.yaml                       # per-1M model rates + sourced_on dates
 └── reports/
     ├── 2026-05-16T10-30-00Z-dev.json  # machine-readable
@@ -60,7 +60,7 @@ cost:
   per_tier:                                        # haiku / sonnet / opus / unknown
     sonnet: { messages: <int>, cost_usd: <float> }
     ...
-  pricing_sourced_on: <ISO date from bench/pricing.yaml>
+  pricing_sourced_on: <ISO date from internal/bench/pricing.yaml>
 quality:
   source: <path-or-"not_collected">
   prompts_with_assertion: <int>
@@ -99,7 +99,7 @@ Headers in order:
   `quality.source: not_collected` and `verdict.overall: partial`. Score
   stays `0.0`; never inflate by assuming pass.
 - **Pricing dated.** Every cost row reads `sourced_on` from
-  `bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
+  `internal/bench/pricing.yaml`. Stale price (> 90 days) → warning line in the
   Markdown footer.
 ## Cross-references
@@ -107,5 +107,5 @@ Headers in order:
 - Runner — [`scripts/bench_run.py`](../../scripts/bench_run.py)
 - Baseline collector — [`scripts/bench_runner.py`](../../scripts/bench_runner.py)
 - Corpus contract — [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md)
-- Pricing source — [`bench/pricing.yaml`](../../bench/pricing.yaml)
+- Pricing source — [`internal/bench/pricing.yaml`](../../bench/pricing.yaml)
 - Cost session reader (live sessions) — [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs)
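
Reviewer note: the "Pricing dated" rule in the hunk above (a `sourced_on` date older than 90 days triggers a warning line) reduces to a date comparison. The sketch below is a minimal illustration, assuming `sourced_on` is an ISO `YYYY-MM-DD` string; `is_stale` and `STALE_AFTER_DAYS` are hypothetical names, not taken from `scripts/bench_run.py`.

```python
from datetime import date

# Threshold quoted in the contract: stale means strictly more than 90 days old.
STALE_AFTER_DAYS = 90


def is_stale(sourced_on: str, today: date) -> bool:
    """Return True when a pricing row is older than the 90-day threshold."""
    age_days = (today - date.fromisoformat(sourced_on)).days
    return age_days > STALE_AFTER_DAYS
```

Passing `today` explicitly rather than reading the clock inside the function keeps the check reproducible in tests.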

package/docs/contracts/caveman-telemetry.md CHANGED

@@ -13,7 +13,7 @@ keep-beta-until: 2026-08-15
 | Key | Value | Provenance |
 |---|---|---|
-| `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
+| `caveman_multiplier_version` | `v1` | Tied to `internal/bench/reports/caveman-v1.{json,md}` |
 | `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
 | `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
 | `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
@@ -40,7 +40,7 @@ where `M = caveman_multiplier_value`.
 ## Why suspended after v1
-The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
+The `caveman-v1` bench (`internal/bench/reports/caveman-v1.md`, 30 calls,
 2026-05-16) found:
 - Median savings vs raw uncompressed: **+23.51 %** (inflated by the
@@ -78,6 +78,6 @@ Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
 ## See also
 - [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
-- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
-- [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
+- [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
+- [`internal/bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
 - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
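
Reviewer note: the provenance column above defines the multiplier as `median(terse_control_tokens / compressed_tokens)`. The sketch below spells that formula out under the assumption that each prompt contributes one `(terse_control_tokens, compressed_tokens)` pair; `caveman_multiplier` is a hypothetical name and the token counts in the usage line are invented for illustration, not taken from the v1 report.

```python
from statistics import median
from typing import Iterable, Tuple


def caveman_multiplier(pairs: Iterable[Tuple[int, int]]) -> float:
    """Median of per-prompt terse_control_tokens / compressed_tokens ratios."""
    return median(terse / compressed for terse, compressed in pairs)


# Hypothetical per-prompt token counts: (terse_control, compressed).
example = [(90, 100), (100, 100), (200, 100)]
print(caveman_multiplier(example))
```

By construction, a value below 1 means the compressed arm used more tokens than the terse control on the median prompt.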

package/docs/contracts/compression-default-kill-criterion.md CHANGED

@@ -6,7 +6,7 @@ keep-beta-until: 2026-08-14
 # Compression default — kill-criterion
 > **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
-> Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
+> Phase 1 closeout · **Sources:** [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
 > [`council-synthesis.md` § 7](../../agents/evidence/audits/2026-05-14-north-star/council-synthesis.md) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict --> ·
 > [`caveman-v1-kc-verdict.json`](../../agents/runtime/council/responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
@@ -23,14 +23,14 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
    [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
    but the feature is non-promoted: no skill recommends turning it on,
    no preset enables it, no profile depends on it.
-2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
+2. **Baselines.** Every published `internal/bench/reports/caveman-v<N>.{json,md}`
    measures three arms (`compressed` · `terse-control` ·
    `uncompressed`) and reports two savings columns:
    - `vs_raw` — median savings against the uncompressed arm.
    - `vs_terse` — **load-bearing** median savings against the
      `Answer concisely.` terse-control arm. `vs_raw` is inflated by the
      carve-out-tax-free pure-prose case and is **not** the gate metric.
-3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
+3. **Decision table.** Read the latest `internal/bench/reports/caveman-v<N>.md`
    and apply exactly one of:
    | Measured `vs_terse` median | Quality regression on corpus | Verdict |
@@ -50,7 +50,7 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
 ## v1 verdict (2026-05-16)
-[`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+[`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
 landed 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
 | Metric | Median | p10 | p90 |
@@ -100,7 +100,7 @@ re-litigating compression on every PR.
 ## Cross-references
-- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+- [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
   — v1 measurement; canonical baseline this doc cites.
 - [`docs/benchmarks.md`](../benchmarks.md)
   — cadence + when the next bench run is mandatory.

package/docs/contracts/cost-enforcement.md CHANGED

@@ -131,4 +131,4 @@ suite is wired to `task test-cost-budget` per `step-11` Phase 2 Step 5.
 - `step-11-ruflo-parity` — Measurement & Governance Parity roadmap.
 - `docs/contracts/cost-dashboard.md` — companion dashboard contract.
 - `scripts/cost/budget.mjs` — evaluator implementation.
-- `bench/pricing.yaml` — per-model USD pricing table.
+- `internal/bench/pricing.yaml` — per-model USD pricing table.

package/docs/contracts/mcp-beta-criteria.md CHANGED

@@ -7,7 +7,7 @@ mcp_scope: lite
 > **Status:** Active · governs the `experimental → beta` promotion for
 > the MCP surface (`scripts/mcp_server/` local stdio kernel + the
-> hosted `workers/mcp/` bridge). Owned by Phase 3 of the
+> hosted `internal/workers/mcp/` bridge). Owned by Phase 3 of the
 > `road-to-surface-discipline` roadmap (see `agents/roadmaps/`).
 > Companion contract:
 > [`mcp-phase-1-scope.md`](mcp-phase-1-scope.md) (local) ·

package/docs/contracts/mcp-cloud-scope.md CHANGED

@@ -5,7 +5,7 @@ mcp_scope: lite
 # MCP Server — Cloud Scope (A0-cloud Hard Contract)
-> **Status:** Active · covers `workers/mcp/` (TypeScript Cloudflare
+> **Status:** Active · covers `internal/workers/mcp/` (TypeScript Cloudflare
 > Worker bridge), MVP-1 surface. Extends — does **not** supersede —
 > [`mcp-phase-1-scope.md`](mcp-phase-1-scope.md), which retains
 > exclusive ownership of `scripts/mcp_server/` (local stdio).
@@ -15,7 +15,7 @@ mcp_scope: lite
 ## Purpose
 Locks the **execution-safety boundary** for the hosted MCP Worker. Any
-code under `workers/mcp/` must satisfy this contract verbatim. The
+code under `internal/workers/mcp/` must satisfy this contract verbatim. The
 local stdio kernel and the hosted Worker are two distinct surfaces; a
 deviation in one is **not** authorized by a precedent in the other.
@@ -46,7 +46,7 @@ prose-only.
 - **What it never does:** execute Python scripts, shell out, spawn
   runtimes, touch consumer FS, write to R2, mutate consumer state,
   call upstream LLM APIs, or read `.agent-src.uncompressed/`.
-- **Owner code path:** `workers/mcp/` (TypeScript, Cloudflare Worker).
+- **Owner code path:** `internal/workers/mcp/` (TypeScript, Cloudflare Worker).
   This contract is the normative spec.
 - **Auth model:** `public` (default) or `bearer-auth` (operator opt-in)
   per `## Auth surface`. HMAC and CF Access are declared but deferred.
@@ -194,7 +194,7 @@ runtime mode switch.
   returns HTTP `401` with a JSON-RPC error envelope (code `-32001`,
   message `"Unauthorized"`) and the RFC 6750
   `WWW-Authenticate: Bearer realm="agent-config-mcp"` header.
-  Implementation: `workers/mcp/src/index.ts` § auth gate (the
+  Implementation: `internal/workers/mcp/src/index.ts` § auth gate (the
   `if (requiredToken) { … }` block).
 - **Liveness carve-out:** the `GET /` liveness probe is
   unauthenticated by design — health checks and `curl` smoke tests
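
Reviewer note: the bearer-auth behaviour described in the last hunk (HTTP `401`, JSON-RPC error code `-32001`, message `"Unauthorized"`, RFC 6750 `WWW-Authenticate` header) can be modelled in a few lines. The sketch below is written in Python for illustration only; the actual gate lives in the TypeScript Worker and is not shown in this diff. `auth_gate` is a hypothetical name, and the `"id": None` field, the `Content-Type` header, and the constant-time comparison are assumptions rather than confirmed behaviour. The unauthenticated `GET /` liveness probe is not modelled.

```python
import hmac
import json
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]


def auth_gate(authorization: Optional[str], required_token: Optional[str]) -> Optional[Response]:
    """Return None when the request may proceed, else a 401 response triple."""
    if not required_token:
        return None  # `public` mode: no token configured, nothing to check.
    presented = ""
    if authorization and authorization.startswith("Bearer "):
        presented = authorization[len("Bearer "):]
    # Assumption: compare in constant time to avoid leaking token prefixes.
    if hmac.compare_digest(presented.encode(), required_token.encode()):
        return None
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": None,  # assumption: no request id is echoed on auth failure
        "error": {"code": -32001, "message": "Unauthorized"},
    })
    headers = {
        "WWW-Authenticate": 'Bearer realm="agent-config-mcp"',
        "Content-Type": "application/json",
    }
    return 401, headers, body
```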

package/docs/contracts/mcp-registry-manifest.schema.json CHANGED

@@ -52,7 +52,7 @@
                 "tools_count": {
                     "type": "integer",
                     "minimum": 0,
-                    "description": "Sourced from workers/mcp/content.json#/tool_catalog/tools. Hard-sourced — no fallback. R3 (discovery roadmap) is a hard prerequisite per the AI-Council external review."
+                    "description": "Sourced from internal/workers/mcp/content.json#/tool_catalog/tools. Hard-sourced — no fallback. R3 (discovery roadmap) is a hard prerequisite per the AI-Council external review."
                 },
                 "install_hint_stdio": { "type": "string", "minLength": 1 }
             }

package/docs/contracts/mcp-tool-inventory.md CHANGED

@@ -47,7 +47,7 @@ keep-beta-until: 2026-08-14
 ## Glossary
 - **Side-effect** — `ro` (read-only) · `fs-write` (filesystem write) · `shell` (spawns processes).
-- **Transports** — `stdio` (`scripts/mcp_server/`) · `worker` (`workers/mcp/`). A tool may live on both.
+- **Transports** — `stdio` (`scripts/mcp_server/`) · `worker` (`internal/workers/mcp/`). A tool may live on both.
 - **Stub** — catalog-listed for discovery; returns the `not_implemented` envelope from
   [`mcp-tool-stub-envelope.md`](mcp-tool-stub-envelope.md) until promoted.

package/docs/contracts/mcp-tool-stub-envelope.md CHANGED

@@ -20,7 +20,7 @@ a 500.
 ## Source of truth
 `scripts/mcp_server/consumer_tool_catalog.json` (schema_version 1).
-Both the stdio server and the Cloud Worker bundle (`workers/mcp/`,
+Both the stdio server and the Cloud Worker bundle (`internal/workers/mcp/`,
 packed by `scripts/pack_mcp_content.py`) read from this file. The
 manifest returned by `tools/list` is byte-identical apart from
 per-tool `implemented_on` metadata.

package/docs/contracts/measurement-baseline.md CHANGED

@@ -24,7 +24,7 @@ Four axes, all numeric, all reproducible from the same input:
 Schemas: [`benchmark-report-schema.md`](benchmark-report-schema.md) ·
 [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md). Reports land at
-`bench/reports/<utc-stamp>-<corpus>[-projection].{json,md}` —
+`internal/bench/reports/<utc-stamp>-<corpus>[-projection].{json,md}` —
 timestamped, never overwritten, content-addressed by run.
 ## Corpora — frozen for the soak window
@@ -67,10 +67,10 @@ NO ANECDOTE, NO INDIVIDUAL REPORT, NO ROADMAP-SIDE OVERRIDE.
 [`scripts/bench_baseline_ready.py`](../../scripts/bench_baseline_ready.py)
 returns exit 0 iff both:
-1. **Wall-clock soak:** `today − bench/baseline-start.txt ≥ --min-days` (default 60)
-2. **Report density:** `bench/reports/*-<corpus>.json` count ≥ `--min-reports` (default 30)
+1. **Wall-clock soak:** `today − internal/bench/baseline-start.txt ≥ --min-days` (default 60)
+2. **Report density:** `internal/bench/reports/*-<corpus>.json` count ≥ `--min-reports` (default 30)
-Soak start anchored at [`bench/baseline-start.txt`](../../bench/baseline-start.txt)
+Soak start anchored at [`internal/bench/baseline-start.txt`](../../bench/baseline-start.txt)
 = **2026-05-16**. Earliest possible flip: **2026-07-15**, contingent
 on the 30-report floor.
@@ -86,11 +86,11 @@ On baseline closure, the step-4 closeout writes the numeric verdict to
 [`docs/parity/bench.json`](../parity/bench.json) — frozen snapshot with
 the 30+ reports averaged, drift verdict, and the compression-default
 decision per the kill-criterion table. That file is the artefact every
-P2 roadmap reads — not the live `bench/reports/` directory.
+P2 roadmap reads — not the live `internal/bench/reports/` directory.
 ## Carve-outs
-- **Pricing freshness:** [`bench/pricing.yaml`](../../bench/pricing.yaml) rows must carry `sourced_on: YYYY-MM-DD`. Stale prices = stale numbers = no trust (ruflo "measured-vs-claimed" pattern).
+- **Pricing freshness:** [`internal/bench/pricing.yaml`](../../bench/pricing.yaml) rows must carry `sourced_on: YYYY-MM-DD`. Stale prices = stale numbers = no trust (ruflo "measured-vs-claimed" pattern).
 - **Subjective grading excluded:** quality scoring is mechanical via `quality_assertion`. No vibes.
 - **Cursor / Cline / Windsurf:** rules-only surfaces, no SKILL.md projection. `bench:projection` reports them as `not_applicable` — the gap is acknowledged, not silently dropped.
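
Reviewer note: the readiness gate in the second hunk above combines a wall-clock soak (`--min-days`, default 60) with a report-count floor (`--min-reports`, default 30). The sketch below restates that rule as a pure function; it is not the body of `scripts/bench_baseline_ready.py`, and `baseline_ready` is a hypothetical name. The real script reads the start date and counts report files itself, whereas here both are passed in as arguments.

```python
from datetime import date


def baseline_ready(
    start: date,
    today: date,
    report_count: int,
    min_days: int = 60,
    min_reports: int = 30,
) -> bool:
    """True only when both the soak window and the report floor are met."""
    soaked = (today - start).days >= min_days
    dense = report_count >= min_reports
    return soaked and dense


# Soak start from the diff: 2026-05-16; 60 days later is 2026-07-15.
print(baseline_ready(date(2026, 5, 16), date(2026, 7, 15), report_count=30))
```

With the anchored start date of 2026-05-16, the 60-day window closes on 2026-07-15, which matches the "earliest possible flip" stated in the hunk.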

package/docs/decisions/ADR-027-changelog-machine-vs-manual.md ADDED

@@ -0,0 +1,129 @@
+---
+adr: 027
+status: accepted
+date: 2026-05-25
+decision: changelog-machine-vs-manual
+supersedes: —
+superseded_by: —
+phase: v3.x · changelog-era-auto-split Phase 4
+type: discovery-loop-closure
+review_date: 2027-05-25
+---
+# ADR-027 — CHANGELOG convention — confirm manual narrative + auto-split for 12 months
+## Status
+**Accepted** · 2026-05-25. Closes Phase 4 of
+[`road-to-changelog-era-auto-split.md`](../../agents/roadmaps/archive/road-to-changelog-era-auto-split.md).
+Time-boxed: review on 2027-05-25 or earlier if a trigger below fires.
+## Context
+The AI Council that produced the auto-split design surfaced one upstream
+question: **is the manual changelog convention itself the right primitive
+for a package that bills itself as an "Universal AI Agent OS"?** Phase 4
+investigates without committing to a rewrite — either confirm
+`docs/contracts/CHANGELOG-conventions.md` + the Tier 2/3 machinery stand
+for 12 months, or draft a successor roadmap.
+### Step 1 — Consumer touchpoint audit
+Every place a consumer reads the changelog and which fields they read:
+| # | Surface | What the consumer sees | Field weight |
+|---|---|---|---|
+| 1 | **GitHub Release notes** | `plan.changelog_body` rendered via `scripts/release.py:818` | narrative + bullets + tests-delta |
+| 2 | **npmjs.com package page** | auto-rendered `CHANGELOG.md` (top of file) | narrative paragraph dominates the fold |
+| 3 | **packagist.org package page** | same — auto-rendered `CHANGELOG.md` | same |
+| 4 | **`README.md` footer** | link only to `CHANGELOG.md` and `releases/latest` | navigation, no content |
+| 5 | **`CHANGELOG.md` direct** | full structured entries | full Keep-a-Changelog shape |
+| 6 | **`docs/archive/CHANGELOG-pre-*.md`** | historical eras behind the active-era pointer | follow-up reads only |
+| 7 | **`agents/settings/contexts/adr-artifact-engagement.md` § L100** | guidance to write a deprecation note in `CHANGELOG.md` | governance / authoring, not consumer |
+`docs/getting-started-by-role.md` — not in scope. No role-pack `FIRST_WIN.md` references the changelog.
+**Reader insight.** Surfaces 1–3 are the high-traffic consumer surfaces.
+Each renders the *top* of the changelog: the narrative paragraph + the
+first bullet group + the tests-delta. The narrative paragraph carries
+the framing every other layer (bullets, compare-link) loses.
+### Step 2 — Spike comparison against the audit
+Three shapes, measured against the 3.2.0 release (38 commits, 141 lines
+of narrative + bullets in the current convention):
+| Metric | Current (manual narrative + auto-split) | (a) `release-please` fully generated | (b) Hybrid (manual paragraph + generated bullets) |
+|---|---:|---:|---:|
+| **Token budget per release** | ~2.8k (141 lines × ~80 chars) | ~1.1k (38 commits × ~120 chars) | ~2.6k (same as current; bullets generated, paragraph hand-written) |
+| **Maintainer minutes per release** | 5–15 (write narrative; auto-split runs in `task release`) | 0 direct + ~30/month policing commit messages | 5–10 (paragraph only) |
+| **Information density per line** | high — narrative compresses 5–10 bullets of context | low — every infrastructure commit becomes a line | high — same as current |
+| **Parseability for downstream agents** | high — `### Features` / `### Fixes` / `### BREAKING` are semantic anchors | medium — same headings, but no narrative anchor to disambiguate scope churn | high |
+| **Hands-off failure mode** | era over-cap → auto-split fires | commit-message drift pollutes the log → no recovery without rewriting commits | narrative drift → degrades to release-please equivalent |
+The fully-generated shape **loses the narrative paragraph**, the field
+that surfaces 1–3 weight most heavily. The hybrid shape is what the
+current convention *already* allows — `release-please` is just the
+extreme end of the auto-spectrum; the convention sits at a defensible
+middle.
+The operational cost that prompted the question (era over-cap blocking
+the release PR) is solved by Tier 2 in the same roadmap. The discovery
+question therefore reduces to: **is the narrative paragraph worth
+~2.5k tokens per release?** Audit says yes — it is the field consumers
+read first and the field that differentiates the package from bot
+output.
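The token budgets in the Step 2 table can be reproduced with simple arithmetic. The sketch below is an editorial illustration, not code from the package. It assumes roughly 4 characters per token, a common rule of thumb that the ADR does not state, and the helper name `estimate_tokens` is hypothetical.

```python
# Assumption: ~4 characters per token. The ADR does not state the ratio it
# used; this value happens to reproduce the ~2.8k and ~1.1k table figures.
CHARS_PER_TOKEN = 4


def estimate_tokens(units: int, chars_per_unit: int,
                    chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Rough token estimate: total characters divided by chars per token."""
    return units * chars_per_unit // chars_per_token


# Current convention: 141 lines at ~80 characters each.
current = estimate_tokens(141, 80)
# Fully generated shape: 38 commits at ~120 characters each.
generated = estimate_tokens(38, 120)

print(f"current ~{current} tokens, fully generated ~{generated} tokens")
```

Under that assumption, 141 × 80 / 4 gives about 2.8k tokens and 38 × 120 / 4 gives about 1.1k tokens, which matches the first two columns of the table.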
+## Decision
+**Confirm `docs/contracts/CHANGELOG-conventions.md` + Tier 2/3
+machinery for 12 months.** No successor roadmap.
+The current convention:
+- Keeps the **hand-written narrative paragraph** as the load-bearing
+  framing for consumer surfaces (npm, packagist, GitHub Releases).
+- Lets `scripts/release.py` generate the **bullet list** under
+  Features / Fixes / Chores / Docs / BREAKING via the auto-split flow
+  plus the existing Keep-a-Changelog shape.
+- Lets the **drift gate** (`tests/test_changelog_eras.py`) catch
+  non-release edits that grow the active era past 250 lines.
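The drift gate in the last bullet comes down to a line-count check against the 250-line cap. The following is a minimal sketch of that idea only. The function name `era_over_cap` is hypothetical, and how the real `scripts/_lib/changelog_eras.py` and `tests/test_changelog_eras.py` extract the active era is not shown in this diff.

```python
ERA_LINE_CAP = 250  # cap named in the ADR


def era_over_cap(era_body: str, cap: int = ERA_LINE_CAP) -> bool:
    """Return True when the active-era body has more lines than the cap.

    Hypothetical helper: the caller is assumed to pass only the text of
    the active era, already separated from archived eras.
    """
    return len(era_body.splitlines()) > cap


# A body of exactly 250 lines is at the cap and still passes.
at_cap = "line\n" * 250
# One more line pushes it over the cap.
over_cap = "line\n" * 251

print(era_over_cap(at_cap), era_over_cap(over_cap))
```

A test built on such a helper would fail the build on a non-release edit that grows the era past the cap, while the release flow would run the auto-split instead.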
+The "Universal AI Agent OS" framing **argues for**, not against, a
+human-curated changelog — agents that consume `CHANGELOG.md` as a
+ground truth for "what changed" prefer structured, semantic, framed
+entries over a flat commit dump.
+## Consequences
+- `docs/contracts/CHANGELOG-conventions.md` stands as-is. The
+  Tier 3 Step 3 "Gate-vs-script contract" subsection is the canonical
+  reference for the gate / script split.
+- No new tooling lands as a Phase 4 follow-up.
+- Review on **2027-05-25** or earlier if any trigger fires:
+  1. The narrative paragraph stops being written for two consecutive
+     releases (signal: convention is breaking down naturally).
+  2. The active-era body grows past 250 lines from non-release edits
+     more than twice in a quarter (signal: humans are bypassing the
+     gate; auto-split is no longer the right primitive).
+  3. A downstream consumer (npm/packagist/GitHub Release renderer)
+     changes how it slices the top-of-file (signal: the field weight
+     assumption above is invalidated).
+  4. A Council session re-opens the question with new evidence.
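Triggers 1 and 2 above are mechanical enough to check in code. The sketch below is one possible encoding, not something the package ships. Both function names and the input shapes (a per-release list of booleans, a per-quarter event count) are assumptions made for illustration.

```python
from typing import Sequence


def narrative_lapsed(has_narrative: Sequence[bool], run: int = 2) -> bool:
    """Trigger 1: True if `run` consecutive releases lack a narrative.

    `has_narrative` is assumed to be ordered oldest to newest, one flag
    per release.
    """
    streak = 0
    for written in has_narrative:
        streak = 0 if written else streak + 1
        if streak >= run:
            return True
    return False


def gate_bypassed(over_cap_events_in_quarter: int, limit: int = 2) -> bool:
    """Trigger 2: True if non-release edits pushed the active era past
    the cap more than `limit` times in one quarter."""
    return over_cap_events_in_quarter > limit


# Two releases in a row without a narrative paragraph fire trigger 1.
print(narrative_lapsed([True, False, False]))
# Exactly two over-cap events is still within the limit for trigger 2.
print(gate_bypassed(2))
```

Triggers 3 and 4 depend on external renderers and on Council sessions, so they are left to human review in this sketch.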
+## Alternatives considered
+| Option | Why rejected |
+|---|---|
+| **`release-please` fully generated** | Loses the narrative paragraph; surfaces 1–3 lose their highest-weight field. |
+| **Hybrid (manual paragraph + generated bullets, separate tool)** | Indistinguishable from current convention in output; adds a tool boundary without changing the artefact. |
+| **Drop the changelog entirely, point to GitHub Releases** | npm / packagist auto-render `CHANGELOG.md` from the repo; deleting it degrades surfaces 2–3 to a generic placeholder. |
+## References
+- [`docs/contracts/CHANGELOG-conventions.md`](../contracts/CHANGELOG-conventions.md) — convention being confirmed.
+- [`agents/roadmaps/archive/road-to-changelog-era-auto-split.md`](../../agents/roadmaps/archive/road-to-changelog-era-auto-split.md) — closes Phase 4.
+- [`agents/runtime/council/responses/changelog-era-split-2026-05-25.json`](../../agents/runtime/council/responses/changelog-era-split-2026-05-25.json) — the originating council synthesis.
+- `scripts/_lib/changelog_eras.py` — shared cap + splitter (Tier 2 output).
+- `scripts/release.py:818` — `plan.changelog_body` → GitHub Release notes wire.
+- `tests/test_changelog_eras.py`, `tests/test_changelog_split.py` — gate + splitter coverage.