@event4u/agent-config 2.19.0 → 2.20.0
- package/.agent-src/commands/agent-status.md +29 -0
- package/.agent-src/commands/onboard.md +221 -81
- package/.agent-src/packs/README.md +49 -0
- package/.agent-src/packs/agency-delivery.yml +63 -0
- package/.agent-src/packs/content-engine.yml +53 -0
- package/.agent-src/packs/founder-mvp.yml +51 -0
- package/.agent-src/presets/README.md +26 -0
- package/.agent-src/presets/balanced.yml +34 -0
- package/.agent-src/presets/fast.yml +31 -0
- package/.agent-src/presets/strict.yml +38 -0
- package/.agent-src/profiles/README.md +29 -0
- package/.agent-src/profiles/agency.yml +27 -0
- package/.agent-src/profiles/content_creator.yml +25 -0
- package/.agent-src/profiles/developer.yml +26 -0
- package/.agent-src/profiles/finance.yml +24 -0
- package/.agent-src/profiles/founder.yml +25 -0
- package/.agent-src/profiles/ops.yml +25 -0
- package/.agent-src/rules/no-cheap-questions.md +25 -17
- package/.agent-src/skills/adr-create/SKILL.md +78 -68
- package/.agent-src/skills/subagent-orchestration/SKILL.md +33 -0
- package/.agent-src/templates/agents/agent-project-settings.example.yml +1 -1
- package/.agent-src/templates/skill-archive-note.md +101 -0
- package/.claude-plugin/marketplace.json +1 -1
- package/CHANGELOG.md +52 -30
- package/README.md +68 -72
- package/config/agent-settings.template.yml +22 -0
- package/docs/adrs/caveman/0001-default-off-until-bench.md +93 -0
- package/docs/adrs/caveman/README.md +9 -0
- package/docs/adrs/cost/0001-hard-stop-hook.md +114 -0
- package/docs/adrs/cost/README.md +9 -0
- package/docs/adrs/memory/0001-consumer-side-snapshot.md +111 -0
- package/docs/adrs/memory/README.md +9 -0
- package/docs/adrs/router/0001-three-tier-routing.md +119 -0
- package/docs/adrs/router/README.md +9 -0
- package/docs/adrs/schema/0001-json-schema-frontmatter.md +102 -0
- package/docs/adrs/schema/README.md +9 -0
- package/docs/adrs/smoke/0001-per-tier-smoke-scripts.md +99 -0
- package/docs/adrs/smoke/README.md +9 -0
- package/docs/architecture/current-onboard-baseline.md +126 -0
- package/docs/architecture/current-safety-behavior.md +137 -0
- package/docs/archive/CHANGELOG-pre-2.16.0.md +48 -0
- package/docs/contracts/adr-layout.md +108 -0
- package/docs/contracts/benchmark-corpus-spec.md +97 -0
- package/docs/contracts/benchmark-report-schema.md +111 -0
- package/docs/contracts/command-clusters.md +1 -0
- package/docs/contracts/command-taxonomy.md +137 -0
- package/docs/contracts/compression-default-kill-criterion.md +69 -0
- package/docs/contracts/config-presets.md +144 -0
- package/docs/contracts/cost-dashboard.md +143 -0
- package/docs/contracts/cost-enforcement.md +134 -0
- package/docs/contracts/file-ownership-matrix.json +0 -7
- package/docs/contracts/mcp-tool-inventory.md +53 -0
- package/docs/contracts/measurement-baseline.md +102 -0
- package/docs/contracts/namespace.md +125 -0
- package/docs/contracts/profile-system.md +142 -0
- package/docs/contracts/safety-model.md +129 -0
- package/docs/contracts/smoke-contracts.md +144 -0
- package/docs/contracts/workflow-packs.md +121 -0
- package/docs/decisions/ADR-010-profile-pack-preset-boundary.md +132 -0
- package/docs/decisions/INDEX.md +1 -0
- package/docs/featured-commands.md +27 -0
- package/docs/parity/bench-ruflo.json +58 -0
- package/docs/parity/bench.json +41 -0
- package/docs/parity/ruflo.md +46 -0
- package/docs/profiles.md +91 -0
- package/package.json +1 -1
- package/scripts/_cli/cmd_explain.py +250 -0
- package/scripts/_lib/bench_cost.py +138 -0
- package/scripts/_lib/bench_quality.py +118 -0
- package/scripts/_lib/bench_report.py +150 -0
- package/scripts/agent-config +13 -0
- package/scripts/audit_adr_coverage.py +175 -0
- package/scripts/audit_mcp_tools.py +146 -0
- package/scripts/bench_baseline_ready.py +108 -0
- package/scripts/bench_drift_check.py +151 -0
- package/scripts/bench_per_tool.py +216 -0
- package/scripts/bench_run.py +155 -0
- package/scripts/config/__init__.py +9 -0
- package/scripts/config/presets.py +206 -0
- package/scripts/config/profiles.py +173 -0
- package/scripts/cost/budget.mjs +73 -12
- package/scripts/cost/preflight.mjs +89 -0
- package/scripts/lint_archived_skills.py +143 -0
- package/scripts/lint_bench_corpus.py +161 -0
- package/scripts/lint_namespace.py +135 -0
- package/scripts/skill_overlap.py +204 -0
- package/scripts/skill_usage_collect.py +191 -0
- package/scripts/skill_usage_report.py +162 -0
- package/scripts/smoke/kernel.sh +101 -0
- package/scripts/smoke/router.sh +129 -0
- package/scripts/smoke/schema.sh +71 -0
- package/scripts/smoke/skills.sh +101 -0
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: beta
|
|
3
|
+
keep-beta-until: 2026-08-14
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# MCP tool inventory
|
|
7
|
+
|
|
8
|
+
> Generated by [`scripts/audit_mcp_tools.py`](../../scripts/audit_mcp_tools.py)
|
|
9
|
+
> from the source-of-truth catalog
|
|
10
|
+
> [`scripts/mcp_server/consumer_tool_catalog.json`](../../scripts/mcp_server/consumer_tool_catalog.json).
|
|
11
|
+
> Do **not** hand-edit; rerun `python3 scripts/audit_mcp_tools.py --write`.
|
|
12
|
+
>
|
|
13
|
+
> Step-11 Phase 5 Step 3 (`step-11-ruflo-parity.md`).
|
|
14
|
+
|
|
15
|
+
## Summary
|
|
16
|
+
|
|
17
|
+
- **Total tools:** 20
|
|
18
|
+
- **By transport:** stdio=9
|
|
19
|
+
- **By side-effect:** fs-write=5, ro=12, shell=3
|
|
20
|
+
- **Discovery-only stubs (no implementation):** 11
|
|
21
|
+
|
|
22
|
+
## Tools
|
|
23
|
+
|
|
24
|
+
| Tool | Side-effect | Transports | Catalog | Handler |
|
|
25
|
+
|---|---|---|---|---|
|
|
26
|
+
| `lint_skills` | `ro` | stdio | [`consumer_tool_catalog.json:7`](../../scripts/mcp_server/consumer_tool_catalog.json#L7) | [`tools.py:510`](../../scripts/mcp_server/tools.py#L510) |
|
|
27
|
+
| `chat_history_append` | `fs-write` | stdio | [`consumer_tool_catalog.json:24`](../../scripts/mcp_server/consumer_tool_catalog.json#L24) | [`tools.py:535`](../../scripts/mcp_server/tools.py#L535) |
|
|
28
|
+
| `chat_history_read` | `ro` | stdio | [`consumer_tool_catalog.json:43`](../../scripts/mcp_server/consumer_tool_catalog.json#L43) | [`tools.py:571`](../../scripts/mcp_server/tools.py#L571) |
|
|
29
|
+
| `memory_lookup` | `ro` | stdio | [`consumer_tool_catalog.json:59`](../../scripts/mcp_server/consumer_tool_catalog.json#L59) | [`tools.py:590`](../../scripts/mcp_server/tools.py#L590) |
|
|
30
|
+
| `memory_signal` | `fs-write` | _(stub)_ | [`consumer_tool_catalog.json:75`](../../scripts/mcp_server/consumer_tool_catalog.json#L75) | _stub-only_ |
|
|
31
|
+
| `memory_status` | `ro` | stdio | [`consumer_tool_catalog.json:91`](../../scripts/mcp_server/consumer_tool_catalog.json#L91) | [`tools.py:617`](../../scripts/mcp_server/tools.py#L617) |
|
|
32
|
+
| `skill_trigger_eval` | `ro` | _(stub)_ | [`consumer_tool_catalog.json:98`](../../scripts/mcp_server/consumer_tool_catalog.json#L98) | _stub-only_ |
|
|
33
|
+
| `suggest_command` | `ro` | _(stub)_ | [`consumer_tool_catalog.json:114`](../../scripts/mcp_server/consumer_tool_catalog.json#L114) | _stub-only_ |
|
|
34
|
+
| `suggest_skill_for_task` | `ro` | _(stub)_ | [`consumer_tool_catalog.json:129`](../../scripts/mcp_server/consumer_tool_catalog.json#L129) | _stub-only_ |
|
|
35
|
+
| `mine_session` | `ro` | _(stub)_ | [`consumer_tool_catalog.json:144`](../../scripts/mcp_server/consumer_tool_catalog.json#L144) | _stub-only_ |
|
|
36
|
+
| `update_form_request_messages` | `fs-write` | _(stub)_ | [`consumer_tool_catalog.json:158`](../../scripts/mcp_server/consumer_tool_catalog.json#L158) | _stub-only_ |
|
|
37
|
+
| `sync_gitignore` | `fs-write` | _(stub)_ | [`consumer_tool_catalog.json:173`](../../scripts/mcp_server/consumer_tool_catalog.json#L173) | _stub-only_ |
|
|
38
|
+
| `sync_agent_settings` | `fs-write` | _(stub)_ | [`consumer_tool_catalog.json:186`](../../scripts/mcp_server/consumer_tool_catalog.json#L186) | _stub-only_ |
|
|
39
|
+
| `run_tests` | `shell` | _(stub)_ | [`consumer_tool_catalog.json:200`](../../scripts/mcp_server/consumer_tool_catalog.json#L200) | _stub-only_ |
|
|
40
|
+
| `run_quality_checks` | `shell` | _(stub)_ | [`consumer_tool_catalog.json:214`](../../scripts/mcp_server/consumer_tool_catalog.json#L214) | _stub-only_ |
|
|
41
|
+
| `list_skills` | `ro` | stdio | [`consumer_tool_catalog.json:227`](../../scripts/mcp_server/consumer_tool_catalog.json#L227) | [`tools.py:631`](../../scripts/mcp_server/tools.py#L631) |
|
|
42
|
+
| `list_commands` | `ro` | stdio | [`consumer_tool_catalog.json:234`](../../scripts/mcp_server/consumer_tool_catalog.json#L234) | [`tools.py:644`](../../scripts/mcp_server/tools.py#L644) |
|
|
43
|
+
| `list_rules` | `ro` | stdio | [`consumer_tool_catalog.json:241`](../../scripts/mcp_server/consumer_tool_catalog.json#L241) | [`tools.py:657`](../../scripts/mcp_server/tools.py#L657) |
|
|
44
|
+
| `compile_router` | `shell` | _(stub)_ | [`consumer_tool_catalog.json:248`](../../scripts/mcp_server/consumer_tool_catalog.json#L248) | _stub-only_ |
|
|
45
|
+
| `read_resource_body` | `ro` | stdio | [`consumer_tool_catalog.json:261`](../../scripts/mcp_server/consumer_tool_catalog.json#L261) | [`tools.py:670`](../../scripts/mcp_server/tools.py#L670) |
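The Summary counts can be re-derived from the table rows. The sketch below is a minimal illustration only: `ROWS` and `summarize` are hypothetical names, the rows are transcribed by hand from the table, and the real layout of `consumer_tool_catalog.json` and the logic of `audit_mcp_tools.py` are not shown in this document.

```python
from collections import Counter

# (tool name, side-effect, has a stdio handler) transcribed from the table.
ROWS = [
    ("lint_skills", "ro", True),
    ("chat_history_append", "fs-write", True),
    ("chat_history_read", "ro", True),
    ("memory_lookup", "ro", True),
    ("memory_signal", "fs-write", False),
    ("memory_status", "ro", True),
    ("skill_trigger_eval", "ro", False),
    ("suggest_command", "ro", False),
    ("suggest_skill_for_task", "ro", False),
    ("mine_session", "ro", False),
    ("update_form_request_messages", "fs-write", False),
    ("sync_gitignore", "fs-write", False),
    ("sync_agent_settings", "fs-write", False),
    ("run_tests", "shell", False),
    ("run_quality_checks", "shell", False),
    ("list_skills", "ro", True),
    ("list_commands", "ro", True),
    ("list_rules", "ro", True),
    ("compile_router", "shell", False),
    ("read_resource_body", "ro", True),
]


def summarize(rows):
    """Recompute the Summary block from (name, side_effect, has_handler) rows."""
    side_effects = Counter(effect for _, effect, _ in rows)
    implemented = sum(1 for _, _, has_handler in rows if has_handler)
    return {
        "total": len(rows),
        "stdio": implemented,
        "stubs": len(rows) - implemented,
        "by_side_effect": dict(sorted(side_effects.items())),
    }
```

A row counts as a stub here when it has no handler, which matches the `_stub-only_` marker in the Handler column.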
|
|
46
|
+
|
|
47
|
+
## Glossary
|
|
48
|
+
|
|
49
|
+
- **Side-effect** — `ro` (read-only) · `fs-write` (filesystem write) · `shell` (spawns processes).
|
|
50
|
+
- **Transports** — `stdio` (`scripts/mcp_server/`) · `worker` (`workers/mcp/`). A tool may live on both.
|
|
51
|
+
- **Stub** — catalog-listed for discovery; returns the `not_implemented` envelope from
|
|
52
|
+
[`mcp-tool-stub-envelope.md`](mcp-tool-stub-envelope.md) until promoted.
|
|
53
|
+
|
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: stable
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# Measurement baseline — contract
|
|
6
|
+
|
|
7
|
+
> **Status:** locked 2026-05-16 · **Owner:** `step-4-measurement-and-benchmark.md`
|
|
8
|
+
> · **Cited by:** every P2 enforcement roadmap (skill rationalization G0, north-star G1, compression default decision).
|
|
9
|
+
|
|
10
|
+
Single source of truth for what `task bench` measures, what counts as
|
|
11
|
+
drift, and what unblocks enforcement. Read this before pinning a number
|
|
12
|
+
to a roadmap or PR description.
|
|
13
|
+
|
|
14
|
+
## What `task bench` measures
|
|
15
|
+
|
|
16
|
+
Four axes, all numeric, all reproducible from the same input:
|
|
17
|
+
|
|
18
|
+
| Axis | Source | Definition | Units |
|
|
19
|
+
|---|---|---|---:|
|
|
20
|
+
| **selection accuracy** | [`scripts/bench_runner.py`](../../scripts/bench_runner.py) | Keyword-overlap ranker hits the expected skill in top-K | % |
|
|
21
|
+
| **cost** | [`scripts/cost/track.mjs`](../../scripts/cost/track.mjs) session jsonl | Token+USD per model, captured live | USD |
|
|
22
|
+
| **quality** | regex / rubric assertions per prompt | `quality_assertion` matches in agent output | % |
|
|
23
|
+
| **projection fidelity** | [`scripts/bench_per_tool.py`](../../scripts/bench_per_tool.py) | `accuracy(tool) / accuracy(augment)` for skill-projecting tools | ratio |
|
|
24
|
+
|
|
25
|
+
Schemas: [`benchmark-report-schema.md`](benchmark-report-schema.md) ·
|
|
26
|
+
[`benchmark-corpus-spec.md`](benchmark-corpus-spec.md). Reports land at
|
|
27
|
+
`bench/reports/<utc-stamp>-<corpus>[-projection].{json,md}` —
|
|
28
|
+
timestamped, never overwritten, content-addressed by run.
|
|
29
|
+
|
|
30
|
+
## Corpora — frozen for the soak window
|
|
31
|
+
|
|
32
|
+
| Corpus | Path | Prompts | Purpose |
|
|
33
|
+
|---|---|---:|---|
|
|
34
|
+
| `dev` | [`tests/eval/corpus-dev.yaml`](../../tests/eval/corpus-dev.yaml) | 10 | Developer task surface (Laravel/Symfony/React/CI/PR) |
|
|
35
|
+
| `non-dev` | [`tests/eval/corpus-non-dev.yaml`](../../tests/eval/corpus-non-dev.yaml) | 16 | Founder / agency / content creator surface (Wing-4) |
|
|
36
|
+
|
|
37
|
+
Total 26 prompts ≥ Acceptance Criteria floor of 25. Mid-window edits
|
|
38
|
+
to either YAML restart the 60-day clock per
|
|
39
|
+
[`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) § 2.
|
|
40
|
+
|
|
41
|
+
## What counts as drift
|
|
42
|
+
|
|
43
|
+
[`scripts/bench_drift_check.py`](../../scripts/bench_drift_check.py)
|
|
44
|
+
compares the latest report against a sliding window of the prior N runs
|
|
45
|
+
(default 5) for the same corpus.
|
|
46
|
+
|
|
47
|
+
| Axis | Threshold | Note |
|
|
48
|
+
|---|---|---|
|
|
49
|
+
| selection accuracy | latest − baseline_mean ≤ −5 pp | always evaluated |
|
|
50
|
+
| cost | latest / baseline_mean ≥ 1.20 (+20 %) | only when both sides have `source: captured` |
|
|
51
|
+
| quality | latest − baseline_mean ≤ −10 pp | skipped when latest is `not_collected` |
|
|
52
|
+
| projection fidelity | tool fidelity < 0.85 | exit 1 from `task bench:projection` |
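The thresholds in the table can be sketched as a single check. This is an illustration under stated assumptions, not the logic of `bench_drift_check.py`: the dictionary keys (`accuracy_pct`, `cost_usd`, `cost_source`, `quality_pct`) are hypothetical, and the cost row is read as "latest is at least 20 % above the baseline mean".

```python
def drift_axes(latest, baseline_mean):
    """Return the axes whose thresholds are breached, in table order."""
    breached = []

    # Selection accuracy: always evaluated, measured in percentage points.
    if latest["accuracy_pct"] - baseline_mean["accuracy_pct"] <= -5:
        breached.append("selection accuracy")

    # Cost: only compared when both sides were captured live.
    both_captured = (
        latest["cost_source"] == "captured"
        and baseline_mean["cost_source"] == "captured"
    )
    if both_captured and latest["cost_usd"] >= 1.20 * baseline_mean["cost_usd"]:
        breached.append("cost")

    # Quality: skipped when the latest run did not collect it.
    quality = latest["quality_pct"]
    if quality != "not_collected":
        if quality - baseline_mean["quality_pct"] <= -10:
            breached.append("quality")

    return breached


def projection_fidelity_ok(tool_accuracy, augment_accuracy, floor=0.85):
    """Fidelity is accuracy(tool) / accuracy(augment); below the floor fails."""
    return tool_accuracy / augment_accuracy >= floor
```

Multiplying the baseline by 1.20 instead of dividing avoids a division by zero when the baseline cost is 0.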
|
|
53
|
+
|
|
54
|
+
Drift exits with code 2 from `task bench:drift`. **CI posture during
|
|
55
|
+
soak:** all bench-drift steps run with `continue-on-error: true` and post a
|
|
56
|
+
sticky PR comment — informational only, not a merge gate. Flip to
|
|
57
|
+
required check happens via a separate PR once
|
|
58
|
+
`task bench:baseline-ready` returns 0 (see below).
|
|
59
|
+
|
|
60
|
+
## What unblocks enforcement (the G1 gate)
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
TASK bench:baseline-ready EXIT 0 IS THE ONLY AUTHORITY.
|
|
64
|
+
NO ANECDOTE, NO INDIVIDUAL REPORT, NO ROADMAP-SIDE OVERRIDE.
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
[`scripts/bench_baseline_ready.py`](../../scripts/bench_baseline_ready.py)
|
|
68
|
+
returns exit 0 iff both:
|
|
69
|
+
|
|
70
|
+
1. **Wall-clock soak:** `today − bench/baseline-start.txt ≥ --min-days` (default 60)
|
|
71
|
+
2. **Report density:** `bench/reports/*-<corpus>.json` count ≥ `--min-reports` (default 30)
|
|
72
|
+
|
|
73
|
+
Soak start anchored at [`bench/baseline-start.txt`](../../bench/baseline-start.txt)
|
|
74
|
+
= **2026-05-16**. Earliest possible flip: **2026-07-15**, contingent
|
|
75
|
+
on the 30-report floor.
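The two gate conditions and the earliest flip date can be checked with a few lines of date arithmetic. This is a minimal sketch, assuming the soak start and report count have already been read from disk; the function names are hypothetical and do not claim to mirror `bench_baseline_ready.py`.

```python
from datetime import date, timedelta


def baseline_ready(start, today, report_count, min_days=60, min_reports=30):
    """True only when both the wall-clock soak and the report floor are met."""
    soaked = (today - start).days >= min_days
    dense = report_count >= min_reports
    return soaked and dense


def earliest_flip(start, min_days=60):
    """First date on which the wall-clock condition can hold."""
    return start + timedelta(days=min_days)
```

With a start of 2026-05-16, `earliest_flip` gives 2026-07-15 (15 days left in May, 30 in June, 15 in July), which agrees with the date stated above.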
|
|
76
|
+
|
|
77
|
+
Downstream consumers:
|
|
78
|
+
|
|
79
|
+
- `step-99-north-star-restructure.md` § Acceptance G1 — reads this exit code.
|
|
80
|
+
- [`compression-default-kill-criterion.md` § 3](compression-default-kill-criterion.md) — reads the decision table after baseline closes.
|
|
81
|
+
- `step-2-skill-inventory-rationalization.md` § G0 — usage-data soak floor.
|
|
82
|
+
|
|
83
|
+
## What the closeout writes
|
|
84
|
+
|
|
85
|
+
On baseline closure, the step-4 closeout writes the numeric verdict to
|
|
86
|
+
[`docs/parity/bench.json`](../parity/bench.json) — frozen snapshot with
|
|
87
|
+
the 30+ reports averaged, drift verdict, and the compression-default
|
|
88
|
+
decision per the kill-criterion table. That file is the artefact every
|
|
89
|
+
P2 roadmap reads — not the live `bench/reports/` directory.
|
|
90
|
+
|
|
91
|
+
## Carve-outs
|
|
92
|
+
|
|
93
|
+
- **Pricing freshness:** [`bench/pricing.yaml`](../../bench/pricing.yaml) rows must carry `sourced_on: YYYY-MM-DD`. Stale prices = stale numbers = no trust (ruflo "measured-vs-claimed" pattern).
|
|
94
|
+
- **Subjective grading excluded:** quality scoring is mechanical via `quality_assertion`. No vibes.
|
|
95
|
+
- **Cursor / Cline / Windsurf:** rules-only surfaces, no SKILL.md projection. `bench:projection` reports them as `not_applicable` — the gap is acknowledged, not silently dropped.
|
|
96
|
+
|
|
97
|
+
## Cross-references
|
|
98
|
+
|
|
99
|
+
- [`benchmark-report-schema.md`](benchmark-report-schema.md) · per-report JSON schema
|
|
100
|
+
- [`benchmark-corpus-spec.md`](benchmark-corpus-spec.md) · corpus YAML schema
|
|
101
|
+
- [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) · decision table read by step-4 closeout
|
|
102
|
+
- `step-4-measurement-and-benchmark.md` · the owning roadmap
|
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: stable
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# Namespace contract — skills, rules, commands, personas
|
|
6
|
+
|
|
7
|
+
> Every artefact name is a **stable identifier**: routed to from
|
|
8
|
+
> `router.json`, cited from skills, surfaced in `/help`, embedded in
|
|
9
|
+
> command paths, and back-referenced in test fixtures. Drift breaks
|
|
10
|
+
> all five surfaces silently.
|
|
11
|
+
>
|
|
12
|
+
> **Source:** Step-11 Phase 5 Step 1
|
|
13
|
+
> (`step-11-ruflo-parity.md`).
|
|
14
|
+
> **Enforcer:** [`scripts/lint_namespace.py`](../../scripts/lint_namespace.py),
|
|
15
|
+
> wired into `task lint-skills`.
|
|
16
|
+
|
|
17
|
+
## 1. Shape
|
|
18
|
+
|
|
19
|
+
```
|
|
20
|
+
<stem>-<intent> kebab-case, ASCII, lowercase
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
| Component | Rule |
|
|
24
|
+
|---|---|
|
|
25
|
+
| Charset | `[a-z0-9-]+` only |
|
|
26
|
+
| Separator | single `-` between tokens; never `_`, `.`, or camelCase |
|
|
27
|
+
| Length | skills: 3 ≤ name ≤ 64 · rules / commands / personas: 2 ≤ name ≤ 64 (two-letter slot reserved for intentional acronyms — `pr`, `ci`, `qa`, `me`) |
|
|
28
|
+
| First char | `[a-z]` (digits and `-` forbidden at start) |
|
|
29
|
+
| Last char | `[a-z0-9]` (trailing `-` forbidden) |
|
|
30
|
+
| Run | no consecutive `--` |
|
|
31
|
+
|
|
32
|
+
The `<stem>` carries the **subject** (`commit`, `eloquent`,
|
|
33
|
+
`livewire`); the `<intent>` (optional) carries the **verb / lens**
|
|
34
|
+
(`-writing`, `-architect`, `-routing`). Single-token names are
|
|
35
|
+
permitted when the stem already encodes both (`commit`, `eloquent`,
|
|
36
|
+
`docker`).
|
|
37
|
+
|
|
38
|
+
## 2. Reserved names — forbidden as artefact names
|
|
39
|
+
|
|
40
|
+
| Name | Reason |
|
|
41
|
+
|---|---|
|
|
42
|
+
| `pattern` | Reserved for trigger-pattern fixtures (see `tests/fixtures/triggers/`). |
|
|
43
|
+
| `claude-memories` | Reserved for the `~/.claude/CLAUDE.md` shape — host-agent state, not a package artefact. |
|
|
44
|
+
| `default` | Ambiguous with profile / mode defaults; collides with `.agent-settings.yml` keys. |
|
|
45
|
+
| `index` | Reserved for auto-generated INDEX.md files. |
|
|
46
|
+
| `router` | Reserved for `router.json` and the router contract. |
|
|
47
|
+
|
|
48
|
+
Reserved names apply at the **top level** of each artefact type. A
|
|
49
|
+
sub-verb under a namespaced group (e.g. `council/default.md` →
|
|
50
|
+
`/council:default`) is **not** a top-level identifier — the group
|
|
51
|
+
prefix disambiguates it, and reserved-name enforcement is skipped
|
|
52
|
+
for sub-verbs by the linter. A future artefact `pattern-foo` at the
|
|
53
|
+
top level is fine; bare `pattern` is not.
|
|
54
|
+
|
|
55
|
+
`README.md` and `INDEX.md` are documentation, not artefacts, and are
|
|
56
|
+
skipped by the linter.
|
|
57
|
+
|
|
58
|
+
## 3. Per-type conventions
|
|
59
|
+
|
|
60
|
+
| Type | Source path | Naming nuance |
|
|
61
|
+
|---|---|---|
|
|
62
|
+
| Skill | `.agent-src.uncompressed/skills/<name>/SKILL.md` | Directory name == frontmatter `name`. |
|
|
63
|
+
| Rule | `.agent-src.uncompressed/rules/<name>.md` | Filename stem == frontmatter `id` (when present). |
|
|
64
|
+
| Command | `.agent-src.uncompressed/commands/<name>.md` or `<group>/<verb>.md` | Slash-command invocation `<name>` or `<group>:<verb>`. |
|
|
65
|
+
| Persona | `.agent-src.uncompressed/personas/<name>.md` | Cited from skill frontmatter `personas:` list. |
|
|
66
|
+
|
|
67
|
+
Sub-namespacing (`commit/in-chunks.md` → `/commit:in-chunks`) uses
|
|
68
|
+
the same charset rules per segment; the joining colon is implicit.
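The path-to-invocation mapping for sub-namespaced commands can be illustrated as follows. This is a sketch only: `slash_command` is a hypothetical helper, and the input is assumed to be a path relative to the commands root.

```python
from pathlib import PurePosixPath


def slash_command(relative_path):
    """Map 'group/verb.md' to '/group:verb' and 'name.md' to '/name'."""
    segments = PurePosixPath(relative_path).with_suffix("").parts
    return "/" + ":".join(segments)
```

Each segment would still be validated against the charset rules separately; the colon appears only in the invocation, never in a file name.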
|
|
69
|
+
|
|
70
|
+
## 4. Linter — `scripts/lint_namespace.py`
|
|
71
|
+
|
|
72
|
+
Walks the four source roots above, asserts each artefact name:
|
|
73
|
+
|
|
74
|
+
1. Matches the regex `^[a-z][a-z0-9]*(-[a-z0-9]+)*$`.
|
|
75
|
+
2. Length within the per-type bounds from § 1 (skills: 3 ≤ name ≤ 64; rules / commands / personas: 2 ≤ name ≤ 64).
|
|
76
|
+
3. Not in the reserved-names list.
|
|
77
|
+
4. Skill: directory name matches frontmatter `name`.
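The first three checks can be sketched in a few lines. The regex, reserved names, and length bounds are taken from this contract (lengths from the Shape table in § 1); `check_name` and its return labels are hypothetical and are not the linter's actual interface. The frontmatter comparison in check 4 is omitted because it needs file access.

```python
import re

NAME_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")
RESERVED = {"pattern", "claude-memories", "default", "index", "router"}
MIN_LEN = {"skill": 3, "rule": 2, "command": 2, "persona": 2}
MAX_LEN = 64


def check_name(kind, name, is_sub_verb=False):
    """Return the labels of violated rules; an empty list means valid."""
    issues = []
    # fullmatch avoids '$' accepting a trailing newline.
    if not NAME_RE.fullmatch(name):
        issues.append("shape")
    if not MIN_LEN[kind] <= len(name) <= MAX_LEN:
        issues.append("length")
    # Reserved names apply at the top level only, not to sub-verbs.
    if name in RESERVED and not is_sub_verb:
        issues.append("reserved")
    return issues
```

The single regex covers the charset, first-character, last-character, and no-double-hyphen rows of the Shape table, so those need no separate checks.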
|
|
78
|
+
|
|
79
|
+
Exit codes:
|
|
80
|
+
|
|
81
|
+
| Exit | Meaning |
|
|
82
|
+
|---|---|
|
|
83
|
+
| `0` | All names valid. |
|
|
84
|
+
| `1` | At least one name fails a rule. |
|
|
85
|
+
| `2` | Linter crashed (filesystem error, malformed frontmatter). |
|
|
86
|
+
|
|
87
|
+
Diagnostic format: one issue per line — `<path>: <rule> — <detail>`.
|
|
88
|
+
|
|
89
|
+
## 5. Adding a new artefact
|
|
90
|
+
|
|
91
|
+
Pick the name; verify locally:
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
python3 scripts/lint_namespace.py --name <candidate>
|
|
95
|
+
# or full run:
|
|
96
|
+
python3 scripts/lint_namespace.py
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
If the candidate fails, the linter prints the rule it violated.
|
|
100
|
+
**Renames after release are expensive** — they touch `router.json`, every
|
|
101
|
+
skill citing the old name, the bench corpus, and consumer settings.
|
|
102
|
+
Pay the naming cost once, upfront.
|
|
103
|
+
|
|
104
|
+
## 6. Relationship to the frontmatter contract
|
|
105
|
+
|
|
106
|
+
The **shape** lives here. The **frontmatter keys** that carry the
|
|
107
|
+
name (`name:` in skills, `id:` in rules) live in
|
|
108
|
+
[`frontmatter-contract.md`](../../agents/docs/frontmatter-contract.md).
|
|
109
|
+
Both contracts share the regex; this file is the source of truth for
|
|
110
|
+
the regex string.
|
|
111
|
+
|
|
112
|
+
## 7. Why this exists
|
|
113
|
+
|
|
114
|
+
`router.json` resolves `<kind>:<id>` strings at session start. Any
|
|
115
|
+
artefact rename breaks every routing entry pointing at the old name
|
|
116
|
+
without compile-time error. The linter catches the rename at the PR
|
|
117
|
+
boundary, not at runtime in a consumer.
|
|
118
|
+
|
|
119
|
+
## 8. Out of scope
|
|
120
|
+
|
|
121
|
+
- File-system case sensitivity (we rely on lowercase-only names).
|
|
122
|
+
- Cross-tool aliases (Augment / Claude / Cursor all consume the same
|
|
123
|
+
name — projection is by content, not by alias).
|
|
124
|
+
- Versioning suffixes (`-v2`, `-legacy`). Use `status: superseded`
|
|
125
|
+
in frontmatter instead; never rename in place.
|
|
@@ -0,0 +1,142 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: beta
|
|
3
|
+
keep-beta-until: 2026-08-14
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Profile System — Contract
|
|
7
|
+
|
|
8
|
+
> **Status:** beta · **Owner:** package maintainer · **Last reviewed:** 2026-05-16
|
|
9
|
+
>
|
|
10
|
+
> Schema and semantics for the **Profile** axis introduced in step-15
|
|
11
|
+
> Phase 1 item 1. Profile answers *who is the user?* — audience
|
|
12
|
+
> taxonomy that selects the default skill/command surface, README
|
|
13
|
+
> entry-paragraph, and persona pre-selection. Boundary against
|
|
14
|
+
> `preset.id`, `pack.id`, and `cost_profile`:
|
|
15
|
+
> [`ADR-010`](../decisions/ADR-010-profile-pack-preset-boundary.md).
|
|
16
|
+
|
|
17
|
+
## Decision
|
|
18
|
+
|
|
19
|
+
A **profile** declares the user's audience identity. Six seed profiles
|
|
20
|
+
ship; users can declare their own under
|
|
21
|
+
`.agent-src.uncompressed/profiles/<id>.yml`.
|
|
22
|
+
|
|
23
|
+
| `profile.id` | Audience | README entry-paragraph | Default `preset.id` |
|
|
24
|
+
|---|---|---|---|
|
|
25
|
+
| `founder` | Solo / early-stage founder; wears every hat | "Ship the company, not the codebase" | `fast` |
|
|
26
|
+
| `developer` | IC engineer; primary day-to-day user today | "Pair with a senior reviewer that never sleeps" | `balanced` |
|
|
27
|
+
| `content_creator` | Writers, ghostwriters, marketers | "Your voice, my hands" | `balanced` |
|
|
28
|
+
| `agency` | Multi-client delivery shop | "Same playbook across every client repo" | `strict` |
|
|
29
|
+
| `finance` | CFO / fractional finance / FP&A | "Forecasts and memos with the receipts attached" | `strict` |
|
|
30
|
+
| `ops` | RevOps, support, SRE-adjacent | "Procedures that get followed, not skipped" | `strict` |
|
|
31
|
+
|
|
32
|
+
The seed set is **fixed for v2.x**. Adding a seventh profile requires
|
|
33
|
+
an ADR — the contract surface that ships in the wizard
|
|
34
|
+
(`/onboard` role-selection) treats this set as exhaustive.
|
|
35
|
+
|
|
36
|
+
## Profile shape
|
|
37
|
+
|
|
38
|
+
```yaml
|
|
39
|
+
profile:
|
|
40
|
+
id: developer
|
|
41
|
+
audience:
|
|
42
|
+
label: "IC engineer"
|
|
43
|
+
readme_anchor: "developer" # selects README first-screen block
|
|
44
|
+
defaults:
|
|
45
|
+
preset_id: balanced # may be overridden by .agent-settings.yml
|
|
46
|
+
personas: [reviewer, security] # pre-selected persona ids
|
|
47
|
+
skills_hint: [developer-like-execution, verify-before-complete, minimal-safe-diff]
|
|
48
|
+
surface:
|
|
49
|
+
commands_hint: [work, implement-ticket, review-changes, fix]
|
|
50
|
+
docs_first_pointer: "docs/getting-started-by-role.md#developer"
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Per [ADR-010](../decisions/ADR-010-profile-pack-preset-boundary.md), a
|
|
54
|
+
profile **MAY** set `defaults.preset_id` but **MAY NOT** set any
|
|
55
|
+
preset-owned knob directly. The lint task (`task lint-config-schema`)
|
|
56
|
+
enforces this.
|
|
57
|
+
|
|
58
|
+
## Loader contract
|
|
59
|
+
|
|
60
|
+
The Phase 1 loader lives at `scripts/config/profiles.py`. Resolution
|
|
61
|
+
chain (last writer wins):
|
|
62
|
+
|
|
63
|
+
1. `pack.profile_id` (if pack active) → `profile.id`.
|
|
64
|
+
2. `.agent-settings.yml` top-level `profile:` block → `profile.id`
|
|
65
|
+
and any user overrides for `audience` / `defaults` / `surface`.
|
|
66
|
+
3. Environment variable `AGENT_CONFIG_PROFILE_ID` → `profile.id`.
|
|
67
|
+
4. Runtime CLI flag `--profile=<id>` → `profile.id`, single session.
|
|
68
|
+
|
|
69
|
+
If no profile resolves, the loader **does not pick a default
|
|
70
|
+
silently** — it falls back to `developer` only when
|
|
71
|
+
`.agent-settings.yml` is missing entirely (fresh install before
|
|
72
|
+
`/onboard`). With a settings file present but no `profile:` block,
|
|
73
|
+
the loader raises a structured warning pointing to `/onboard`.
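The resolution chain and the fallback rule can be sketched as below. This is an illustration, not the code in `scripts/config/profiles.py`: the function name, its parameters, and the warning text are hypothetical, and the source labels are assumed to map onto the four layers in the order listed.

```python
def resolve_profile(pack_id=None, settings_id=None, env_id=None, cli_id=None,
                    settings_file_exists=True):
    """Return (profile_id, source, warning) with last-writer-wins layering."""
    layers = [
        ("pack", pack_id),
        ("user-settings", settings_id),
        ("env", env_id),
        ("runtime", cli_id),
    ]
    resolved = None
    for source, value in layers:
        if value:
            resolved = (value, source)  # later layers overwrite earlier ones
    if resolved is not None:
        return resolved[0], resolved[1], None
    if not settings_file_exists:
        # Fresh install before /onboard: the only case with a default.
        return "developer", "default", None
    # Settings file present but no profile: warn instead of guessing.
    return None, None, "no profile selected; run /onboard"
```

Returning the warning as data rather than raising keeps the sketch side-effect free; how the real loader surfaces its structured warning is not specified here.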
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
RATIONALE: a silent default would hide the "I never picked an audience"
|
|
77
|
+
state from the wizard, breaking the council v3 observation that audience
|
|
78
|
+
choice must be a deliberate act of the user, not an agent inference.
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
## Resolution outcome
|
|
82
|
+
|
|
83
|
+
After the loader runs, the session has:
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
{
|
|
87
|
+
"id": "developer",
|
|
88
|
+
"audience": {"label": "IC engineer", "readme_anchor": "developer"},
|
|
89
|
+
"preset_id": "balanced",
|
|
90
|
+
"personas": ["reviewer", "security"],
|
|
91
|
+
"skills_hint": ["developer-like-execution", ...],
|
|
92
|
+
"commands_hint": ["work", "implement-ticket", ...],
|
|
93
|
+
"source": "user-settings | env | runtime | pack | default",
|
|
94
|
+
}
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
The `source` field is mandatory and feeds the
|
|
98
|
+
`/agent-config explain`
|
|
99
|
+
command (Phase 1 item 3).
|
|
100
|
+
|
|
101
|
+
## User-defined profiles
|
|
102
|
+
|
|
103
|
+
A consumer project MAY ship a custom profile under
|
|
104
|
+
`.agent-src.uncompressed/profiles/<id>.yml`. Constraints:
|
|
105
|
+
|
|
106
|
+
- `id` MUST be unique across seed + user-defined profiles.
|
|
107
|
+
- Shape MUST match the seed contract above (audience / defaults / surface).
|
|
108
|
+
- `defaults.preset_id` MUST reference an existing preset
|
|
109
|
+
([`config-presets.md`](config-presets.md)).
|
|
110
|
+
- The lint task hard-fails on schema violations.
|
|
111
|
+
|
|
112
|
+
User-defined profiles do **not** require an ADR — they are project-local.
|
|
113
|
+
Only changes to the **seed set** require an ADR.
|
|
114
|
+
|
|
115
|
+
## Drift detection
|
|
116
|
+
|
|
117
|
+
`task lint-config-schema` (added in Phase 1) hard-fails when:
|
|
118
|
+
|
|
119
|
+
- A profile YAML names a preset-owned knob (cost cap, autonomy,
|
|
120
|
+
confidence, risk).
|
|
121
|
+
- A profile YAML references a non-existent `preset_id`.
|
|
122
|
+
- The seed-profile count diverges from this contract's table.
|
|
123
|
+
- `defaults.personas` references a persona id that does not exist
|
|
124
|
+
under `.agent-src.uncompressed/personas/`.
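Three of the per-profile checks above can be sketched as follows. The preset ids come from the seed table; the knob key names in `PRESET_OWNED_KNOBS` are hypothetical stand-ins for "cost cap, autonomy, confidence, risk", and `lint_profile` is not the interface of `task lint-config-schema`. The seed-count check is left out because it needs the full profile directory.

```python
PRESET_IDS = {"fast", "balanced", "strict"}
# Hypothetical key names; the real preset-owned keys are defined elsewhere.
PRESET_OWNED_KNOBS = {"cost_cap", "autonomy", "confidence", "risk"}


def lint_profile(profile, known_personas):
    """Return a list of issue strings for one parsed profile mapping."""
    issues = []
    defaults = profile.get("defaults", {})

    # A profile may point at a preset but may not set its knobs directly.
    leaked = PRESET_OWNED_KNOBS & (set(profile) | set(defaults))
    for knob in sorted(leaked):
        issues.append(f"preset-owned knob: {knob}")

    preset_id = defaults.get("preset_id")
    if preset_id is not None and preset_id not in PRESET_IDS:
        issues.append("unknown preset_id")

    for persona in defaults.get("personas", []):
        if persona not in known_personas:
            issues.append(f"unknown persona: {persona}")

    return issues
```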
|
|
125
|
+
|
|
126
|
+
## Non-goals
|
|
127
|
+
|
|
128
|
+
- This contract does **not** define preset knobs. See
|
|
129
|
+
[`config-presets.md`](config-presets.md).
|
|
130
|
+
- It does **not** define packs. See `workflow-packs.md` (Phase 2 item 7).
|
|
131
|
+
- It does **not** override `cost_profile`. The rule-tier loader keeps
|
|
132
|
+
its independent axis per
|
|
133
|
+
[`cost-profile-defaults.md`](cost-profile-defaults.md).
|
|
134
|
+
- It does **not** ship a UI. Profile selection happens in `/onboard`
|
|
135
|
+
(step-15 Phase 1 item 2).
|
|
136
|
+
|
|
137
|
+
## See also
|
|
138
|
+
|
|
139
|
+
- [`ADR-010`](../decisions/ADR-010-profile-pack-preset-boundary.md) — axis boundary.
|
|
140
|
+
- [`config-presets.md`](config-presets.md) — preset knobs.
|
|
141
|
+
- [`cost-profile-defaults.md`](cost-profile-defaults.md) — rule-tier axis (orthogonal).
|
|
142
|
+
- `step-15-product-refinement` — Phase 1 item 1.
|
|
@@ -0,0 +1,129 @@
|
|
|
1
|
+
---
|
|
2
|
+
stability: beta
|
|
3
|
+
keep-beta-until: 2026-08-12
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Universal safety model
|
|
7
|
+
|
|
8
|
+
> **Status:** beta — first draft 2026-05-16 (Phase 2 Item 9 of
|
|
9
|
+
> `step-15-product-refinement`).
|
|
10
|
+
>
|
|
11
|
+
> **Baseline:** [`docs/architecture/current-safety-behavior.md`](../architecture/current-safety-behavior.md)
|
|
12
|
+
> documents the pre-step-15 surface this contract replaces.
|
|
13
|
+
|
|
14
|
+
A **per-profile, per-domain safety policy** declared as a single
|
|
15
|
+
machine-readable table. Replaces the legacy "one autonomy switch for
|
|
16
|
+
everything" model documented in the baseline. Does **not** weaken the
|
|
17
|
+
four non-overridable floors — those keep their universal scope and
|
|
18
|
+
are referenced by id, not redeclared here.
|
|
19
|
+
|
|
20
|
+
## The Iron Floor
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
NO POLICY ENTRY MAY WIDEN AN EXISTING FLOOR.
|
|
24
|
+
ANY ENTRY THAT WOULD ALLOW A FLOOR-BLOCKED ACTION IS REJECTED AT LINT.
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
The four floors are listed in
|
|
28
|
+
[`current-safety-behavior § The four non-overridable floors`](../architecture/current-safety-behavior.md#the-four-non-overridable-floors):
|
|
29
|
+
`non-destructive-by-default`, `scope-control § git-ops`,
|
|
30
|
+
`commit-policy`, `security-sensitive-stop`. Floor membership is
|
|
31
|
+
maintained in [`kernel-membership`](kernel-membership.md); a domain
|
|
32
|
+
listed there cannot be set to `allow` here.
|
|
33
|
+
|
|
34
|
+
## Schema
|
|
35
|
+
|
|
36
|
+
```yaml
|
|
37
|
+
# .agent-src.uncompressed/profiles/<id>.yml — new top-level key
|
|
38
|
+
profile:
|
|
39
|
+
id: <profile.id>
|
|
40
|
+
# ... existing fields ...
|
|
41
|
+
safety:
|
|
42
|
+
domains:
|
|
43
|
+
<domain-id>:
|
|
44
|
+
policy: <deny | ask | allow>
|
|
45
|
+
rationale: "<= 280 chars — why this policy for this profile>"
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
### Domain registry

Domains are declared in this contract, **not** invented per profile.
A profile may only reference an id from the table below.

| Domain id | What it gates | Floor reference |
|---|---|---|
| `prod_data` | Reads / writes against production data stores. | `non-destructive-by-default` |
| `prod_infra` | Terraform / k8s / cloud config touching prod. | `non-destructive-by-default` |
| `secrets` | Secret values in env, config, or output. | `security-sensitive-stop` |
| `auth_changes` | Auth, session, tenant-boundary, IAM edits. | `security-sensitive-stop` |
| `billing` | Pricing, invoicing, refund, payout logic. | `security-sensitive-stop` |
| `bulk_delete` | `rm -rf`, `DROP`, `TRUNCATE`, ≥ 5-file deletion. | `non-destructive-by-default` |
| `git_push` | `git push` to any remote. | `scope-control § git-ops` |
| `git_branch` | Branch create / switch / delete. | `scope-control § git-ops` |
| `commit` | Any git commit. | `commit-policy` |
| `mcp_call_costly` | MCP / web / model call ≥ preset's `per_call_max_usd`. | none (advisory) |
| `pii_redact` | PII redaction in support / finance / recruiting / marketing outputs. | `domain-safety-pii-*` |
| `pii_log` | Logging of raw PII. | `domain-safety-logging-pii-floor` |
| `legal_advice` | Output shaped as legal advice. | `domain-safety-disclaimer-legal` |
| `medical_advice` | Output shaped as medical advice. | `domain-safety-disclaimer-medical` |
| `financial_advice` | Investment / tax / valuation positions. | `domain-safety-disclaimer-financial` |
| `pr_create` | Pull-request open / close / retarget. | `scope-control § git-ops` |
| `deploy` | Deploy / release / tag / pipeline trigger. | `non-destructive-by-default` |

### Policy semantics

| Policy | Behaviour | Floor interaction |
|---|---|---|
| `deny` | The agent refuses. Numbered-option block surfaces the refusal and the rationale field; no override path. | `deny` is the default for every floor domain; it cannot be relaxed. |
| `ask` | The agent stops and asks a single numbered question per [`user-interaction`](../../.agent-src/rules/user-interaction.md). One question per turn. | `ask` is the default for every floor domain in a profile that has not opted out; the floor remains operative even when `policy=allow` is set elsewhere. |
| `allow` | The agent proceeds without asking. Trivial-question suppression applies. | `allow` is **forbidden** on any domain whose `Floor reference` column is non-empty. Linter rejects it. |

The legacy single switch (`personal.autonomy`) is preserved as a
**fallback** for any domain a profile does not declare, keeping
existing installs functional while profiles migrate.

## Resolution

Order (last writer wins, subject to the Iron Floor):

1. Domain default = `ask` for floor domains, `allow` otherwise.
2. Profile `safety.domains.<id>.policy`.
3. Active pack's profile (if `--pack <id>` is active).
4. `.agent-settings.yml` user override under `profile.safety.domains`.

The [`explain config`](../../.agent-src/scripts/agent-config) command
(Phase 1 Item 3 deliverable) surfaces the resolved policy per domain,
one row per domain, together with the source that wrote it.
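
As a rough illustration of the order above, the sketch below applies the layers in sequence and records which one wrote the final value. It is not the shipped loader (`scripts/config/safety.py` is deferred). The `resolve` function, the `layers` mapping, the layer names, and the choice to raise on a floor-widening `allow` are all assumptions made for this example, and the `personal.autonomy` fallback is left out.

```python
from typing import Dict, Tuple

# Assumption: domains whose "Floor reference" names one of the four floors.
FLOOR_DOMAINS = frozenset({
    "prod_data", "prod_infra", "secrets", "auth_changes", "billing",
    "bulk_delete", "git_push", "git_branch", "commit", "pr_create", "deploy",
})

# Steps 2-4 of the resolution order; later layers overwrite earlier ones.
# The layer names are hypothetical labels, not identifiers from the package.
LAYER_ORDER = ("profile", "pack", "user")


def resolve(
    domain_id: str,
    layers: Dict[str, Dict[str, str]],
    floor_domains: frozenset = FLOOR_DOMAINS,
) -> Tuple[str, str]:
    """Return (policy, source) for one domain, last writer wins."""
    # Step 1: domain default.
    policy = "ask" if domain_id in floor_domains else "allow"
    source = "default"
    for layer in LAYER_ORDER:
        declared = layers.get(layer, {}).get(domain_id)
        if declared is None:
            continue  # this layer does not mention the domain
        if declared == "allow" and domain_id in floor_domains:
            # Iron Floor: no entry may widen a floor (assumed to be an error here).
            raise ValueError(f"{layer}: 'allow' is not permitted on {domain_id}")
        policy, source = declared, layer
    return policy, source


layers = {
    "profile": {"git_push": "deny", "mcp_call_costly": "ask"},
    "user": {"git_push": "ask"},
}
print(resolve("git_push", layers))         # ('ask', 'user')
print(resolve("mcp_call_costly", layers))  # ('ask', 'profile')
print(resolve("commit", layers))           # ('ask', 'default')
```

Returning the source alongside the policy is what would let an explain-style command show one row per domain with the layer that set it.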

## Validation

`scripts/lint_safety_model.py` (Phase 2 deliverable, not yet
shipped) will fail CI on:

- Unknown domain id.
- `allow` on a floor-referenced domain.
- Missing `rationale` (required; ≤ 280 chars, plain prose).
- Profile declaring `safety` without at least one entry.

Until the linter lands, profiles are reviewed by hand at PR time.
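
The four checks listed above could look roughly like the sketch below. It assumes `safety` sits at the top level of the parsed profile file and that only references to the four named floors count as floor-referenced; whether `domain-safety-*` references should count as well is not settled here. `lint_safety`, `REGISTRY`, and the error strings are hypothetical; the real `scripts/lint_safety_model.py` has not shipped.

```python
from typing import Dict, List, Optional

# Hypothetical registry copied from the domain table:
# domain id -> floor reference (None = advisory only).
REGISTRY: Dict[str, Optional[str]] = {
    "prod_data": "non-destructive-by-default",
    "prod_infra": "non-destructive-by-default",
    "secrets": "security-sensitive-stop",
    "auth_changes": "security-sensitive-stop",
    "billing": "security-sensitive-stop",
    "bulk_delete": "non-destructive-by-default",
    "git_push": "scope-control § git-ops",
    "git_branch": "scope-control § git-ops",
    "commit": "commit-policy",
    "mcp_call_costly": None,
    "pii_redact": "domain-safety-pii-*",
    "pii_log": "domain-safety-logging-pii-floor",
    "legal_advice": "domain-safety-disclaimer-legal",
    "medical_advice": "domain-safety-disclaimer-medical",
    "financial_advice": "domain-safety-disclaimer-financial",
    "pr_create": "scope-control § git-ops",
    "deploy": "non-destructive-by-default",
}

# Assumption: only these four references make a domain "floor-referenced".
FOUR_FLOORS = frozenset({
    "non-destructive-by-default",
    "scope-control § git-ops",
    "commit-policy",
    "security-sensitive-stop",
})

MAX_RATIONALE_CHARS = 280


def lint_safety(profile_doc: dict) -> List[str]:
    """Return a list of error strings; an empty list means the profile passes."""
    if "safety" not in profile_doc:
        return []  # the block is optional; nothing to check
    domains = (profile_doc.get("safety") or {}).get("domains") or {}
    if not domains:
        return ["safety: declared without at least one domain entry"]
    errors: List[str] = []
    for domain_id, entry in domains.items():
        entry = entry or {}
        if domain_id not in REGISTRY:
            errors.append(f"{domain_id}: unknown domain id")
        if entry.get("policy") == "allow" and REGISTRY.get(domain_id) in FOUR_FLOORS:
            errors.append(f"{domain_id}: allow on a floor-referenced domain")
        rationale = (entry.get("rationale") or "").strip()
        if not rationale:
            errors.append(f"{domain_id}: missing rationale")
        elif len(rationale) > MAX_RATIONALE_CHARS:
            # Treating over-length text as a failure is an assumption.
            errors.append(f"{domain_id}: rationale exceeds {MAX_RATIONALE_CHARS} chars")
    return errors
```

Collecting every error instead of stopping at the first one keeps a single CI run useful for a profile with several problems.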

## What this contract does **not** do

- **Does not** introduce new safety rules. Every domain row maps to
  an existing rule or to advisory cost guidance.
- **Does not** ship the loader. `scripts/config/safety.py` is a
  Phase 2 deliverable deferred to its own step.
- **Does not** override domain-safety output floors. PII redaction
  and disclaimer rules apply regardless of `safety.domains.*`:
  `policy=allow` on `pii_redact` means "do not ask before redacting",
  not "skip redaction".
- **Does not** authorize per-tool MCP overrides. Cost caps live in
  [`config-presets`](config-presets.md).
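
The `pii_redact` point in the list above can be made concrete with a small sketch: the policy only decides whether the agent asks first, while redaction runs on every path that produces output. The `finalize_output` function, the `ask_user` callback, the treatment of `deny` as a refusal to emit, and the email-only regex are hypothetical stand-ins, not part of the package.

```python
import re
from typing import Callable

# Hypothetical stand-in for real PII detection: matches simple email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Replace every matched email address with a placeholder."""
    return EMAIL.sub("[redacted]", text)


def finalize_output(text: str, policy: str, ask_user: Callable[[str], bool]) -> str:
    """Apply the pii_redact policy; the policy never disables redaction."""
    if policy == "deny":
        raise PermissionError("pii_redact is denied for this profile")
    if policy == "ask" and not ask_user("Output contains PII. Redact and continue?"):
        raise PermissionError("user declined")
    # Reached for `allow` and for a confirmed `ask`: redaction still runs.
    return redact(text)


print(finalize_output("mail a@b.co now", "allow", lambda question: False))
# mail [redacted] now
```

With `allow`, the callback is never consulted, which matches "do not ask before redacting"; there is no branch that returns the unredacted text.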

## See also

- [`current-safety-behavior`](../architecture/current-safety-behavior.md): pre-step-15 baseline (what this replaces)
- [`config-presets`](config-presets.md): cost caps and enforcement
- [`profile-system`](profile-system.md): profile axis
- [`workflow-packs`](workflow-packs.md): pack-level overrides
- `step-15-product-refinement` § Phase 2 Item 9