@event4u/agent-config 3.1.1 → 3.3.0
- package/.agent-src/commands/agent-status.md +1 -1
- package/.agent-src/commands/analytics/prune.md +78 -0
- package/.agent-src/commands/analytics/show.md +107 -0
- package/.agent-src/commands/analytics.md +64 -0
- package/.agent-src/commands/knowledge/forget.md +104 -0
- package/.agent-src/commands/knowledge/ingest.md +122 -0
- package/.agent-src/commands/knowledge/list.md +102 -0
- package/.agent-src/commands/knowledge.md +75 -0
- package/.agent-src/scripts/update_roadmap_progress.py +1 -1
- package/.agent-src/skills/compress-memory/SKILL.md +1 -1
- package/.agent-src/templates/agents/agent-project-settings.example.yml +1 -1
- package/.claude-plugin/marketplace.json +8 -1
- package/AGENTS.md +5 -4
- package/CHANGELOG.md +54 -222
- package/README.md +12 -2
- package/dist/discovery/deprecation-report.md +1 -1
- package/dist/discovery/discovery-manifest.json +164 -10
- package/dist/discovery/discovery-manifest.json.sha256 +1 -1
- package/dist/discovery/discovery-manifest.summary.md +3 -3
- package/dist/discovery/orphan-report.md +1 -1
- package/dist/discovery/packs.json +12 -5
- package/dist/discovery/trust-report.md +2 -2
- package/dist/discovery/workspaces.json +11 -4
- package/dist/mcp/mcp-cloudflare-catalogue.json +2 -0
- package/dist/mcp/registry-manifest.json +5 -3
- package/docs/architecture.md +1 -1
- package/docs/archive/CHANGELOG-pre-3.2.0.md +268 -0
- package/docs/benchmarks.md +4 -4
- package/docs/catalog.md +9 -2
- package/docs/contracts/CHANGELOG-conventions.md +20 -1
- package/docs/contracts/adr-mcp-runtime.md +1 -1
- package/docs/contracts/at-rest-encryption.md +146 -0
- package/docs/contracts/benchmark-corpus-spec.md +3 -3
- package/docs/contracts/benchmark-report-schema.md +5 -5
- package/docs/contracts/caveman-telemetry.md +4 -4
- package/docs/contracts/compression-default-kill-criterion.md +5 -5
- package/docs/contracts/cost-enforcement.md +1 -1
- package/docs/contracts/daily-workspace.md +137 -0
- package/docs/contracts/explain-modes.md +146 -0
- package/docs/contracts/host-agent-protocol.md +88 -0
- package/docs/contracts/local-analytics.md +148 -0
- package/docs/contracts/local-knowledge-ingestion.md +96 -0
- package/docs/contracts/mcp-beta-criteria.md +1 -1
- package/docs/contracts/mcp-cloud-scope.md +4 -4
- package/docs/contracts/mcp-registry-manifest.schema.json +1 -1
- package/docs/contracts/mcp-tool-inventory.md +1 -1
- package/docs/contracts/mcp-tool-stub-envelope.md +1 -1
- package/docs/contracts/measurement-baseline.md +6 -6
- package/docs/contracts/role-experience.md +121 -0
- package/docs/contracts/workspace-documents.md +140 -0
- package/docs/decisions/ADR-022-daily-workspace-decomposition.md +140 -0
- package/docs/decisions/ADR-023-host-agent-protocol.md +129 -0
- package/docs/decisions/ADR-024-workspace-v0-feature-floor.md +126 -0
- package/docs/decisions/ADR-025-workspace-chrome.md +119 -0
- package/docs/decisions/ADR-026-explain-mode-translation.md +117 -0
- package/docs/decisions/ADR-027-changelog-machine-vs-manual.md +129 -0
- package/docs/decisions/ADR-028-root-layout.md +147 -0
- package/docs/decisions/ADR-029-multi-workspace-deferred.md +122 -0
- package/docs/decisions/INDEX.md +8 -0
- package/docs/deploy/small-team-recipe.md +148 -0
- package/docs/deploy/team-deployment-posture.md +91 -0
- package/docs/getting-started-by-role.md +27 -0
- package/docs/getting-started.md +1 -1
- package/docs/guides/local-analytics.md +125 -0
- package/docs/guides/local-knowledge.md +127 -0
- package/docs/mcp-server.md +1 -1
- package/docs/parity/bench-ruflo.json +3 -3
- package/docs/parity/ruflo.md +1 -1
- package/docs/setup/mcp-client-config.md +1 -1
- package/docs/setup/mcp-cloud-endpoints.md +1 -1
- package/docs/setup/mcp-cloud-setup.md +2 -2
- package/docs/setup/mcp-r2-bootstrap.md +1 -1
- package/package.json +4 -2
- package/scripts/__pycache__/validate_frontmatter.cpython-312.pyc +0 -0
- package/scripts/_lib/__pycache__/__init__.cpython-312.pyc +0 -0
- package/scripts/_lib/__pycache__/agent_src.cpython-312.pyc +0 -0
- package/scripts/_lib/bench_caveman.py +2 -2
- package/scripts/_lib/bench_caveman_report.py +1 -1
- package/scripts/_lib/bench_cost.py +2 -2
- package/scripts/_lib/bench_report.py +2 -2
- package/scripts/_lib/changelog_eras.py +330 -0
- package/scripts/audit_mcp_tools.py +1 -1
- package/scripts/bench_baseline_ready.py +3 -3
- package/scripts/bench_compress_memory.py +4 -4
- package/scripts/bench_drift_check.py +2 -2
- package/scripts/bench_per_tool.py +2 -2
- package/scripts/bench_run.py +4 -4
- package/scripts/build_mcp_registry_manifest.py +2 -2
- package/scripts/mcp_server/__init__.py +1 -1
- package/scripts/mcp_server/catalog.py +1 -1
- package/scripts/mcp_server/consumer_tool_catalog.json +1 -1
- package/scripts/mcp_server/tools.py +1 -1
- package/scripts/memory_lookup.py +78 -1
- package/scripts/pack_mcp_content.py +6 -6
- package/scripts/release.py +93 -3
- package/scripts/skill_trigger_eval.py +2 -2
@@ -13,7 +13,7 @@ keep-beta-until: 2026-08-15
 
 | Key | Value | Provenance |
 |---|---|---|
-| `caveman_multiplier_version` | `v1` | Tied to `bench/reports/caveman-v1.{json,md}` |
+| `caveman_multiplier_version` | `v1` | Tied to `internal/bench/reports/caveman-v1.{json,md}` |
 | `caveman_multiplier_value` | `0.9155` | `median(terse_control_tokens / compressed_tokens)` over the 10-prompt v1 corpus |
 | `caveman_multiplier_p10` | `0.4506` | 10th percentile (worst-case carve-out-tax prompts) |
 | `caveman_multiplier_p90` | `2.3664` | 90th percentile (pure-prose prompts where caveman wins) |
@@ -40,7 +40,7 @@ where `M = caveman_multiplier_value`.
 
 ## Why suspended after v1
 
-The `caveman-v1` bench (`bench/reports/caveman-v1.md`, 30 calls,
+The `caveman-v1` bench (`internal/bench/reports/caveman-v1.md`, 30 calls,
 2026-05-16) found:
 
 - Median savings vs raw uncompressed: **+23.51 %** (inflated by the
@@ -78,6 +78,6 @@ Until a v2 bench (broader corpus or a re-tuned dialect) lifts the
 ## See also
 
 - [`compression-default-kill-criterion.md`](compression-default-kill-criterion.md) — the rule-default-flip gate; this multiplier is gated on the same `vs_terse` arm.
-- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
-- [`bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
+- [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) — provenance for the `v1` value.
+- [`internal/bench/reports/caveman-v2.md`](../../bench/reports/caveman-v2.md) — input-side (orthogonal); does NOT feed this multiplier (this multiplier is output-side).
 - [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md) — runtime rule the multiplier measures.
@@ -6,7 +6,7 @@ keep-beta-until: 2026-08-14
 # Compression default — kill-criterion
 
 > **Status:** v1-measured · criterion not met · default stays `off` · **Owner:** `step-16-caveman-substance.md`
-> Phase 1 closeout · **Sources:** [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
+> Phase 1 closeout · **Sources:** [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md) ·
 > [`council-synthesis.md` § 7](../../agents/evidence/audits/2026-05-14-north-star/council-synthesis.md) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict --> ·
 > [`caveman-v1-kc-verdict.json`](../../agents/runtime/council/responses/caveman-v1-kc-verdict.json) <!-- council-ref-allowed: ADR decision trace for v1 kill-criterion verdict -->
 
@@ -23,14 +23,14 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
 [`caveman-speak`](../../.agent-src.uncompressed/rules/caveman-speak.md)
 but the feature is non-promoted: no skill recommends turning it on,
 no preset enables it, no profile depends on it.
-2. **Baselines.** Every published `bench/reports/caveman-v<N>.{json,md}`
+2. **Baselines.** Every published `internal/bench/reports/caveman-v<N>.{json,md}`
 measures three arms (`compressed` · `terse-control` ·
 `uncompressed`) and reports two savings columns:
 - `vs_raw` — median savings against the uncompressed arm.
 - `vs_terse` — **load-bearing** median savings against the
 `Answer concisely.` terse-control arm. `vs_raw` is inflated by the
 carve-out-tax-free pure-prose case and is **not** the gate metric.
-3. **Decision table.** Read the latest `bench/reports/caveman-v<N>.md`
+3. **Decision table.** Read the latest `internal/bench/reports/caveman-v<N>.md`
 and apply exactly one of:
 
 | Measured `vs_terse` median | Quality regression on corpus | Verdict |
@@ -50,7 +50,7 @@ DECISION OWNED BY THE NEXT BENCH CLOSEOUT, NOT BY THIS DOC.
 
 ## v1 verdict (2026-05-16)
 
-[`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+[`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
 landed 30 calls · $0.0805 · 0 errors · `claude-sonnet-4-5`:
 
 | Metric | Median | p10 | p90 |
@@ -100,7 +100,7 @@ re-litigating compression on every PR.
 
 ## Cross-references
 
-- [`bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
+- [`internal/bench/reports/caveman-v1.md`](../../bench/reports/caveman-v1.md)
 — v1 measurement; canonical baseline this doc cites.
 - [`docs/benchmarks.md`](../benchmarks.md)
 — cadence + when the next bench run is mandatory.
@@ -131,4 +131,4 @@ suite is wired to `task test-cost-budget` per `step-11` Phase 2 Step 5.
 - `step-11-ruflo-parity` — Measurement & Governance Parity roadmap.
 - `docs/contracts/cost-dashboard.md` — companion dashboard contract.
 - `scripts/cost/budget.mjs` — evaluator implementation.
-- `bench/pricing.yaml` — per-model USD pricing table.
+- `internal/bench/pricing.yaml` — per-model USD pricing table.
@@ -0,0 +1,137 @@
|
|
|
1
|
+
# Daily Workspace Surface Contract
|
|
2
|
+
|
|
3
|
+
> **Status** · v0 / design · 2026-05-24. Surface contract for the daily
|
|
4
|
+
> workspace introduced as Phase 4 of the employee-product workstream.
|
|
5
|
+
> Governed by ADRs [`022`](../decisions/ADR-022-daily-workspace-decomposition.md) ·
|
|
6
|
+
> [`023`](../decisions/ADR-023-host-agent-protocol.md) ·
|
|
7
|
+
> [`024`](../decisions/ADR-024-workspace-v0-feature-floor.md) ·
|
|
8
|
+
> [`025`](../decisions/ADR-025-workspace-chrome.md).
|
|
9
|
+
|
|
10
|
+
## Shape (v0)
|
|
11
|
+
|
|
12
|
+
Browser tab at `http://127.0.0.1:<gui-port>/workspace`, served by the
|
|
13
|
+
existing installer GUI (`packages/core/installer/src/gui/server.ts`).
|
|
14
|
+
Same CSRF token, same loopback bind, same kill-switch as
|
|
15
|
+
[`gui-wizard`](gui-wizard.md). Launched via
|
|
16
|
+
`npx @event4u/agent-config workspace` (alias for
|
|
17
|
+
`init --gui --route=/workspace` once wired).
|
|
18
|
+
|
|
19
|
+
```
|
|
20
|
+
┌─ /workspace ─────────────────────────────────────────────────┐
|
|
21
|
+
│ [identity strip — shared with installer GUI shell] │
|
|
22
|
+
├────────────────────┬─────────────────────────────────────────┤
|
|
23
|
+
│ Role + Task │ Active session log │
|
|
24
|
+
│ launcher │ (latest JSONL entries, append-only) │
|
|
25
|
+
│ │ │
|
|
26
|
+
│ - galabau │ ▸ 12:04 launch · role=galabau │
|
|
27
|
+
│ - content-creator │ ▸ 12:05 host · claude / tier-1 │
|
|
28
|
+
│ - consultant │ ▸ 12:08 host · turn.completed │
|
|
29
|
+
│ │ │
|
|
30
|
+
│ (Phase 3 roles) │ Knowledge pane │
|
|
31
|
+
│ │ - source: handbuch.pdf │
|
|
32
|
+
│ │ - source: angebot-template.md │
|
|
33
|
+
│ │ (Phase 2 namespace; "no sources yet" │
|
|
34
|
+
│ │ when empty) │
|
|
35
|
+
└────────────────────┴─────────────────────────────────────────┘
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
No left / centre / right three-rail layout in v0 (deferred per
|
|
39
|
+
ADR-024). One launcher, one log, one stub pane.
|
|
40
|
+
|
|
41
|
+
## Endpoints (additions to the GUI server)
|
|
42
|
+
|
|
43
|
+
All endpoints CSRF-gated, loopback-bound. Existing wizard endpoints
|
|
44
|
+
in [`gui-wizard`](gui-wizard.md) are untouched.
|
|
45
|
+
|
|
46
|
+
| Method · Path | Purpose |
|
|
47
|
+
|---|---|
|
|
48
|
+
| `GET /workspace` | HTML shell + initial state (role list, recent sessions). |
|
|
49
|
+
| `GET /api/v1/workspace/roles` | List available roles from `agents/roles/<role>/`. |
|
|
50
|
+
| `GET /api/v1/workspace/roles/:role/tasks` | Per-role task list from `skills.yml` + `prompts/`. |
|
|
51
|
+
| `POST /api/v1/workspace/launch` | Body: `{ role, task, host? }`. Resolves host via ADR-023 tier; runs the launch; appends to JSONL log. |
|
|
52
|
+
| `GET /api/v1/workspace/sessions` | List of recent sessions (≤ 20, ordered by mtime). |
|
|
53
|
+
| `GET /api/v1/workspace/sessions/:id` | Streams the JSONL log for one session. |
|
|
54
|
+
| `GET /api/v1/workspace/knowledge` | Snapshot of the current `knowledge:` memory namespace (read-only). |
|
|
55
|
+
|
|
56
|
+
## Session JSONL schema
|
|
57
|
+
|
|
58
|
+
Path: `~/.event4u/agent-config/workspace/sessions/<yyyy-mm-dd>/<session-id>.jsonl`
|
|
59
|
+
(one file per session; append-only; UTF-8). Session id = `YYYYMMDDTHHMMSSZ-<8-hex>`.
|
|
60
|
+
|
|
61
|
+
Each line is one JSON record with the shared envelope:
|
|
62
|
+
|
|
63
|
+
```json
|
|
64
|
+
{ "ts": "<iso-8601-utc>", "kind": "<event-kind>", "data": { … } }
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
Event kinds:
|
|
68
|
+
|
|
69
|
+
- `launcher.input` — `{ role, task, rendered_prompt, host_tier, host_id }`
|
|
70
|
+
- `host.turn` — `{ host_id, turn_id, model, input_tokens, output_tokens, latency_ms }`
|
|
71
|
+
- `host.output` — `{ host_id, turn_id, role: "assistant", text }` *(verbatim host envelope text — Tier 1 only)*
|
|
72
|
+
- `host.tool` — `{ host_id, turn_id, tool_name, input, output_excerpt }` *(when the host envelope surfaces it)*
|
|
73
|
+
- `host.error` — `{ host_id, message, exit_code }`
|
|
74
|
+
- `inbox.handoff` — `{ inbox_path, copied_to_clipboard: bool }` *(Tier 3 only)*
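The path, session-id, and envelope conventions above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the helper names (`new_session_id`, `session_path`, `append_event`) are hypothetical, the workspace root is passed in rather than hard-coded, and rejecting kinds outside the list above is an assumption (other contracts may add further kinds).

```python
import json
import secrets
from datetime import datetime, timezone
from pathlib import Path

# Kinds copied from the "Event kinds" list; treating it as exhaustive is an assumption.
EVENT_KINDS = {
    "launcher.input", "host.turn", "host.output",
    "host.tool", "host.error", "inbox.handoff",
}


def new_session_id(now: datetime) -> str:
    """Build `YYYYMMDDTHHMMSSZ-<8-hex>` from a timezone-aware timestamp."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{secrets.token_hex(4)}"  # 4 random bytes give 8 hex chars


def session_path(root: Path, session_id: str) -> Path:
    """Resolve `<root>/sessions/<yyyy-mm-dd>/<session-id>.jsonl`."""
    day = f"{session_id[0:4]}-{session_id[4:6]}-{session_id[6:8]}"
    return root / "sessions" / day / f"{session_id}.jsonl"


def append_event(path: Path, kind: str, data: dict, now: datetime) -> None:
    """Append one envelope record as a single UTF-8 JSON line."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    record = {
        "ts": now.astimezone(timezone.utc).isoformat(),
        "kind": kind,
        "data": data,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps the file append-only from this writer's point of view.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because the date directory is derived from the session id itself, a session that runs past midnight UTC keeps writing to the file it started in.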
|
|
75
|
+
|
|
76
|
+
No PII in filenames. No remote sync. Encryption-at-rest deferred to a
|
|
77
|
+
future ADR.
|
|
78
|
+
|
|
79
|
+
## Inbox handoff (Tier 3)
|
|
80
|
+
|
|
81
|
+
Path: `~/.event4u/agent-config/workspace/inbox/<yyyy-mm-dd>/<id>.md`.
|
|
82
|
+
|
|
83
|
+
```markdown
|
|
84
|
+
---
|
|
85
|
+
created_at: 2026-05-24T12:08:00Z
|
|
86
|
+
role: galabau
|
|
87
|
+
task: angebot-erstellen
|
|
88
|
+
host_tier: 3
|
|
89
|
+
host_id: cursor
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
[rendered prompt body — skill context inlined per ADR-023]
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
The UI surfaces a one-line banner: "Workspace wrote
|
|
96
|
+
`~/.event4u/.../<id>.md`. Open it in Cursor and paste." Clicking
|
|
97
|
+
the banner copies the path to clipboard.
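A rough sketch of how the inbox file could be produced, assuming the front-matter keys shown in the example above. The function names and the `root` parameter are hypothetical; values are written without YAML escaping, which assumes `role`, `task`, and `host_id` are plain slugs.

```python
from datetime import datetime, timezone
from pathlib import Path


def render_inbox_note(created_at: datetime, role: str, task: str,
                      host_id: str, prompt_body: str) -> str:
    """Render front matter plus the rendered prompt body as one Markdown string."""
    stamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "---",
        f"created_at: {stamp}",
        f"role: {role}",
        f"task: {task}",
        "host_tier: 3",  # the inbox handoff is the Tier 3 path
        f"host_id: {host_id}",
        "---",
        "",
        prompt_body.rstrip("\n"),
    ]
    return "\n".join(lines) + "\n"


def write_inbox_note(root: Path, note_id: str, created_at: datetime, text: str) -> Path:
    """Write `<root>/inbox/<yyyy-mm-dd>/<id>.md` and return the path.

    OSError (disk full, permissions) is deliberately not caught here, so the
    caller can show the failure instead of losing the note silently.
    """
    day = created_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    path = root / "inbox" / day / f"{note_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

The returned path is what a banner would display and copy to the clipboard.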
|
|
98
|
+
|
|
99
|
+
## Skill resolution
|
|
100
|
+
|
|
101
|
+
Tier 1 with skill surface (Claude Code only) — workspace passes the
|
|
102
|
+
slash command as part of the prompt body (`/work "<task>"` style)
|
|
103
|
+
and lets the host resolve it from `.claude/commands/`.
|
|
104
|
+
|
|
105
|
+
Tier 1 without skill surface (Codex, Gemini) and Tier 3 — workspace
|
|
106
|
+
**inlines** the skill body into the rendered prompt. The host gets
|
|
107
|
+
the prompt with skill context as a self-contained block.
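The two resolution paths can be illustrated with a small pure function. This is a sketch under stated assumptions: the host id `"claude"`, the function name, and the layout of the inlined prompt (skill body, blank line, task) are all hypothetical; only the `/work "<task>"` style comes from the text above.

```python
# Hosts assumed to resolve slash commands themselves (hypothetical id).
SLASH_COMMAND_HOSTS = {"claude"}


def render_prompt(host_id: str, host_tier: int, task: str, skill_body: str) -> str:
    """Return the prompt body the workspace would hand to the host."""
    if host_tier == 1 and host_id in SLASH_COMMAND_HOSTS:
        # The host resolves the command from its own command directory.
        return f'/work "{task}"'
    # Tier 1 hosts without a skill surface, and Tier 3: inline the skill
    # body so the prompt is self-contained.
    return f"{skill_body.strip()}\n\n{task}"
```

A task string containing double quotes would need escaping before the slash-command form is used; that detail is left out here.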
|
|
108
|
+
|
|
109
|
+
## State scope
|
|
110
|
+
|
|
111
|
+
- Per-user. Local-only. One workspace per OS user.
|
|
112
|
+
- No multi-tenant view in v0. Multi-user deployment (the topology
|
|
113
|
+
from [`ADR-021`](../decisions/ADR-021-deployment-shape.md)) is
|
|
114
|
+
out of scope for v0.
|
|
115
|
+
- Closing the browser tab does not kill running host subprocesses.
|
|
116
|
+
Reopening shows the live JSONL log.
|
|
117
|
+
|
|
118
|
+
## Failure modes & telemetry
|
|
119
|
+
|
|
120
|
+
- Host CLI not installed → workspace renders "Host `<id>` not
|
|
121
|
+
available" banner with install link. No silent fallback.
|
|
122
|
+
- JSON envelope shape change → demote host to Tier 3 per ADR-023.
|
|
123
|
+
- Inbox write failure (disk full, permissions) → red banner; no
|
|
124
|
+
silent loss.
|
|
125
|
+
|
|
126
|
+
Telemetry stays off by default (project inertia). When the user
|
|
127
|
+
opts in via `.agent-settings.yml`, the workspace emits
|
|
128
|
+
`workspace.launch`, `workspace.host_turn`, `workspace.inbox_handoff`
|
|
129
|
+
counters only. No prompt bodies, no response bodies.
|
|
130
|
+
|
|
131
|
+
## Cross-references
|
|
132
|
+
|
|
133
|
+
- ADRs: [`022`](../decisions/ADR-022-daily-workspace-decomposition.md) · [`023`](../decisions/ADR-023-host-agent-protocol.md) · [`024`](../decisions/ADR-024-workspace-v0-feature-floor.md) · [`025`](../decisions/ADR-025-workspace-chrome.md).
|
|
134
|
+
- Host-agent protocol: [`host-agent-protocol`](host-agent-protocol.md).
|
|
135
|
+
- GUI substrate: [`gui-wizard`](gui-wizard.md).
|
|
136
|
+
- Knowledge ingestion: [`local-knowledge-ingestion`](local-knowledge-ingestion.md).
|
|
137
|
+
- Role experience: [`role-experience`](role-experience.md).
|
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
# Explain Modes Contract
|
|
2
|
+
|
|
3
|
+
> **Status** · v0 / design · 2026-05-24. Phase 6 of the
|
|
4
|
+
> employee-product workstream.
|
|
5
|
+
> Governed by [`ADR-026`](../decisions/ADR-026-explain-mode-translation.md).
|
|
6
|
+
> Translates the existing engineer-shaped `explain-v1` envelope into a
|
|
7
|
+
> role-aware plain surface, without changing the underlying data.
|
|
8
|
+
|
|
9
|
+
## Two modes over one envelope
|
|
10
|
+
|
|
11
|
+
The agent-memory MCP already returns an `explain-v1` envelope per
|
|
12
|
+
`memory_explain`. It speaks engineer: `trust_score`, `score_breakdown`,
|
|
13
|
+
`promotion_history`, `contradictions`, `decay`. Phase 6 keeps that
|
|
14
|
+
envelope as the single source of truth and renders **two views**:
|
|
15
|
+
|
|
16
|
+
| Mode | Default for | Vocabulary |
|
|
17
|
+
|---|---|---|
|
|
18
|
+
| `technical` | engineering-lead, platform-engineer, default for `--debug` flag | trust_score, decay rate, promotion path, contradictions count |
|
|
19
|
+
| `plain` | every other role (galabau, content-creator, consultant, …) | "where this came from", "how confident", "when last reviewed", "what's contested" |
|
|
20
|
+
|
|
21
|
+
No new MCP call. No new data fetch. The plain renderer is a **pure
|
|
22
|
+
function** over the existing envelope.
|
|
23
|
+
|
|
24
|
+
## Field mapping
|
|
25
|
+
|
|
26
|
+
| envelope field | technical label | plain label (default) |
|
|
27
|
+
|---|---|---|
|
|
28
|
+
| `trust_score` (0.0–1.0) | "Trust score" | "Confidence" with 4-band label (Very High ≥ 0.85 · High ≥ 0.65 · Medium ≥ 0.40 · Low < 0.40) |
|
|
29
|
+
| `score_breakdown.validation` | "Validation contribution" | "How well it's been checked" |
|
|
30
|
+
| `score_breakdown.usage` | "Usage contribution" | "How often it's been used" |
|
|
31
|
+
| `score_breakdown.recency` | "Recency contribution" | "How recently it was confirmed" |
|
|
32
|
+
| `score_breakdown.contradictions` | "Contradiction penalty" | "Disagreements found" |
|
|
33
|
+
| `promotion_history[]` | "Promotion timeline" | "When this was confirmed" (most recent first, ≤ 3 entries) |
|
|
34
|
+
| `contradictions[]` | "Unresolved contradictions" | "What disagrees with this" |
|
|
35
|
+
| `decay.applied_factor` | "Decay factor" | "Freshness" with 3-band label (Fresh ≥ 0.80 · Aging ≥ 0.50 · Stale < 0.50) |
|
|
36
|
+
| `evidence.sources[]` | "Sources" | "Where this came from" |
|
|
37
|
+
| `last_reviewed_at` | "Last reviewed" | "When last reviewed" + human-relative ("3 days ago") |
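The two band labels in the table reduce to threshold checks. A minimal sketch with the default thresholds from the table; the function names are hypothetical:

```python
def confidence_band(trust_score: float) -> str:
    """Map `trust_score` (0.0 to 1.0) to the default 4-band confidence label."""
    if trust_score >= 0.85:
        return "Very High"
    if trust_score >= 0.65:
        return "High"
    if trust_score >= 0.40:
        return "Medium"
    return "Low"


def freshness_band(applied_factor: float) -> str:
    """Map `decay.applied_factor` to the default 3-band freshness label."""
    if applied_factor >= 0.80:
        return "Fresh"
    if applied_factor >= 0.50:
        return "Aging"
    return "Stale"
```

With these defaults, a `trust_score` of 0.74 lands in the "High" band and a decay factor of 0.20 reads as "Stale".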
|
|
38
|
+
|
|
39
|
+
The technical view renders one section per envelope field, terse,
|
|
40
|
+
tabular. The plain view renders four labelled paragraphs:
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
Where this came from
|
|
44
|
+
3 sources — handbuch.pdf · offer-template.md · 1 council vote.
|
|
45
|
+
|
|
46
|
+
How confident
|
|
47
|
+
High (0.74). Last confirmed 3 days ago.
|
|
48
|
+
|
|
49
|
+
When last reviewed
|
|
50
|
+
2026-05-21 — by the maintenance pass.
|
|
51
|
+
|
|
52
|
+
What's contested
|
|
53
|
+
No open disagreements.
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## Per-role glossary override
|
|
57
|
+
|
|
58
|
+
Each role may ship an `agents/roles/<role>/explain-glossary.yml`
|
|
59
|
+
that overrides default plain-mode labels and the 4-band threshold
|
|
60
|
+
points. The file is optional; missing → defaults are used.
|
|
61
|
+
|
|
62
|
+
```yaml
|
|
63
|
+
# agents/roles/galabau/explain-glossary.yml
|
|
64
|
+
schema: explain-glossary/v0
|
|
65
|
+
labels:
|
|
66
|
+
confidence: "Sicherheit"
|
|
67
|
+
sources: "Woher das stammt"
|
|
68
|
+
last_reviewed: "Zuletzt geprüft"
|
|
69
|
+
contradictions: "Was widerspricht"
|
|
70
|
+
bands:
|
|
71
|
+
confidence:
|
|
72
|
+
very_high: 0.85
|
|
73
|
+
high: 0.65
|
|
74
|
+
medium: 0.40
|
|
75
|
+
freshness:
|
|
76
|
+
fresh: 0.80
|
|
77
|
+
aging: 0.50
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
Labels stay in `.md` source English (per `language-and-tone`);
|
|
81
|
+
**glossary YAMLs are the exception** — they hold the localized
|
|
82
|
+
runtime strings for the rendered surface and may be in the role's
|
|
83
|
+
native language. Loader validates `schema:` matches `explain-glossary/v0`.
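One way the override-with-fallback behaviour could look, sketched over an already-parsed mapping (for example the result of a YAML loader; parsing itself is out of scope here). `resolve_glossary` and `DEFAULT_GLOSSARY` are hypothetical names, and which default string belongs to which label key is an assumption based on the plain-view headings.

```python
import copy
import logging

log = logging.getLogger("explain.glossary")

# Defaults assumed from the field-mapping table and the plain-view example.
DEFAULT_GLOSSARY = {
    "labels": {
        "confidence": "How confident",
        "sources": "Where this came from",
        "last_reviewed": "When last reviewed",
        "contradictions": "What's contested",
    },
    "bands": {
        "confidence": {"very_high": 0.85, "high": 0.65, "medium": 0.40},
        "freshness": {"fresh": 0.80, "aging": 0.50},
    },
}


def resolve_glossary(parsed):
    """Merge a role glossary over the defaults; fall back to defaults on any problem."""
    result = copy.deepcopy(DEFAULT_GLOSSARY)
    if parsed is None:  # file missing: defaults are used
        return result
    if not isinstance(parsed, dict) or parsed.get("schema") != "explain-glossary/v0":
        log.warning("unknown glossary schema; using defaults")
        return result
    result["labels"].update(parsed.get("labels") or {})
    for group, thresholds in (parsed.get("bands") or {}).items():
        result["bands"].setdefault(group, {}).update(thresholds or {})
    return result
```

A role file can therefore override a single label or threshold and inherit everything else.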
|
|
84
|
+
|
|
85
|
+
## `/why` quick command
|
|
86
|
+
|
|
87
|
+
Any role may invoke `/why` on the last agent reply. Resolution:
|
|
88
|
+
|
|
89
|
+
1. Look up the last `host.turn` in the active session JSONL.
|
|
90
|
+
2. Extract memory entry IDs referenced in the reply (regex on
|
|
91
|
+
`mem://<id>` markers the host envelope already emits).
|
|
92
|
+
3. Call `memory_explain` for each id; merge envelopes.
|
|
93
|
+
4. Render in the active mode (plain by default, technical if the
|
|
94
|
+
role's `explain_default` is `technical`).
|
|
95
|
+
5. Append the rendered output to the session JSONL as
|
|
96
|
+
`{ kind: "explain.rendered", data: { mode, ids: [...] } }`.
|
|
97
|
+
|
|
98
|
+
`/why` never makes a network call beyond the existing MCP transport.
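Steps 2 and 5 of the resolution above can be sketched in a few lines. The id character set in the regex is an assumption (the contract only names the `mem://<id>` marker shape), and the function names are hypothetical.

```python
import re

# Assumed id alphabet: letters, digits, underscore, hyphen.
MEM_MARKER = re.compile(r"mem://([A-Za-z0-9_-]+)")


def extract_memory_ids(reply_text: str) -> list:
    """Return cited memory ids in first-seen order, without duplicates."""
    return list(dict.fromkeys(MEM_MARKER.findall(reply_text)))


def explain_record(reply_text: str, mode: str):
    """Build the `explain.rendered` record, or None when nothing was cited."""
    ids = extract_memory_ids(reply_text)
    if not ids:
        return None
    return {"kind": "explain.rendered", "data": {"mode": mode, "ids": ids}}
```

The `memory_explain` calls and the rendering step would sit between these two functions; returning `None` lets the caller show a plain "nothing cited" message instead of raising.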
|
|
99
|
+
|
|
100
|
+
## Renderer surface (pure function)
|
|
101
|
+
|
|
102
|
+
```ts
|
|
103
|
+
function renderExplain(
|
|
104
|
+
envelope: ExplainV1,
|
|
105
|
+
options: {
|
|
106
|
+
mode: "technical" | "plain",
|
|
107
|
+
glossary?: ExplainGlossaryV0,
|
|
108
|
+
locale?: string, // affects relative-date rendering only
|
|
109
|
+
}
|
|
110
|
+
): { markdown: string, mode: string, ids: string[] }
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
Implementation lives in `packages/core/src/workspace/explain/`. No I/O,
|
|
114
|
+
no clock dependency beyond the `now` injected for relative-date
|
|
115
|
+
formatting; testable with fixtures.
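The "injected `now`" idea can be illustrated in Python for one paragraph of the plain view. This is only a sketch of the purity property, not a port of the TypeScript renderer: the function names are hypothetical, `now` must be timezone-aware, and a timestamp without an offset is assumed to be UTC.

```python
from datetime import datetime, timezone


def relative_day(then: datetime, now: datetime) -> str:
    """Human-relative age in whole days, computed from the injected `now`."""
    days = (now - then).days
    if days <= 0:
        return "today"
    if days == 1:
        return "1 day ago"
    return f"{days} days ago"


def render_last_reviewed(envelope: dict, now: datetime) -> str:
    """Render the 'When last reviewed' paragraph; never raises on bad input."""
    raw = envelope.get("last_reviewed_at")
    try:
        then = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        # Missing or malformed field: plain-mode placeholder.
        return "When last reviewed\n(unavailable)"
    if then.tzinfo is None:
        then = then.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return f"When last reviewed\n{then.date().isoformat()} ({relative_day(then, now)})"
```

Because the clock is a parameter, a golden test can pin `now` and compare the output byte for byte.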
|
|
116
|
+
|
|
117
|
+
## Coverage (Phase 6 Step 5)
|
|
118
|
+
|
|
119
|
+
Fixture-driven golden tests against `tests/golden/explain/` for ≥ 5
|
|
120
|
+
envelope shapes:
|
|
121
|
+
|
|
122
|
+
1. High-trust validated entry — fresh, no contradictions.
|
|
123
|
+
2. Low-trust quarantined entry — never promoted.
|
|
124
|
+
3. Contradicted entry — 2 open contradictions, one resolved.
|
|
125
|
+
4. Recently promoted entry — last `promotion_history[0]` < 24h old.
|
|
126
|
+
5. Deprecated entry — superseded-by chain, decay factor 0.20.
|
|
127
|
+
|
|
128
|
+
Each fixture exercised in both `technical` and `plain` modes plus
|
|
129
|
+
one with a glossary override. ≥ 90 % branch on the renderer module.
|
|
130
|
+
|
|
131
|
+
## Failure modes
|
|
132
|
+
|
|
133
|
+
- Missing envelope field → render placeholder "(unavailable)" in
|
|
134
|
+
plain mode; renderer never throws. Technical mode shows the raw
|
|
135
|
+
null with a warning marker.
|
|
136
|
+
- Unknown `schema:` in glossary → loader logs a warning and falls
|
|
137
|
+
back to defaults; never blocks rendering.
|
|
138
|
+
- `/why` finds no `mem://` markers → renders "This reply did not
|
|
139
|
+
cite any stored memory entries." No error.
|
|
140
|
+
|
|
141
|
+
## Cross-references
|
|
142
|
+
|
|
143
|
+
- ADR: [`ADR-026`](../decisions/ADR-026-explain-mode-translation.md).
|
|
144
|
+
- Envelope contract: [`agent-memory-contract`](agent-memory-contract.md) (`explain-v1`).
|
|
145
|
+
- Workspace integration: [`daily-workspace`](daily-workspace.md) (right rail).
|
|
146
|
+
- Roles: [`role-experience`](role-experience.md) (`explain_default` field).
|
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
# Host-Agent Protocol Contract
|
|
2
|
+
|
|
3
|
+
> **Status** · v0 / inventory · 2026-05-24. The daily workspace shells out to
|
|
4
|
+
> a host agent for every model interaction; it never re-implements one. This
|
|
5
|
+
> contract names which surfaces each host agent exposes today, where the
|
|
6
|
+
> workspace can rely on them, and what the fallback is when a surface is
|
|
7
|
+
> missing. Governs ADR-023 / ADR-024 / ADR-025 — see
|
|
8
|
+
> [`ADR-022`](../decisions/ADR-022-daily-workspace-decomposition.md).
|
|
9
|
+
|
|
10
|
+
## Required capabilities
|
|
11
|
+
|
|
12
|
+
The workspace v0 requires exactly two surfaces from a host agent:
|
|
13
|
+
|
|
14
|
+
1. **`launch(prompt, skill?, cwd)`** — start a new conversation in the host
|
|
15
|
+
agent with `prompt` pre-filled and (optionally) `skill` pre-selected, in
|
|
16
|
+
the named working directory. Must be invocable from a non-interactive
|
|
17
|
+
shell. Return shape: success / failure; the conversation runs inside the
|
|
18
|
+
host's own UI from there.
|
|
19
|
+
2. **`emit_trace(session) → ndjson`** — append-only, structured event stream
|
|
20
|
+
for the running conversation: model id, tool calls, citations,
|
|
21
|
+
explain-trace envelope (per
|
|
22
|
+
[`memory-explain-v1`](memory-explain-v1.md) when memory is involved).
|
|
23
|
+
Must be readable by tail-style consumers without polling the host's UI.
|
|
24
|
+
|
|
25
|
+
Both surfaces must be **stable** — documented by the vendor, covered by
|
|
26
|
+
their semver, not derived from unstable stdout parsing.
|
|
27
|
+
|
|
28
|
+
## Today's inventory (2026-05-24)
|
|
29
|
+
|
|
30
|
+
| Host agent | `launch` surface | `emit_trace` surface | Effective tier |
|
|
31
|
+
|---|---|---|---|
|
|
32
|
+
| **Claude Code (CLI)** | `claude -p "<prompt>" --output-format json` (subprocess; documented). Slash commands resolved against `.claude/commands/`. | JSON envelope on stdout per turn; session id preserved; no live append stream. | **Tier 1** — only host with both surfaces today. |
|
|
33
|
+
| **OpenAI Codex CLI** | `codex exec --json` consumes stdin; documented. No slash-command surface (skills not first-class). | NDJSON event stream on stdout — `turn.completed`, `item.completed`, tool envelopes. | **Tier 1**, no skill surface — workspace must pre-render the prompt with skill context inlined. |
|
|
34
|
+
| **Gemini CLI** | `gemini --output-format json` consumes stdin; documented. | JSON envelope on stdout per turn. OAuth grant required once. | **Tier 1**, no skill surface (same as Codex). |
|
|
35
|
+
| **Augment (IDE)** | None documented. Hook trampolines exist (`scripts/hooks/augment-dispatcher.sh`) — post-event only, cannot initiate a conversation. | None — hook payloads cover events, not model output. | **Tier 3** — observe-only. |
|
|
36
|
+
| **Cursor (IDE)** | `cursor://` deep links open files / chats but cannot pre-fill a prompt with skill context from a non-Cursor process. Hooks (`.cursor/hooks.json`) are post-event. | None at the protocol layer. | **Tier 3** — observe-only. |
|
|
37
|
+
| **Cline (VS Code ext)** | None. Hooks (`~/Documents/Cline/Hooks/`) are post-event. | None at the protocol layer. | **Tier 3** — observe-only. |
|
|
38
|
+
| **Windsurf (Cascade)** | None. Hooks (`.windsurf/hooks.json`) are post-event. | None at the protocol layer. | **Tier 3** — observe-only. |
|
|
39
|
+
|
|
40
|
+
## Tier definitions
|
|
41
|
+
|
|
42
|
+
- **Tier 1 — first-class.** Both `launch` and `emit_trace` are stable.
|
|
43
|
+
Workspace can build full features against the host.
|
|
44
|
+
- **Tier 2 — degraded.** One of the two surfaces exists; workspace can
|
|
45
|
+
partially drive but degrades a named feature (e.g. no inline citations).
|
|
46
|
+
*(No host agent occupies this tier today.)*
|
|
47
|
+
- **Tier 3 — observe-only.** Neither surface exists at the agent boundary.
|
|
48
|
+
The workspace falls back to (a) user-paste of a generated prompt, or (b)
|
|
49
|
+
inbox-file handoff (writes `~/.event4u/agent-config/workspace/inbox/<id>.md`,
|
|
50
|
+
user opens the host themselves). Hook trampolines remain available for
|
|
51
|
+
passive event recording but do not initiate conversations.
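The tier rule is a function of the two required surfaces. A minimal sketch; the host ids in `INVENTORY` are hypothetical slugs for the rows of the inventory table, and the booleans restate that table as of 2026-05-24:

```python
def effective_tier(has_launch: bool, has_emit_trace: bool) -> int:
    """Tier 1 needs both stable surfaces, Tier 2 exactly one, Tier 3 neither."""
    if has_launch and has_emit_trace:
        return 1
    if has_launch or has_emit_trace:
        return 2
    return 3


# (launch, emit_trace) per host, restated from the inventory table.
INVENTORY = {
    "claude": (True, True),
    "codex": (True, True),
    "gemini": (True, True),
    "augment": (False, False),
    "cursor": (False, False),
    "cline": (False, False),
    "windsurf": (False, False),
}

TIERS = {host: effective_tier(*surfaces) for host, surfaces in INVENTORY.items()}
```

No entry currently evaluates to Tier 2, which matches the note in the tier list.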
|
|
52
|
+
|
|
53
|
+
## v0 scope
|
|
54
|
+
|
|
55
|
+
- The workspace v0 ships against **Claude Code** as the single Tier-1 host.
|
|
56
|
+
Codex and Gemini are wired but secondary (no skill surface — see ADR-024).
|
|
57
|
+
- Tier-3 hosts get the **inbox handoff** fallback only: workspace writes the
|
|
58
|
+
rendered prompt + skill body into the inbox file and surfaces a one-line
|
|
59
|
+
copy-to-clipboard banner. No tighter integration is attempted in v0.
|
|
60
|
+
- The CLI shell-out is the **only** mechanism. No HTTP RPC, no MCP-driven
|
|
61
|
+
agent control, no shared SQLite — those are deferred to v1+ when at least
|
|
62
|
+
one Tier-3 host moves up.
|
|
63
|
+
|
|
64
|
+
## Stability & change policy
|
|
65
|
+
|
|
66
|
+
- The vendor-published JSON envelope shapes are the contract. Workspace
|
|
67
|
+
parses by named keys, never by positional fields.
|
|
68
|
+
- A new host-agent CLI release that breaks the envelope **fails closed** —
|
|
69
|
+
the workspace surfaces a banner and degrades to Tier 3 (inbox handoff)
|
|
70
|
+
until this contract is updated.
|
|
71
|
+
- This file is the source of truth for host-agent tier. Adding a host or
|
|
72
|
+
promoting a tier requires (a) a vendor-link in the inventory row,
|
|
73
|
+
(b) at least one integration test under
|
|
74
|
+
`tests/integration/host-agent-protocol/`.
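Parsing by named keys and failing closed could look roughly like this. The function name and the idea of passing the required key names in are hypothetical; the actual key set comes from each vendor's published envelope and is not specified in this contract.

```python
import json


def classify_envelope(stdout_text: str, required_keys):
    """Return `(tier, payload)`: Tier 1 with the parsed envelope, or `(3, None)`.

    Anything that is not a JSON object containing every required key is
    treated as an envelope break and demotes the host to the inbox handoff.
    """
    try:
        payload = json.loads(stdout_text)
    except json.JSONDecodeError:
        return 3, None
    if not isinstance(payload, dict):
        return 3, None
    if any(key not in payload for key in required_keys):
        return 3, None
    return 1, payload
```

Extra keys are ignored, so a vendor adding fields does not trigger a demotion; only a missing or renamed key does.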
|
|
75
|
+
|
|
76
|
+
## Cross-references
|
|
77
|
+
|
|
78
|
+
- ADR: [`ADR-022`](../decisions/ADR-022-daily-workspace-decomposition.md) ·
|
|
79
|
+
[`ADR-023`](../decisions/ADR-023-host-agent-protocol.md) ·
|
|
80
|
+
[`ADR-024`](../decisions/ADR-024-workspace-v0-feature-floor.md) ·
|
|
81
|
+
[`ADR-025`](../decisions/ADR-025-workspace-chrome.md).
|
|
82
|
+
- Skill: [`ai-council`](../../.agent-src/skills/ai-council/SKILL.md) — uses
|
|
83
|
+
the same CLI subprocess shape (claude / codex / gemini) for council
|
|
84
|
+
members; the workspace inherits the proven invocation paths.
|
|
85
|
+
- Hooks: [`hook-architecture-v1`](hook-architecture-v1.md) — covers the
|
|
86
|
+
post-event surface for all hosts including Tier-3.
|
|
87
|
+
- Daily workspace surface: [`daily-workspace`](daily-workspace.md) — UI
|
|
88
|
+
contract that consumes this protocol.
|
|
@@ -0,0 +1,148 @@
|
|
|
1
|
+
# Local Analytics Contract
|
|
2
|
+
|
|
3
|
+
> **Status** · v0 / design · 2026-05-24. Phase 7 of the
|
|
4
|
+
> employee-product workstream.
|
|
5
|
+
> **Local-only.** Does NOT lift the Hard-Floor item from 3.1.0 — no
|
|
6
|
+
> network egress, no remote Worker, no POST. The prior telemetry
|
|
7
|
+
> roadmap stays inert.
|
|
8
|
+
|
|
9
|
+
## Position vs the 3.1.0 telemetry SDK
|
|
10
|
+
|
|
11
|
+
3.1.0 shipped the telemetry SDK + Cloudflare Worker as **source-only**;
|
|
12
|
+
the kill-switch defaults off and nothing is deployed. Phase 7 builds
|
|
13
|
+
a **separate local-only** analytics path:
|
|
14
|
+
|
|
15
|
+
| Surface | Lives | Egress | Default |
|
|
16
|
+
|---|---|---|---|
|
|
17
|
+
| 3.1.0 remote telemetry | Worker (undeployed) | ✗ inert | off, Hard-Floor |
|
|
18
|
+
| **Phase 7 local analytics** | `~/.event4u/agent-config/workspace/analytics/` | ✗ never | **on** for local-only |
|
|
19
|
+
|
|
20
|
+
The two surfaces share **event vocabulary** where it overlaps; they
|
|
21
|
+
never share a transport. Local analytics writes to disk; remote
|
|
22
|
+
telemetry remains undeployed.
|
|
23
|
+
|
|
24
|
+
## Event vocabulary
|
|
25
|
+
|
|
26
|
+
Re-uses the `install_stage` schema (3.1.0) where applicable, and
|
|
27
|
+
adds the `workspace_event` schema for launcher / document / explain
|
|
28
|
+
interactions:
|
|
29
|
+
|
|
30
|
+
| schema | source | example fields |
|
|
31
|
+
|---|---|---|
|
|
32
|
+
| `install_stage/v1` | installer (3.1.0) | `stage`, `outcome`, `duration_ms`, `package_version` |
|
|
33
|
+
| `workspace_event/v0` | Phase 4–6 workspace | `event`, `role`, `task`, `host_tier`, `duration_ms` |
|
|
34
|
+
|
|
35
|
+
`workspace_event/v0` event names (closed set):
|
|
36
|
+
|
|
37
|
+
- `launcher.opened` · `launcher.task_picked` · `launcher.task_launched`
|
|
38
|
+
- `session.started` · `session.host_turn` · `session.completed`
|
|
39
|
+
- `document.created` · `document.edited` · `document.exported`
|
|
40
|
+
- `explain.opened` · `explain.mode_toggled` · `why.invoked`
|
|
41
|
+
- `knowledge.queried` · `knowledge.source_clicked`
|
|
42
|
+
|
|
43
|
+
No prompt bodies. No response bodies. No PII. Only counters, role
|
|
44
|
+
labels, task slugs (already public), and durations.
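
A closed event set lends itself to a check at the emit boundary. The sketch below is an assumption about how such a guard could look, not the shipped validator: `check_event` is a hypothetical helper, and treating the table's example fields as a strict allow-list is an illustrative choice that the contract does not state.

```python
# Event names copied from the closed set listed above.
EVENT_NAMES = frozenset({
    "launcher.opened", "launcher.task_picked", "launcher.task_launched",
    "session.started", "session.host_turn", "session.completed",
    "document.created", "document.edited", "document.exported",
    "explain.opened", "explain.mode_toggled", "why.invoked",
    "knowledge.queried", "knowledge.source_clicked",
})

# Assumption: the "example fields" column is used as an allow-list here,
# so free-text keys such as a prompt body cannot slip into a record.
ALLOWED_FIELDS = frozenset({"role", "task", "host_tier", "duration_ms"})


def check_event(event: str, data: dict) -> None:
    """Raise ValueError for an unknown event name or an unexpected field."""
    if event not in EVENT_NAMES:
        raise ValueError(f"unknown workspace_event/v0 event: {event!r}")
    extra = set(data) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"fields outside the allow-list: {sorted(extra)}")
```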
|
|
45
|
+
|
|
46
|
+
## Storage
|
|
47
|
+
|
|
48
|
+
```
|
|
49
|
+
~/.event4u/agent-config/workspace/analytics/
|
|
50
|
+
├── events.jsonl ← append-only event log
|
|
51
|
+
└── retention.lock ← prune-pass lockfile
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
One JSON record per line:
|
|
55
|
+
|
|
56
|
+
```json
|
|
57
|
+
{
|
|
58
|
+
"ts": "2026-05-24T12:08:00Z",
|
|
59
|
+
"schema": "workspace_event/v0",
|
|
60
|
+
"event": "launcher.task_launched",
|
|
61
|
+
"data": { "role": "galabau", "task": "angebot-erstellen",
|
|
62
|
+
"host_tier": 1, "duration_ms": 420 }
|
|
63
|
+
}
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
Rolling retention: **90 days local**. A prune pass on workspace
|
|
67
|
+
launch trims records older than 90 days; the lockfile prevents
|
|
68
|
+
concurrent prune (cheap fs lock, not a real mutex).
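
One way the prune pass could work is sketched below. It is a minimal illustration under stated assumptions: the free function `prune(store, now)` is hypothetical (the contract only names a `prune()` method), a record exactly 90 days old is kept, and lines that cannot be parsed are left in place for the query path to report.

```python
from __future__ import annotations

import json
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION = timedelta(days=90)


def prune(store: Path, now: datetime | None = None) -> int:
    """Drop records older than the retention window; return how many were dropped."""
    now = now or datetime.now(timezone.utc)
    log = store / "events.jsonl"
    lock = store / "retention.lock"
    if not log.exists():
        return 0
    try:
        # O_EXCL makes creation fail if another prune pass already holds the lock.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return 0
    try:
        os.close(fd)
        kept, dropped = [], 0
        for line in log.read_text(encoding="utf-8").splitlines():
            try:
                ts = datetime.fromisoformat(json.loads(line)["ts"].replace("Z", "+00:00"))
                expired = now - ts > RETENTION
            except (ValueError, KeyError, TypeError, AttributeError):
                kept.append(line)  # unparseable: keep it, do not count it as dropped
                continue
            if expired:
                dropped += 1
            else:
                kept.append(line)
        # Write a sibling file and swap it in so readers never see a half-written log.
        tmp = store / "events.jsonl.tmp"
        tmp.write_text("".join(item + "\n" for item in kept), encoding="utf-8")
        os.replace(tmp, log)
        return dropped
    finally:
        lock.unlink(missing_ok=True)
```

Two limits of this sketch are worth noting. A lockfile left behind by a crashed process blocks later prune passes until it is removed, and an event appended between the read and the swap would be lost; running the pass at workspace launch, before emitters start, keeps that window small.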
|
|
69
|
+
|
|
70
|
+
## Opt-out
|
|
71
|
+
|
|
72
|
+
Single env var, single config flag, both checked:
|
|
73
|
+
|
|
74
|
+
| Surface | Default | Override |
|
|
75
|
+
|---|---|---|
|
|
76
|
+
| Env | `AGENT_CONFIG_NO_LOCAL_ANALYTICS` unset | set to any non-empty value → no writes |
|
|
77
|
+
| Config | `.agent-settings.yml` → `analytics.local: on` | set to `off` → no writes |
|
|
78
|
+
|
|
79
|
+
Either set to off → emitter short-circuits before opening the file.
|
|
80
|
+
No retention pruning either; the existing log stays until the user
|
|
81
|
+
removes it.
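
The two-switch check can be sketched as a single predicate. This assumes the settings file has already been parsed into a dict; `analytics_enabled` is a hypothetical helper name, not part of the documented API.

```python
import os
from collections.abc import Mapping


def analytics_enabled(settings: Mapping, env: Mapping = os.environ) -> bool:
    """Return False if either the env var or the config flag opts out."""
    # Any non-empty value disables writes, as the table above specifies.
    if env.get("AGENT_CONFIG_NO_LOCAL_ANALYTICS", ""):
        return False
    local = (settings.get("analytics") or {}).get("local", "on")
    # YAML 1.1 loaders such as PyYAML read a bare `off` as the boolean False,
    # so both spellings are treated as an opt-out.
    return local not in (False, "off")
```

Callers would evaluate this once before touching the analytics directory, so an opted-out run creates no file and triggers no prune pass.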
|
|
82
|
+
|
|
83
|
+
## Emitter API
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
# packages/core/src/workspace/analytics/emitter.py
|
|
87
|
+
class LocalAnalytics:
|
|
88
|
+
def emit(self, event: str, data: dict) -> None: ...
|
|
89
|
+
def query(self, since: datetime, event: str | None = None) -> list[Event]: ...
|
|
90
|
+
def prune(self) -> int: ... # returns number of records dropped
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
The emitter is a synchronous append-line write. It must not block the UI
|
|
94
|
+
thread for more than 5 ms at the 90th percentile; no async / queue / batch
|
|
95
|
+
machinery in v0.
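
A synchronous append-line write might look like the sketch below. It is an illustration rather than the shipped emitter: the free function `emit(store, event, data)` and the `0o600` file mode are assumptions, and the opt-out check is left out for brevity.

```python
import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path


def emit(store: Path, event: str, data: dict) -> None:
    """Append one JSON record to events.jsonl; warn and drop on I/O errors."""
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "schema": "workspace_event/v0",
        "event": event,
        "data": data,
    }
    line = (json.dumps(record, ensure_ascii=False) + "\n").encode("utf-8")
    try:
        store.mkdir(parents=True, exist_ok=True)
        # O_APPEND positions every write at the current end of the file.
        fd = os.open(store / "events.jsonl",
                     os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            os.write(fd, line)  # the whole record goes out in a single write call
        finally:
            os.close(fd)
    except OSError as exc:
        # Disk full, permissions, and similar: report on stderr, never raise.
        print(f"local analytics: dropped {event}: {exc}", file=sys.stderr)
```

Building the full line in memory and handing it to one `os.write` call is what lets two emitters share the file without interleaving partial records on POSIX systems.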
|
|
96
|
+
|
|
97
|
+
## `/analytics:show` command
|
|
98
|
+
|
|
99
|
+
Local-only query. Renders to ASCII / Markdown table; never POSTs.
|
|
100
|
+
|
|
101
|
+
```
|
|
102
|
+
$ npx @event4u/agent-config analytics:show --window 30d
|
|
103
|
+
|
|
104
|
+
Top prompts (last 30 days)
|
|
105
|
+
galabau · angebot-erstellen 47
|
|
106
|
+
content-creator · video-from-script 31
|
|
107
|
+
consultant · meeting-memo 24
|
|
108
|
+
|
|
109
|
+
Launcher → completion rate per role
|
|
110
|
+
galabau 87% (47 launched · 41 completed)
|
|
111
|
+
content-creator 71% (31 launched · 22 completed)
|
|
112
|
+
consultant 92% (24 launched · 22 completed)
|
|
113
|
+
|
|
114
|
+
Average session length: 4m 12s
|
|
115
|
+
Knowledge sources clicked: 18 (handbuch.pdf · offer-template.md · …)
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
Flags: `--window <30d|7d|24h>` · `--event <name>` · `--role <slug>` ·
|
|
119
|
+
`--format <markdown|csv|json>` · `--health`. No `--upload`, no `--share`; the
|
|
120
|
+
command can only read and render.
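
The per-role completion rate in the sample output can be derived from the event log alone. The sketch below assumes that "launched" counts `launcher.task_launched` records and "completed" counts `session.completed` records, each carrying a `role` in `data`; the contract does not spell out that mapping, and `completion_rates` is a hypothetical helper.

```python
from collections import Counter


def completion_rates(events: list[dict]) -> dict[str, tuple[int, int, int]]:
    """Map role -> (launched, completed, completion percentage)."""
    launched: Counter = Counter()
    completed: Counter = Counter()
    for rec in events:
        role = (rec.get("data") or {}).get("role")
        if role is None:
            continue
        if rec.get("event") == "launcher.task_launched":
            launched[role] += 1
        elif rec.get("event") == "session.completed":
            completed[role] += 1
    # Roles with no launches are omitted, which also avoids dividing by zero.
    return {
        role: (count, completed[role], round(100 * completed[role] / count))
        for role, count in launched.items()
    }
```

With 47 launches and 41 completions for one role, this yields 87, matching the `galabau` row in the sample above.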
|
|
121
|
+
|
|
122
|
+
## Coverage (Phase 7 Step 4)
|
|
123
|
+
|
|
124
|
+
- pytest against fixture JSONL stores (`tests/fixtures/local-analytics/`):
|
|
125
|
+
emitter writes, query filters by window + event + role, prune
|
|
126
|
+
drops correctly at the 90-day boundary.
|
|
127
|
+
- Env-flag short-circuit: emitter is a no-op when
|
|
128
|
+
`AGENT_CONFIG_NO_LOCAL_ANALYTICS=1`; no file is created.
|
|
129
|
+
- Concurrency: two emitters writing the same file produce
|
|
130
|
+
well-formed lines (POSIX `O_APPEND` semantics — test on Linux,
|
|
131
|
+
document Windows caveat).
|
|
132
|
+
|
|
133
|
+
## Failure modes
|
|
134
|
+
|
|
135
|
+
- Disk full → emitter logs warning to stderr, drops the event, never
|
|
136
|
+
raises. UI thread is unaffected.
|
|
137
|
+
- Malformed line in `events.jsonl` → query skips the line, increments
|
|
138
|
+
a `malformed_lines` counter exposed via `/analytics:show --health`.
|
|
139
|
+
- Schema bump (`workspace_event/v0` → `v1`) → emitter writes the new
|
|
140
|
+
schema; query reads both. Migration is forward-compatible.
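
The malformed-line and schema-bump behaviours can be combined in one tolerant reader, sketched here under stated assumptions: `read_events` is a hypothetical helper, `workspace_event/v1` does not exist yet and stands in for a future bump, and counting a record with an unrecognised schema as malformed is an illustrative choice.

```python
import json
from pathlib import Path

# v1 is listed only to show how a reader accepts both versions after a bump.
KNOWN_SCHEMAS = {"install_stage/v1", "workspace_event/v0", "workspace_event/v1"}


def read_events(log: Path) -> tuple[list[dict], int]:
    """Return (events, malformed_lines); bad lines are skipped, never raised."""
    events: list[dict] = []
    malformed = 0
    if not log.exists():
        return events, malformed
    # errors="replace" keeps a stray invalid byte from aborting the whole read.
    with log.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.strip():
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                malformed += 1
                continue
            if not isinstance(rec, dict) or rec.get("schema") not in KNOWN_SCHEMAS:
                malformed += 1
                continue
            events.append(rec)
    return events, malformed
```

The second element of the returned tuple is the value a health view would report as `malformed_lines`.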
|
|
141
|
+
|
|
142
|
+
## Cross-references
|
|
143
|
+
|
|
144
|
+
- Phase 4 shell that produces the events: [`daily-workspace`](daily-workspace.md).
|
|
145
|
+
- Phase 5 document events: [`workspace-documents`](workspace-documents.md).
|
|
146
|
+
- Phase 6 explain events: [`explain-modes`](explain-modes.md).
|
|
147
|
+
- 3.1.0 telemetry inertia: archived `road-to-product-adoption.md` Phase 4.
|
|
148
|
+
- Walkthrough doc (Phase 7 Step 5): `docs/guides/local-analytics.md` (deferred).
|