pluribus-context 0.3.33 → 0.3.35
- package/CHANGELOG.md +19 -0
- package/README.md +7 -6
- package/docs/ai-pr-review-receipts.md +153 -0
- package/docs/canonical-output-receipts.md +107 -0
- package/docs/community-review-packet.md +11 -11
- package/docs/context-budget-receipts.md +22 -0
- package/docs/context-input-evidence.md +15 -0
- package/docs/dynamic-workflow-run-receipts.md +158 -0
- package/docs/install-plan-receipts.md +77 -0
- package/docs/mcp-tool-visibility-receipts.md +67 -0
- package/docs/review-primitive-gate.md +107 -0
- package/docs/skill-policy-receipts.md +87 -0
- package/docs/subagent-role-receipts.md +95 -0
- package/docs/temporal-context-receipts.md +123 -0
- package/examples/agent-skills/context-receipts/SKILL.md +21 -0
- package/examples/agent-skills/skill-policy-receipts/README.md +22 -0
- package/examples/agent-skills/skill-policy-receipts/SKILL.md +77 -0
- package/examples/ai-pr-review-receipts/.github/pull_request_template.md +31 -0
- package/examples/ai-pr-review-receipts/README.md +5 -0
- package/examples/canonical-output-receipts/canonical-output-receipt.json +55 -0
- package/examples/claude-code-review-hook/README.md +74 -0
- package/examples/claude-code-review-hook/check-review-receipt-hook.mjs +80 -0
- package/examples/claude-code-review-hook/sample-task-completed-event.json +6 -0
- package/examples/context-input-evidence/code-search-retrieval-otel-trace.json +879 -0
- package/examples/context-input-evidence/code-search-retrieval-receipt.ndjson +8 -0
- package/examples/context-input-evidence/convert-code-search-retrieval-log.mjs +280 -0
- package/examples/context-input-evidence/sample-code-search-retrieval-log.jsonl +5 -0
- package/examples/dynamic-workflow-run-receipts/README.md +18 -0
- package/examples/dynamic-workflow-run-receipts/workflow-run-receipt.json +112 -0
- package/examples/install-plan-receipts/README.md +34 -0
- package/examples/install-plan-receipts/agent-install-plan-receipt.json +56 -0
- package/examples/review-primitive-gate/README.md +19 -0
- package/examples/review-primitive-gate/check-review-receipt.mjs +100 -0
- package/examples/review-primitive-gate/fail-review-receipt.json +42 -0
- package/examples/review-primitive-gate/pass-review-receipt.json +54 -0
- package/examples/subagent-role-receipts/README.md +15 -0
- package/examples/subagent-role-receipts/agents.toml +36 -0
- package/examples/temporal-context-receipts/CURRENT_STATE.md +13 -0
- package/examples/temporal-context-receipts/specs/2025-checkout-rewrite.md +10 -0
- package/examples/temporal-context-receipts/specs/2026-checkout-risk-notes.md +10 -0
- package/examples/temporal-context-receipts/temporal-authority-receipt.json +27 -0
- package/package.json +1 -1
- package/src/utils/version.js +1 -1
@@ -0,0 +1,77 @@
# Install-plan receipts

Use this when an MCP server, Skill bundle, plugin, starter kit, or setup script says it can configure many AI coding tools for you.

The risk is not only whether a hook later runs safely. The earlier boundary is the installer itself: it may detect agents, write MCP config, add instruction files, install Skills, register hooks, or create backups before the user understands what changed.

The goal is a tiny, privacy-safe pre-mutation receipt that proves what the setup step intends to touch **before the first write starts**. Do not log prompts, source code, secrets, raw environment dumps, transcripts, raw command output, customer data, or private absolute paths.

## Boundary to prove

For every setup/install run, capture a plan like this before applying changes:

```json
{
  "receipt_type": "agent.install.plan.v1",
  "run_id": "local-install-2026-05-29T16:00Z",
  "installer": "code-memory-mcp",
  "mode_requested": "plan",
  "mode_effective": "plan",
  "agents_detected": ["claude-code", "cursor", "codex", "openclaw"],
  "agents_selected": ["claude-code", "openclaw"],
  "planned_writes": [
    {
      "kind": "mcp_config",
      "target": "claude-code project config",
      "operation": "add_server",
      "backup_planned": true
    },
    {
      "kind": "instruction_file",
      "target": "AGENTS.md",
      "operation": "append_usage_notes",
      "backup_planned": true
    },
    {
      "kind": "hook",
      "target": "pre-tool hook config",
      "operation": "register_command",
      "backup_planned": true
    }
  ],
  "external_commands_planned": [
    { "phase": "apply", "command_class": "package_manager_install" }
  ],
  "network_after_install": "mcp_server_localhost_only",
  "writes_started": false,
  "next_safe_command": "installer apply --from-plan install-plan.json"
}
```

Keep `target` values coarse enough for review. Prefer `claude-code project config` over a full local path, and `package_manager_install` over raw shell output.

## Acceptance checks

A safe installer should make these claims inspectable:

1. **Plan mode exists**: `install --plan`, `install --dry-run`, or equivalent emits the receipt without writing files.
2. **Effective mode is explicit**: if the user requested `apply` but policy downgraded to `plan`, the receipt says so.
3. **Agent detection is separated from selection**: finding Cursor/Codex/Claude/OpenClaw does not imply every detected tool will be changed.
4. **Every planned write has a kind and backup decision**: config, instruction file, Skill, hook, shell profile, lockfile, cache, or generated artifact.
5. **Writes are still false at receipt time**: `writes_started=false` is the key trust boundary.
6. **Apply can be repeated from the plan**: the user can review one artifact, then run a concrete next command.
7. **No private payloads leak**: no raw source, prompts, env dumps, secrets, token values, transcripts, stack traces, or raw tool output.
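
Most of the checks above can be approximated mechanically once the receipt is parsed. The following is a minimal sketch, assuming the field names from the sample receipt (`mode_requested`, `mode_effective`, `agents_detected`, `agents_selected`, `planned_writes`, `writes_started`, `next_safe_command`). The function name `checkInstallPlanReceipt` and the list of forbidden keys are hypothetical and are not part of the package.

```javascript
// Hypothetical helper: not shipped with pluribus-context.
// Returns a list of human-readable problems; an empty list means the
// receipt satisfies the structural checks sketched here.
function checkInstallPlanReceipt(receipt) {
  const problems = [];

  if (receipt.receipt_type !== "agent.install.plan.v1") {
    problems.push("unexpected receipt_type");
  }
  // Check 2: both the requested and the effective mode must be stated.
  if (!receipt.mode_requested || !receipt.mode_effective) {
    problems.push("mode_requested and mode_effective must both be present");
  }
  // Check 3: selection must be an explicit subset of detection.
  const detected = new Set(receipt.agents_detected || []);
  for (const agent of receipt.agents_selected || []) {
    if (!detected.has(agent)) {
      problems.push(`selected agent was never detected: ${agent}`);
    }
  }
  // Check 4: every planned write names a kind and a backup decision.
  (receipt.planned_writes || []).forEach((write, i) => {
    if (!write.kind) problems.push(`planned_writes[${i}] has no kind`);
    if (typeof write.backup_planned !== "boolean") {
      problems.push(`planned_writes[${i}] has no backup decision`);
    }
  });
  // Check 5: the receipt must be emitted before the first write.
  if (receipt.writes_started !== false) {
    problems.push("writes_started must be false at receipt time");
  }
  // Check 6: the user needs one concrete command to run after review.
  if (!receipt.next_safe_command) {
    problems.push("next_safe_command is missing");
  }
  // Check 7 (partial): reject obviously private top-level keys.
  // This key list is an assumption; extend it for your own installer.
  const forbidden = ["env", "prompt", "transcript", "raw_output", "token"];
  for (const key of Object.keys(receipt)) {
    if (forbidden.includes(key)) problems.push(`private key present: ${key}`);
  }
  return problems;
}
```

Check 1 cannot be verified from the receipt alone: whether plan mode really writes nothing has to be confirmed by running the installer, for example against a scratch directory.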

## Why this matters for hooks and MCP

Hooks, Skills, and MCP configs are often discussed as runtime supply-chain surfaces. That is true, but it is downstream. A one-command installer can create the hook or MCP entry first.

A hook receipt answers: “what executed?”

An install-plan receipt answers the earlier question: **“what is about to be installed, written, and trusted?”**

If an installer cannot answer that before mutation, treat it like running CI from an untrusted fork: useful, but not automatically safe.

## Try the copyable example

See [`examples/install-plan-receipts/`](../examples/install-plan-receipts/) for a small review checklist and sample receipt you can copy into setup scripts, README install sections, or agent-managed onboarding workflows.

@@ -0,0 +1,67 @@
# MCP tool visibility receipts

MCP memory, Git, GitLab, code-search, and knowledge-graph servers can be healthy while the agent still cannot see their tools.

A useful debug artifact should prove each boundary separately:

1. **Server launched**: the configured command starts without leaking env/secrets.
2. **Handshake completed**: client and server agreed on a protocol version and capabilities.
3. **Proxy catalog returned**: a direct `tools/list` call returns the expected tool count and names.
4. **Client catalog visible**: the actual agent UI/runtime exposes the same tools under the expected names.
5. **Invocation allowed or refused**: the first tool call either runs, or returns an explicit permission/config/schema reason.

`server healthy` is not enough. `tools/list` is not enough. The receipt needs to say where the chain stopped.
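
Saying where the chain stopped amounts to reporting the first boundary that was not proved. The following is a minimal sketch under that reading. The boundary identifiers are snake_case renderings of the five steps above, and the function name and input shape are assumptions made for illustration.

```javascript
// Hypothetical helper: boundary names mirror the five steps above,
// but the function and its input shape are assumptions for this sketch.
const BOUNDARIES = [
  "server_launched",
  "handshake_completed",
  "proxy_catalog_returned",
  "client_catalog_visible",
  "invocation_allowed_or_refused",
];

// `results` maps a boundary name to true (proved), false (failed),
// or leaves it undefined (never checked).
function stoppedAt(results) {
  for (const boundary of BOUNDARIES) {
    if (results[boundary] !== true) {
      return boundary; // first boundary that was not proved
    }
  }
  return null; // every boundary was proved
}
```

Treating an unchecked boundary the same as a failed one is deliberate here: a receipt that never tested the client catalog should not read as if the tools were visible.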

## 60-second probe for any stdio MCP server

Replace the command after the pipe with the server command you already configured in Claude Code, Cursor, Codex, OpenClaw, or another MCP client.

```bash
(
  printf '%s\n' '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"receipt-probe","version":"0.1.0"}},"id":1}'
  # The MCP lifecycle expects this notification after initialize and before other requests.
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  printf '%s\n' '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":2}'
) | your-mcp-server-command
```

Record only metadata, not raw prompt/source/tool output:

```json
{
  "kind": "mcp.tool_visibility.receipt",
  "server": "gitlab",
  "server_command_hash": "sha256:...",
  "protocol_version_requested": "2024-11-05",
  "handshake": "ok",
  "proxy_tools_count": 172,
  "proxy_tool_names_sample": ["glab_issue_list", "glab_mr_view"],
  "client": "Claude Code",
  "client_catalog_visible": false,
  "client_tools_count": 0,
  "stopped_at": "client_catalog_visible",
  "privacy": "names/counts only; no args, outputs, tokens, paths, or source snippets"
}
```

## Acceptance check

For a release or bug report, ask for one small matrix:

| Boundary | Evidence | Pass condition |
| --- | --- | --- |
| Launch | server command hash + exit/live status | command starts and stays alive long enough for handshake |
| Handshake | protocol version + capabilities summary | initialized without version/schema mismatch |
| Proxy catalog | `tools/list` count + stable tool-name sample | expected tools returned directly |
| Client catalog | client-visible count + naming prefix | same class of tools visible to the agent |
| First invocation | allowed/refused reason | failure explains permission/config/schema, not silent absence |

This shape is intentionally compatible with GitHub/GitLab issue reports and OpenTelemetry-style events. It helps maintainers separate server bugs from client catalog, protocol-version, schema, timeout, and permission bugs without asking users to paste private output.

## Why this belongs near Pluribus

Pluribus should not become an MCP gateway or memory database. The narrow value is evidence for context boundaries:

- generated instruction files prove what static rules were written;
- memory/search receipts prove what retrieved context was delivered;
- tool visibility receipts prove whether a configured MCP capability actually crossed into the agent's usable catalog.

If a tool is not visible to the agent, the project has no reliable context handoff no matter how healthy the server looks.

@@ -0,0 +1,107 @@
# Review primitive gate for agent handoffs

Use this when a parallel-agent run, Claude Code hook/workflow, Codex/OpenClaw handoff, or local control-plane wrapper needs to prove more than "the agent said it was done".

The market question is not just what to log after a run. It is whether a reviewer or CI job can make a decision:

- **continue** because the assignment stayed inside approved scope and required checks passed;
- **review first** because the run is partial or has explicit unverified assumptions;
- **reject / stop** because scope changed without approval, required checks were skipped or failed, or the run is unsafe to resume.

Pluribus should not be the execution control plane. Worktrees, VMs, hooks, masks, and vendor guardrails can enforce parts of the run. The useful Pluribus layer is a small, privacy-safe receipt that turns those controls into reviewable evidence across tools.

## Receipt shape

Attach this receipt to a PR body, CI artifact, run summary, or handoff packet.

```json
{
  "type": "agent.review_primitive_receipt.v1",
  "assignment_id": "agent-auth-audit-42",
  "run_id": "run-2026-05-31T17-00Z",
  "agent": {
    "tool": "claude-code",
    "role": "auth-reviewer"
  },
  "approved_boundaries": {
    "read": ["src/auth/**", "tests/auth/**"],
    "write": ["tests/auth/**"],
    "network": false
  },
  "scope_access_changes": [
    {
      "change": "read docs/security/**",
      "reason": "needed policy wording for test fixture",
      "approved": true,
      "approved_by": "human-reviewer"
    }
  ],
  "commands_and_checks": [
    {
      "name": "npm test -- tests/auth",
      "kind": "required_test",
      "status": "passed",
      "evidence": "ci://job/123#auth-tests"
    },
    {
      "name": "npm run lint",
      "kind": "required_check",
      "status": "passed",
      "evidence": "ci://job/123#lint"
    }
  ],
  "refused_operations": [
    {
      "operation": "write src/auth/session.ts",
      "reason": "outside approved write boundary"
    }
  ],
  "handoff": {
    "changed_files_bucket": "under_5",
    "evidence_path": "artifacts/agent-auth-audit-42.json",
    "next_safe_action": "review tests/auth/session.test.ts before merge"
  },
  "resume_state": "complete",
  "privacy": {
    "raw_prompts_logged": false,
    "raw_tool_output_logged": false,
    "source_code_logged": false,
    "secrets_logged": false
  }
}
```

## Minimal gate

The copyable demo in [`examples/review-primitive-gate/`](../examples/review-primitive-gate/) turns the receipt into a CI/reviewer decision.

If you use Claude Code hooks, the [`examples/claude-code-review-hook/`](../examples/claude-code-review-hook/) bridge shows how to run the same gate from `TaskCompleted`, `PostCompact`, or `SessionEnd` without logging raw prompts, transcripts, tool output, source code, or secrets.

```bash
node examples/review-primitive-gate/check-review-receipt.mjs \
  examples/review-primitive-gate/pass-review-receipt.json

node examples/review-primitive-gate/check-review-receipt.mjs \
  examples/review-primitive-gate/fail-review-receipt.json
```

The gate passes only when:

- `type` is `agent.review_primitive_receipt.v1`;
- `assignment_id` and `run_id` exist;
- approved read/write boundaries are present;
- every scope/access change is explicitly approved;
- every required check/test passed;
- `resume_state` is `complete`.

The gate fails when a run is `partial` or `unsafe-to-resume`, when a required check is skipped/failed, or when scope changed without approval. That is intentional: partial work can be valuable, but it should not silently pass a merge gate.
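
The pass conditions above translate fairly directly into code. The following is a minimal sketch of that logic, not the shipped `check-review-receipt.mjs`, whose contents are not shown in this document. The function name `reviewGate` and the rule that a `kind` beginning with `required_` marks a mandatory check are assumptions drawn from the sample receipt.

```javascript
// Illustrative only: this is not the shipped check-review-receipt.mjs.
// It applies the pass conditions listed above to a parsed receipt object.
function reviewGate(receipt) {
  const reasons = [];

  if (receipt.type !== "agent.review_primitive_receipt.v1") {
    reasons.push("unexpected receipt type");
  }
  if (!receipt.assignment_id || !receipt.run_id) {
    reasons.push("assignment_id and run_id are required");
  }
  const bounds = receipt.approved_boundaries || {};
  if (!Array.isArray(bounds.read) || !Array.isArray(bounds.write)) {
    reasons.push("approved read/write boundaries are missing");
  }
  for (const change of receipt.scope_access_changes || []) {
    if (change.approved !== true) {
      reasons.push(`unapproved scope change: ${change.change}`);
    }
  }
  for (const check of receipt.commands_and_checks || []) {
    // Assumption: kinds starting with "required_" are mandatory,
    // matching "required_test" and "required_check" in the sample.
    const required = String(check.kind || "").startsWith("required_");
    if (required && check.status !== "passed") {
      reasons.push(`required check not passed: ${check.name}`);
    }
  }
  if (receipt.resume_state !== "complete") {
    reasons.push(`resume_state is ${receipt.resume_state}`);
  }
  return { pass: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the boolean keeps the decision reviewable: a CI job can fail on `pass === false` while a human reads why.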

## What to keep out

Do not put raw prompts, full transcripts, source code, exact proprietary paths, secrets, customer data, or raw tool output in the receipt. Use coarse globs, hashes, CI URLs, artifact IDs, pass/fail states, and human-readable next safe actions.

## Why this is different from a receipt field list

A field list says what happened. A review primitive says what the next system is allowed to do with that evidence.

If the artifact cannot reject a PR, pause a handoff, or force review when the run became partial/unsafe, it is probably just a nicer `plan.md`.

@@ -0,0 +1,87 @@
# Skill policy receipts

Use this when an Agent Skill, `CLAUDE.md`, hook, or project rule says "do not touch X" but the agent can still drift into the forbidden path.

The goal is not to log prompts or source code. The goal is a tiny, privacy-safe receipt that proves the run checked the policy boundary before writing code and again after writing code.

This was prompted by a live `r/ClaudeCode` thread where a Skill told Claude Code not to create unit tests for internal services, but the run still generated one. Natural-language policy alone was too soft; the missing piece was an inspectable guard.

## Boundary to prove

For every requested change, capture:

```json
{
  "receipt_type": "skill.policy.v1",
  "skill": "unit-test-boundary",
  "request_id": "local-run-2026-05-28T12:00Z",
  "policy_scope": "unit-test targets",
  "targets": [
    {
      "target": "src/public-api/client.test.ts",
      "decision": "allowed",
      "reason": "public API surface"
    },
    {
      "target": "src/internal/billing/reconciler.test.ts",
      "decision": "refused",
      "reason": "internal service tests are out of scope for this Skill"
    }
  ],
  "write_started": false,
  "post_write_guard": "not_run",
  "stopped_at": "policy_refused"
}
```

Keep values coarse. Do not include code, secrets, customer names, stack traces, raw tool output, or full transcripts.

## Minimal Skill guard

Add a short preflight before the Skill writes files:

```markdown
## Policy preflight

Before writing tests:

1. List the intended test targets.
2. Mark each target as `allowed` or `refused`.
3. Refuse before writing if any target imports or exercises internal services.
4. Emit a `skill.policy.v1` receipt with target names or coarse globs, decision, reason, and `write_started=false` when refused.
5. Only after every target is allowed, write files.
6. After writing, run the post-write guard and emit whether it passed.
```

Then add a post-write check that is simple enough for an agent to run reliably:

```bash
# Example: fail if generated unit tests import internal services.
# grep exits 0 on a match, so `!` turns a match into a non-zero (failing) status.
! grep -RnE --include='*test.*' --include='*spec.*' \
  "from ['\"]\.\./\.\./internal|from ['\"]@/internal|require\(['\"]@/internal" .
```

Adjust the grep for your repo. The important part is the receipt shape:

- `policy_target_listed`
- `policy_decision_allowed` / `policy_decision_refused`
- `refusal_reason`
- `write_started`
- `post_write_guard_passed` / `post_write_guard_failed`
- `stopped_at`

## Why this belongs next to context receipts

A Skill can be loaded and still fail to obey the boundary. That is the same class of problem as a healthy MCP server with tools invisible in the client, or a context file generated but not actually selected by the agent.

The useful question is: **where did the boundary proof stop?**

- Skill loaded, but no target list: policy was never made operational.
- Target list exists, but no decisions: policy was considered but not enforced.
- Refused target exists, but `write_started=true`: refusal came too late.
- Post-write guard failed: generated code crossed the forbidden boundary.
- Guard passed: the run has a small, reviewable receipt instead of only a confident claim.
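
The cases above can be read off a `skill.policy.v1` receipt in order. The following is a minimal sketch of such a classifier. The returned labels paraphrase the list rather than reproduce names defined by the package, and the `post_write_guard` values `"passed"` and `"failed"` are assumptions (the sample receipt only shows `"not_run"`).

```javascript
// Hypothetical classifier for a skill.policy.v1 receipt. The returned
// labels paraphrase the list above and are not defined by the package.
function policyProofStoppedAt(receipt) {
  const targets = receipt.targets || [];
  if (targets.length === 0) {
    return "no_target_list";
  }
  const decided = targets.every(
    (t) => t.decision === "allowed" || t.decision === "refused"
  );
  if (!decided) {
    return "targets_without_decisions";
  }
  const anyRefused = targets.some((t) => t.decision === "refused");
  if (anyRefused && receipt.write_started === true) {
    return "refusal_came_too_late";
  }
  if (anyRefused) {
    return "policy_refused"; // refused before any write
  }
  // Assumed guard values: "passed" and "failed".
  if (receipt.post_write_guard === "failed") {
    return "post_write_guard_failed";
  }
  if (receipt.post_write_guard === "passed") {
    return "guard_passed";
  }
  return "post_write_guard_not_run";
}
```

The order of the checks matters: a missing target list is reported before anything else, because later fields mean little if the policy was never made operational.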

## Try the copyable Skill recipe

See [`examples/agent-skills/skill-policy-receipts/`](../examples/agent-skills/skill-policy-receipts/) for a small `SKILL.md` recipe you can copy into Claude Code/OpenClaw-style Skill workflows.

@@ -0,0 +1,95 @@
# Subagent role receipts

Custom subagents are useful only if the caller can tell which role instructions actually governed the delegated work.

Use this recipe when a project defines Codex/Claude Code/Cursor/OpenClaw-style subagents and wants a privacy-safe receipt for the role boundary: which role was requested, which instruction source was loaded, which tools/capabilities were allowed or deferred, and where the subagent stopped before crossing an unsafe boundary.

This is not a claim that every agent runner uses the same file format. Treat `agents.toml` as a portable **example** for role definitions, and treat the receipt as the stable artifact: evidence about the role boundary without logging raw prompts, source code, transcripts, tool output, secrets, or customer data.

## When this helps

Use a subagent role receipt when:

- a manager agent delegates work to a specialist reviewer, tester, security checker, migration planner, or docs writer;
- the role has a narrower policy than the main agent;
- the subagent has restricted tools, MCP servers, or write permissions;
- the role should refuse mutation and only report findings;
- a human reviewer needs to know which role instructions were loaded before trusting the result.

## Example role definition

The example in [`examples/subagent-role-receipts/agents.toml`](../examples/subagent-role-receipts/agents.toml) defines two project-local roles:

- `blast-radius-reviewer`: reviews AI-generated PRs by operational boundaries before merge;
- `temporal-authority-checker`: checks whether docs/specs are current or superseded before an agent writes code.

The file is intentionally small so it can be adapted to the runner you use.

## Receipt shape

Attach this to a PR body, task handoff, review-bot comment, or run summary.

```json
{
  "type": "subagent.role_boundary.v1",
  "delegation": {
    "requested_role": "blast-radius-reviewer",
    "effective_role": "blast-radius-reviewer",
    "role_source": "agents.toml",
    "role_source_hash": "sha256:example-only",
    "caller": "manager-agent"
  },
  "instructions": {
    "loaded": true,
    "source_kind": "project-local-role-definition",
    "raw_instruction_logged": false,
    "policy_summary": [
      "review by blast radius, not diff size",
      "do not approve merge when boundary evidence is ambiguous"
    ]
  },
  "capabilities": {
    "writes_allowed": false,
    "tools_allowed": ["read", "grep", "test-summary"],
    "tools_deferred_or_unavailable": ["shell-write", "deploy", "migration-apply"],
    "mcp_servers_allowed": []
  },
  "boundary_decisions": [
    {
      "boundary": "schema_or_data_contract",
      "status": "ambiguous",
      "decision": "blocks_merge",
      "reason": "migration rollback evidence missing"
    }
  ],
  "handoff": {
    "result_kind": "review_receipt",
    "stopped_at": "ambiguous boundary before merge approval",
    "next_safe_action": "ask backend owner to confirm rollback and reader compatibility"
  },
  "privacy": {
    "raw_prompt_logged": false,
    "raw_source_logged": false,
    "raw_tool_output_logged": false,
    "transcript_logged": false,
    "secrets_logged": false,
    "customer_data_logged": false
  }
}
```

## Minimal checklist

Before trusting a delegated subagent result, ask for:

- requested role and effective role match;
- role definition source and coarse hash/version;
- whether role instructions loaded through the intended path;
- allowed/refused tool and write capabilities;
- boundary decisions made by the role;
- where the role stopped and the next safe action;
- explicit privacy flags showing raw prompts/source/tool output were not logged.
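
The checklist can be applied to a parsed receipt before a human reads the result. The following is a minimal sketch, assuming the field names of the sample `subagent.role_boundary.v1` receipt. The function name `checkRoleReceipt` is hypothetical, and requiring every privacy flag to be `false` is an interpretation of the last checklist item.

```javascript
// Hypothetical checker for a subagent.role_boundary.v1 receipt.
// Field names follow the sample receipt; the function is an assumption.
function checkRoleReceipt(receipt) {
  const missing = [];
  const d = receipt.delegation || {};
  const caps = receipt.capabilities || {};
  const handoff = receipt.handoff || {};
  const privacy = receipt.privacy || {};

  if (!d.requested_role || d.requested_role !== d.effective_role) {
    missing.push("requested and effective role do not match");
  }
  if (!d.role_source || !d.role_source_hash) {
    missing.push("role source or hash missing");
  }
  if (!receipt.instructions || receipt.instructions.loaded !== true) {
    missing.push("role instructions were not loaded");
  }
  if (
    typeof caps.writes_allowed !== "boolean" ||
    !Array.isArray(caps.tools_allowed)
  ) {
    missing.push("tool or write capabilities not stated");
  }
  if (!Array.isArray(receipt.boundary_decisions)) {
    missing.push("boundary decisions missing");
  }
  if (!handoff.stopped_at || !handoff.next_safe_action) {
    missing.push("stopped_at or next_safe_action missing");
  }
  // Every privacy flag must be present and explicitly false.
  const flags = Object.values(privacy);
  if (flags.length === 0 || flags.some((v) => v !== false)) {
    missing.push("privacy flags missing or not all false");
  }
  return missing;
}
```

An empty result only means the receipt is structurally complete. It does not show that the runner actually enforced the stated capabilities.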

## What not to log

Do not include raw prompts, full instructions, transcripts, source code, file paths that expose private structure, tool output, secrets, credentials, customer data, stack traces, or proprietary diffs. Prefer coarse names, hashes, counts, decision states, and review-owner labels.

@@ -0,0 +1,123 @@
# Temporal context receipts

Use this when a long-lived AI coding project has old specs, ADRs, plans, or TODOs that still match grep but are no longer the current authority.

The goal is not to delete history or log raw project content. The goal is a tiny, privacy-safe receipt that proves the agent separated **current authority** from **historical citation** before it edits code.

This was prompted by a live `r/ClaudeCode` thread about the temporal problem in long-running projects: Claude Code can find every old plan, but grep is blind to time. If old docs do not carry status, date, and supersession metadata, the agent can treat a stale architecture note as current truth.

## Boundary to prove

For every coding run that reads design/context docs, capture a coarse receipt like this:

```json
{
  "receipt_type": "context.temporal_authority.v1",
  "request_id": "local-run-2026-05-28T16:00Z",
  "current_authority": {
    "file": "CURRENT_STATE.md",
    "status": "current",
    "as_of": "2026-05-28",
    "scope": "checkout-flow"
  },
  "sources_considered": [
    {
      "file": "specs/2025-checkout-rewrite.md",
      "status": "superseded",
      "superseded_by": "CURRENT_STATE.md#checkout-flow",
      "decision": "historical_citation_only"
    },
    {
      "file": "specs/2026-checkout-risk-notes.md",
      "status": "current",
      "scope": "checkout-flow",
      "decision": "allowed_as_supporting_context"
    }
  ],
  "ambiguous_sources": [],
  "write_started": true,
  "stopped_at": "temporal_authority_resolved"
}
```

Keep values coarse. Do not include source code, raw plans, prompts, transcripts, secrets, customer names, stack traces, private paths, or raw tool output.

## Minimal doc convention

Give every long-lived context file a small frontmatter header:

```markdown
---
status: current # current | superseded | archived
scope: checkout-flow
date: 2026-05-28
superseded_by: null
---
```

For old specs:

```markdown
---
status: superseded
scope: checkout-flow
date: 2025-11-10
superseded_by: ../CURRENT_STATE.md#checkout-flow
---
```

Then make `CURRENT_STATE.md` the short authority file an agent must read first:

```markdown
# Current state

## checkout-flow

- status: current
- as_of: 2026-05-28
- current authority: this section
- related historical specs:
  - specs/2025-checkout-rewrite.md (superseded)
  - specs/2026-checkout-risk-notes.md (current supporting context)

Agents may cite superseded specs for rationale, but must not implement from them unless the current authority explicitly reactivates that behavior.
```

## Agent preflight

Before editing code in a long-lived project, ask the agent to do this:

```markdown
## Temporal authority preflight

Before writing code:

1. Read `CURRENT_STATE.md` or the repo's current-state equivalent.
2. List design/spec/TODO/context files found for the requested scope.
3. Mark each source as `current`, `superseded`, `archived`, or `ambiguous`.
4. If any relevant source is `ambiguous` or lacks `superseded_by` while contradicting current authority, stop before writing.
5. Emit a `context.temporal_authority.v1` receipt with coarse file names/globs, status, decision, `write_started`, and `stopped_at`.
6. Only use superseded docs as historical citations, not as implementation authority.
```

Useful receipt markers:

- `context_current_authority`
- `historical_spec_citation`
- `status_superseded`
- `superseded_by_resolved`
- `ambiguous_temporal_source`
- `stale_source_ignored`
- `write_refused_until_authority_resolved`
- `preflight_temporal_decision`

## Where this catches failures

- Old spec matches grep, but has `status: superseded`: agent can cite it but should not implement from it.
- Old spec conflicts with `CURRENT_STATE.md` and has no `superseded_by`: agent should stop and ask for authority resolution.
- Multiple current files claim the same scope: agent should stop before writing.
- Current authority exists, but the run never read it: the receipt should show `stopped_at=current_authority_missing` or `write_started=false`.
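
Several of these failure cases reduce to one decision made before the first write. The following is a minimal sketch of that decision. The function name `temporalPreflight` and the `contradicts_current` flag are hypothetical: deciding whether a document contradicts the current authority is left to the caller. The case of multiple current files claiming the same scope is not covered here.

```javascript
// Hypothetical preflight decision. `sources` entries carry the status
// values from the frontmatter convention plus "ambiguous"; the
// `contradicts_current` flag is an assumption the caller must supply.
function temporalPreflight(currentAuthorityRead, sources) {
  if (!currentAuthorityRead) {
    return { write_allowed: false, stopped_at: "current_authority_missing" };
  }
  // Step 4 of the preflight: ambiguous sources, or contradicting sources
  // with no superseded_by pointer, block the write.
  const blocking = sources.filter(
    (s) =>
      s.status === "ambiguous" ||
      (s.contradicts_current === true && !s.superseded_by)
  );
  if (blocking.length > 0) {
    return {
      write_allowed: false,
      stopped_at: "ambiguous_temporal_source",
      blocking_files: blocking.map((s) => s.file),
    };
  }
  return {
    write_allowed: true,
    stopped_at: "temporal_authority_resolved",
    // Superseded docs stay available for rationale only.
    citation_only: sources
      .filter((s) => s.status === "superseded")
      .map((s) => s.file),
  };
}
```

The result maps onto the receipt fields: `stopped_at` is copied as is, and `write_started` should remain `false` whenever `write_allowed` is `false`.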

## Try the copyable example

See [`examples/temporal-context-receipts/`](../examples/temporal-context-receipts/) for a minimal `CURRENT_STATE.md`, superseded spec, current supporting note, and receipt example.

@@ -100,6 +100,27 @@ Minimal JSONL event names:
```jsonl
{"event":"subagent.toolsearch.matrix.completed","tested_axis":"tools_frontmatter_shape","audit_gap":"proves ToolSearch exposure, not semantic tool relevance or runtime call success"}
```

## Retrieval / code-search smoke

For semantic code search, repo RAG, or MCP tools such as Claude Context, separate "search returned" from "agent context loaded":

- which index snapshot/version was used, without raw local codebase paths;
- what query/category/filter identity selected the candidates, without raw query text;
- which result ids/chunk hashes were returned, with rank, score bucket, stale flag, duplicate marker, path hash/extension, and range bucket;
- which returned chunks were actually loaded into the agent context;
- which chunks were suppressed as duplicate, stale, clipped, policy-blocked, or over budget;
- whether raw code, raw prompts, raw paths, customer names, URLs, secrets, and ticket text stayed out of the receipt;
- the audit gap: this proves retrieval/loading boundaries, not semantic answer quality.

Minimal JSONL event names:

```jsonl
{"event":"code.index.snapshot.used","snapshot_id_hash":"sha256:...","codebase_path_hash":"sha256:...","indexed_chunk_count_bucket":"over_1k","raw_codebase_path_copied":false}
{"event":"code.search.performed","query_hash":"sha256:...","query_category":"auth_debug","candidate_count_bucket":"over_1k","raw_query_copied":false}
{"event":"code.search.result.returned","rank":1,"chunk_id_hash":"sha256:...","chunk_text_hash":"sha256:...","path_hash":"sha256:...","score_bucket":"high","stale":false,"raw_code_copied":false}
{"event":"context.input.loaded","kind":"retrieved_code_chunks","loaded_chunk_count":3,"suppressed_chunk_count":2,"suppression_reasons":["duplicate","stale_snapshot_chunk"],"raw_code_copied":false}
```
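
The gap between "search returned" and "agent context loaded" can be summarized without touching raw code. The following is a minimal sketch that turns already-hashed search results into a `context.input.loaded` event. The function name, the input field names (`chunk_text_hash`, `stale`), the `maxChunks` budget, and the `over_budget` reason label are assumptions; only the output event shape follows the JSONL sample.

```javascript
// Hypothetical summarizer: consumes results that carry hashes and flags
// only, so no raw code or query text ever reaches the event.
function summarizeLoadedChunks(results, maxChunks) {
  const seenHashes = new Set();
  const reasons = new Set();
  let loaded = 0;
  let suppressed = 0;

  for (const r of results) {
    let reason = null;
    if (seenHashes.has(r.chunk_text_hash)) {
      reason = "duplicate";
    } else if (r.stale) {
      reason = "stale_snapshot_chunk";
    } else if (loaded >= maxChunks) {
      reason = "over_budget"; // assumed label for the "over budget" case
    }
    seenHashes.add(r.chunk_text_hash);

    if (reason) {
      suppressed += 1;
      reasons.add(reason);
    } else {
      loaded += 1;
    }
  }
  return {
    event: "context.input.loaded",
    kind: "retrieved_code_chunks",
    loaded_chunk_count: loaded,
    suppressed_chunk_count: suppressed,
    suppression_reasons: [...reasons],
    raw_code_copied: false,
  };
}
```

Results are processed in rank order, so when the budget is exhausted the lower-ranked chunks are the ones recorded as suppressed.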

## Usage attribution smoke

For `/usage`, `/context`, `/doctor`, or other context-budget breakdowns, map each displayed category to evidence that can be reviewed without exposing private content:

@@ -0,0 +1,22 @@
# Skill policy receipts recipe

This is a copyable Agent Skill recipe for cases where a natural-language rule needs an inspectable guard.

Example use cases:

- a Skill must not generate tests for internal services;
- an agent must not edit generated files;
- a hook must not call production APIs;
- a migration helper must default to preview/dry-run unless `--apply` is explicit.

Copy `SKILL.md` into your Skill registry, adjust the policy and post-write guard, then ask the agent to emit `skill.policy.v1` receipts before writes and after guard checks.

The receipt should prove:

- intended targets were listed;
- each target was allowed or refused;
- refusal happened before writes;
- post-write guard passed or failed;
- no raw prompt, code, secret, customer data, stack trace, or full transcript was logged.

Related guide: [`docs/skill-policy-receipts.md`](../../../docs/skill-policy-receipts.md).