pluribus-context 0.3.41 → 0.3.42
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +4 -0
- package/README.md +3 -2
- package/bin/pluribus.js +3 -1
- package/docs/agent-surface-proof-chain.md +176 -0
- package/docs/cursor-claude-context-handoff.md +68 -0
- package/docs/index.html +1 -1
- package/docs/receipt-playground.html +54 -0
- package/docs/session-preflight-receipts.md +77 -0
- package/examples/claude-md-read-receipts/README.md +70 -0
- package/examples/claude-md-read-receipts/check-read-receipt.mjs +119 -0
- package/examples/claude-md-read-receipts/sample-read-receipt.json +45 -0
- package/examples/claude-md-read-receipts/stale-read-receipt.json +18 -0
- package/examples/context-sufficiency-trace/README.md +22 -0
- package/examples/context-sufficiency-trace/check-context-sufficiency.mjs +40 -0
- package/examples/context-sufficiency-trace/context-trace-pass.json +28 -0
- package/examples/context-sufficiency-trace/context-trace.json +47 -0
- package/examples/context-sufficiency-trace/ground-truth.json +13 -0
- package/examples/provider-degradation-canaries/README.md +79 -0
- package/examples/provider-degradation-canaries/check-degradation-receipt.mjs +64 -0
- package/examples/provider-degradation-canaries/healthy-decision.json +27 -0
- package/examples/provider-degradation-canaries/unsafe-write-decision.json +26 -0
- package/examples/semantic-anchor-receipts/README.md +49 -0
- package/examples/semantic-anchor-receipts/check-semantic-anchors.mjs +153 -0
- package/examples/semantic-anchor-receipts/cleaned-paste.md +17 -0
- package/examples/semantic-anchor-receipts/original-paste.md +19 -0
- package/examples/semantic-anchor-receipts/sample-receipt.json +62 -0
- package/examples/session-preflight-receipts/README.md +25 -0
- package/examples/session-preflight-receipts/session-preflight-receipt.json +39 -0
- package/examples/session-preflight-receipts/session-preflight.mdc +18 -0
- package/examples/task-scoped-mcp-config/README.md +60 -0
- package/examples/task-scoped-mcp-config/mcp-catalog.json +46 -0
- package/examples/task-scoped-mcp-config/select-mcp-config.mjs +64 -0
- package/examples/task-scoped-mcp-config/tasks/browser-debug.json +7 -0
- package/package.json +1 -1
- package/src/commands/demo.js +81 -1
- package/src/utils/version.js +1 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,10 @@
 
 All notable changes to Pluribus are documented here.
 
+## 0.3.42 - 2026-06-17
+
+- Added `pluribus demo context-sufficiency-trace`, a tiny npm-runnable pass/fail demo for checking whether a compressed context bundle preserved the task's required files before editing began.
+
 ## 0.3.41 - 2026-06-10
 
 - Added npm Agent Skills discovery keywords (`agent-skill`, `skillpm`, and `agent-skills-registry`) so `pluribus-context` can be indexed by npm-backed skill package managers while continuing to ship the existing low-authority `skills/*/SKILL.md` recipes.
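The 0.3.42 entry describes a pass/fail check on whether a compressed context bundle still contains the files a task needs. The JavaScript below is a minimal sketch of that comparison under stated assumptions. The `requiredFiles` and `includedFiles` fields and the `checkContextSufficiency` helper are hypothetical. They do not reflect the schema of the bundled `ground-truth.json` or `context-trace.json` files, or how `pluribus demo context-sufficiency-trace` is implemented.

```javascript
// Hypothetical input shapes (not the bundled example schema):
//   groundTruth: { requiredFiles: string[] }  files the task must see
//   trace:       { includedFiles: string[] }  files kept after compression
function checkContextSufficiency(groundTruth, trace) {
  const included = new Set(trace.includedFiles);
  const missing = groundTruth.requiredFiles.filter((file) => !included.has(file));
  const total = groundTruth.requiredFiles.length;
  return {
    pass: missing.length === 0,
    missing,
    // Fraction of required files that survived compression.
    coverage: total === 0 ? 1 : (total - missing.length) / total,
  };
}

const groundTruth = { requiredFiles: ["src/auth.js", "src/session.js", "test/auth.test.js"] };
const compressedTrace = { includedFiles: ["src/auth.js", "README.md"] };

const result = checkContextSufficiency(groundTruth, compressedTrace);
console.log(result.pass ? "PASS" : `FAIL: missing ${result.missing.join(", ")}`);
```

Returning the missing paths, not just a boolean, is what makes a failing trace actionable before editing begins.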
package/README.md
CHANGED
@@ -14,7 +14,7 @@ The original sync workflow is still useful: Pluribus can keep project instructio
 
 It is **not** a persistent memory layer, retrieval system, agent orchestrator, enterprise ContextOps platform, or agent-merging framework. Think evidence for context boundaries: `CLAUDE.md`, `.cursorrules`, `copilot-instructions.md`, `AGENTS.md`, MCP Tool Search, Agent Skills, RAG/code-search, pruning, and compaction — with privacy-safe receipts instead of raw content dumps.
 
-**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and disposable 60-second smoke tests. If you only run one command for the cross-tool audit, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, load evidence, duplicate-load selection evidence, manual activation requirements, effective context scope, and semantic differences. For the agent-observability wedge, start with [context-budget receipts](docs/context-budget-receipts.md): privacy-safe evidence for what MCP schemas, skills, memory, subagents, CLI help, retrieval chunks, pruning runs, or compaction summaries crossed an agent boundary. It now explicitly covers the "Tool Search fixed MCP bloat" objection: the receipt proves which lane stayed deferred, which tool was expanded, and whether schemas leaked through `messages`/bootstrap anyway. For a 60-second runtime-discovery proof, run `npx --yes pluribus-context@latest demo tool-surface-diff --json`; it validates a receipt for discovered → activated → withheld/blocked MCP tools without raw schemas/prompts/results. If you are coming from Claude Code, GraphRAG, or memory tooling where retrieval succeeds but the agent ignores it, try the [context attention receipt example](examples/context-attention-receipts/) to prove required context was delivered, acknowledged, and cited before edits. If you want the same idea as a copyable skill, use the [context-receipts Agent Skill recipe](skills/context-receipts/). npm `latest` is currently aligned with the GitHub release; the review packet also documents a GitHub-release smoke fallback for future release-lag windows.
+**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and disposable 60-second smoke tests. If you are comparing plugins, Skills registries, config-sync tools, MCP setups, or Claude→Codex worker flows, start with the [Agent surface proof chain](docs/agent-surface-proof-chain.md) to separate install diffs, sync manifests, apply ledgers, surface state, selection traces, context-boundary spans, and handoff envelopes. If you only run one command for the cross-tool audit, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, load evidence, duplicate-load selection evidence, manual activation requirements, effective context scope, and semantic differences. For the agent-observability wedge, start with [context-budget receipts](docs/context-budget-receipts.md): privacy-safe evidence for what MCP schemas, skills, memory, subagents, CLI help, retrieval chunks, pruning runs, or compaction summaries crossed an agent boundary. It now explicitly covers the "Tool Search fixed MCP bloat" objection: the receipt proves which lane stayed deferred, which tool was expanded, and whether schemas leaked through `messages`/bootstrap anyway. For a 60-second runtime-discovery proof, run `npx --yes pluribus-context@latest demo tool-surface-diff --json`; it validates a receipt for discovered → activated → withheld/blocked MCP tools without raw schemas/prompts/results. If you are coming from Claude Code, GraphRAG, or memory tooling where retrieval succeeds but the agent ignores it, try the [context attention receipt example](examples/context-attention-receipts/) to prove required context was delivered, acknowledged, and cited before edits. If MCP server catalogs are burning context before the task needs them, try the [task-scoped MCP config receipt demo](examples/task-scoped-mcp-config/) to generate a minimal `--mcp-config` plus a receipt for selected vs withheld servers. If a Claude Code Skill or paste-cleaning CLI claims big token savings, try the [semantic anchor preservation receipt demo](examples/semantic-anchor-receipts/) to prove the cleaned paste kept headings, API signatures, version notes, and security constraints. If a long Claude Code session, compaction, or topic switch makes `CLAUDE.md` feel stale, try the [CLAUDE.md read receipt example](examples/claude-md-read-receipts/) to prove which index/topic files were reloaded before the next edit. If Claude Code, Codex, or an API-backed agent starts timing out, drifting on tool choice, or producing bad patch formats while the status page is unclear, try the [provider degradation canary receipt example](examples/provider-degradation-canaries/) to decide whether writes should continue, fallback, or pause. If you want the same idea as a copyable skill, use the [context-receipts Agent Skill recipe](skills/context-receipts/). npm `latest` is currently aligned with the GitHub release; the review packet also documents a GitHub-release smoke fallback for future release-lag windows.
 
 ---
 
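The new README text points to a semantic anchor preservation receipt. That receipt proves a cleaned paste kept headings, API signatures, version notes, and security constraints. One way to sketch such a check is to compare anchor sets between the original and cleaned text. This sketch assumes that Markdown headings and inline code spans are the anchors worth tracking. `extractAnchors` and `checkAnchorPreservation` are hypothetical helpers, not the logic of `check-semantic-anchors.mjs`.

```javascript
// Hypothetical anchor rules: Markdown headings and `inline code` spans.
// A real checker may also track version notes or security keywords.
function extractAnchors(markdown) {
  const anchors = new Set();
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^#{1,6}\s+(.*\S)\s*$/);
    if (heading) anchors.add(`heading:${heading[1]}`);
    for (const code of line.matchAll(/`([^`]+)`/g)) {
      anchors.add(`code:${code[1]}`);
    }
  }
  return anchors;
}

// Every anchor in the original must survive in the cleaned text.
function checkAnchorPreservation(original, cleaned) {
  const kept = extractAnchors(cleaned);
  const dropped = [...extractAnchors(original)].filter((anchor) => !kept.has(anchor));
  return { preserved: dropped.length === 0, dropped };
}

const originalPaste = [
  "# Auth API",
  "",
  "Call `login(user, password)` before any request.",
  "",
  "## Security",
  "",
  "Never log `sessionToken`.",
].join("\n");

// A "cleaned" paste that saved tokens by cutting the security section.
const cleanedPaste = [
  "# Auth API",
  "",
  "Call `login(user, password)` before any request.",
].join("\n");

const report = checkAnchorPreservation(originalPaste, cleanedPaste);
console.log(report.preserved ? "anchors preserved" : `dropped: ${report.dropped.join("; ")}`);
```

The receipt records anchor identities rather than the paste itself, so it can be shared without the raw text.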
@@ -161,7 +161,7 @@ npx --yes pluribus-context@latest sync --dry-run
 
 If the preview looks right, run `npx --yes pluribus-context@latest sync` to write the tool-specific files.
 
-For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md) and its [example source file](examples/context-handoff/pluribus.md). If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you evaluate code-search, MCP retrieval, RAG-over-notes, or agent memory tools, use the [Orchestration-layer Search Receipts](docs/orchestration-search-receipts.md) sketch to measure retrieved context from the harness layer without asking retrieval tools to inspect whole transcripts. If you are adding agent observability, traces, or OpenTelemetry-style events, start with [Context Receipts for Agent Observability](docs/context-receipts-for-agent-observability.md), then use the [Context Input Evidence](docs/context-input-evidence.md) sketch and its [executable demos](examples/context-input-evidence/) to separate source bytes, canonical text, delivered hashes, post-hoc session-log receipts, skill/plugin invocation receipts, shared-memory retrieval receipts, self-remediating brain/doctor receipts, and OpenTelemetry-style SpanEvents. If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If your shared-memory or knowledge-graph setup lets agents write durable facts, use [Memory write policy receipts](docs/memory-write-policy-receipts.md) and the [copyable gate](examples/memory-write-policy/) to require proposed diffs, scope, lifecycle, visibility, approval, and privacy checks before one run can teach every harness. If hooks, local gateways, or agent firewalls block risky tool calls, use [Agent firewall denial/audit receipts](docs/agent-firewall-denial-audit.md) and the [copyable checker](examples/agent-firewall-denial-audit/) to split model-visible denial from private operator audit evidence. If you are turning Claude Code/OpenClaw/Cursor into role-based “AI employee” agents with Skills and memory folders, use the [Controlled learning queue](docs/controlled-learning-queue.md) and [copyable example](examples/controlled-learning-queue/) to let agents propose durable memory changes without silently rewriting shared ICP, pricing, compliance, or process assumptions. If `PreCompact` / `PostCompact` or `SessionStart(compact)` workflows decide whether an agent may continue after summarization, use [Compaction resume receipts](docs/compaction-resume-receipts.md) and the [copyable gate](examples/compaction-resume-receipts/) to prove what was summarized, which instruction sources reloaded, what state was lost/kept, and whether `safe_to_resume` is actually true. If an MCP server is healthy but tools are missing in Claude Code/Cursor/Codex, use the [MCP tool visibility receipts](docs/mcp-tool-visibility-receipts.md) checklist to separate launch, handshake, `tools/list`, client catalog, and first invocation failures. If a Claude Code/OpenClaw-style Skill states a hard rule but the run still violates it, use the [Skill policy receipts](docs/skill-policy-receipts.md) guide and [copyable Skill recipe](skills/skill-policy-receipts/) to turn target decisions, refusals, and post-write guards into privacy-safe evidence. If a Skill, plugin resource, MCP instruction, or custom-agent file exists but disappears in ACP/Zed/CLI/chat parity tests, use [Loaded-resource boundary receipts](docs/loaded-resource-boundary.md) and the [copyable checker](examples/loaded-resource-boundary/) to prove discovered, attached, injected, readable, and skipped-resource stages. If long-lived projects keep old specs/TODOs that still match grep but are no longer authoritative, use [Temporal context receipts](docs/temporal-context-receipts.md) and the [copyable current-state example](examples/temporal-context-receipts/) to separate current authority from historical citations before an agent writes code. If AI-generated pull requests are hard to review because diff size hides operational risk, use [AI PR review receipts](docs/ai-pr-review-receipts.md), the [copyable PR template](examples/ai-pr-review-receipts/), and the [GitHub Actions receipt gate](examples/ai-pr-review-receipts/.github/workflows/ai-pr-review-receipt.yml) to review by blast radius: schema/data contracts, async paths, rollout gates, side effects, and ambiguous boundaries. If you delegate work to Codex/Claude Code/Cursor/OpenClaw-style specialist subagents, use [Subagent role receipts](docs/subagent-role-receipts.md) and the [example role definitions](examples/subagent-role-receipts/) to prove the requested role, effective role, loaded instruction source, allowed/refused capabilities, stop point, and next safe action. If you run Claude Code-style dynamic workflows, ultracode, or local LLM gateway orchestration that spawns many agents, use [Dynamic workflow run receipts](docs/dynamic-workflow-run-receipts.md) and the [copyable workflow example](examples/dynamic-workflow-run-receipts/) to prove phases, per-agent roles/models, context loaded/skipped, tool grants, token spend buckets, per-agent fuses, heartbeat, stop reasons, and known gaps. If your workflow routes Explore/Propose/Spec/Design/Tasks/Apply/Verify across OpenCode, Claude Code, Cursor, Codex, or different models, use [Phase-boundary contracts](docs/phase-boundary-contracts.md) and the [copyable Apply→Verify gate](examples/phase-boundary-contract/) to prove allowed input context, output artifact, evidence required before the next phase, dropped context, and stop conditions. If you need CI/reviewers to decide whether an agent handoff can continue, must be reviewed, or should be rejected, use the [Review primitive gate](docs/review-primitive-gate.md), its [copyable gate example](examples/review-primitive-gate/), and the [Claude Code review hook bridge](examples/claude-code-review-hook/) to validate assignment boundaries, approved scope/access changes, required checks, privacy flags, and `complete / partial / unsafe-to-resume` state from CI or Claude Code `TaskCompleted` / `PostCompact` hooks. If Claude Projects, long chats, or compaction make the last clean artifact hard to recover, use [Canonical output receipts](docs/canonical-output-receipts.md) and the [copyable index example](examples/canonical-output-receipts/) to track stable IDs, paths, versions, exact grep phrases, decisions, rejected options, and next actions. If a setup script installs MCP servers, Skills, instruction files, hooks, or plugins across multiple agents, use [Install-plan receipts](docs/install-plan-receipts.md) and the [copyable example](examples/install-plan-receipts/) to prove planned writes, backups, network behavior, and `writes_started=false` before mutation. After a Skill installer runs, use [Skill install/load receipts](docs/skill-install-receipts.md) and the [copyable checker](examples/skill-install-receipts/) to prove source ref, target agents/scopes, discovery/load status, context-cost bucket, and `safe_to_start_session` without logging raw Skill bodies. If you are pruning Skill sprawl after real sessions, use [Skill use-rate receipts](docs/skill-use-rate-receipts.md) and the [copyable checker](examples/skill-use-rate-receipts/) to separate discovered/installed/attached from invoked/acted-on and catch "installed but unused" resources. If you supervise multiple Claude Code/Cursor/Codex/OpenClaw sessions in parallel, use the [Parallel session review ledger](docs/parallel-session-review-ledger.md) and [copyable checker](examples/parallel-session-review-ledger/) to decide which sessions are complete, partial, blocked, or unsafe to resume without trusting an agent summary. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
+For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md), its [example source file](examples/context-handoff/pluribus.md), and the copyable handoff receipt for checking stale source files, diverged tool rules, wrong memories, dead commands, and not-loaded context before another agent writes code. If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you evaluate code-search, MCP retrieval, RAG-over-notes, or agent memory tools, use the [Orchestration-layer Search Receipts](docs/orchestration-search-receipts.md) sketch to measure retrieved context from the harness layer without asking retrieval tools to inspect whole transcripts. If you are adding agent observability, traces, or OpenTelemetry-style events, start with [Context Receipts for Agent Observability](docs/context-receipts-for-agent-observability.md), then use the [Context Input Evidence](docs/context-input-evidence.md) sketch and its [executable demos](examples/context-input-evidence/) to separate source bytes, canonical text, delivered hashes, post-hoc session-log receipts, skill/plugin invocation receipts, shared-memory retrieval receipts, self-remediating brain/doctor receipts, and OpenTelemetry-style SpanEvents. If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift, then use the [CLAUDE.md read receipt example](examples/claude-md-read-receipts/) when the practical question is whether a session actually reloaded the right index/topic files before editing. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If a provider/model may be silently degraded, use the [provider degradation canary receipt example](examples/provider-degradation-canaries/) to record transport health, capability canaries, fallback choice, and the write gate before side effects. If your shared-memory or knowledge-graph setup lets agents write durable facts, use [Memory write policy receipts](docs/memory-write-policy-receipts.md) and the [copyable gate](examples/memory-write-policy/) to require proposed diffs, scope, lifecycle, visibility, approval, and privacy checks before one run can teach every harness. If hooks, local gateways, or agent firewalls block risky tool calls, use [Agent firewall denial/audit receipts](docs/agent-firewall-denial-audit.md) and the [copyable checker](examples/agent-firewall-denial-audit/) to split model-visible denial from private operator audit evidence. If you are turning Claude Code/OpenClaw/Cursor into role-based “AI employee” agents with Skills and memory folders, use the [Controlled learning queue](docs/controlled-learning-queue.md) and [copyable example](examples/controlled-learning-queue/) to let agents propose durable memory changes without silently rewriting shared ICP, pricing, compliance, or process assumptions. If `PreCompact` / `PostCompact` or `SessionStart(compact)` workflows decide whether an agent may continue after summarization, use [Compaction resume receipts](docs/compaction-resume-receipts.md) and the [copyable gate](examples/compaction-resume-receipts/) to prove what was summarized, which instruction sources reloaded, what state was lost/kept, and whether `safe_to_resume` is actually true. If MCP tools consume context before a task needs them, use the [Task-scoped MCP config receipt demo](examples/task-scoped-mcp-config/) to generate a minimal `--mcp-config` and prove which servers were selected or withheld before startup. If an MCP server is healthy but tools are missing in Claude Code/Cursor/Codex, use the [MCP tool visibility receipts](docs/mcp-tool-visibility-receipts.md) checklist to separate launch, handshake, `tools/list`, client catalog, and first invocation failures. If a Claude Code/OpenClaw-style Skill states a hard rule but the run still violates it, use the [Skill policy receipts](docs/skill-policy-receipts.md) guide and [copyable Skill recipe](skills/skill-policy-receipts/) to turn target decisions, refusals, and post-write guards into privacy-safe evidence. If a Skill, plugin resource, MCP instruction, or custom-agent file exists but disappears in ACP/Zed/CLI/chat parity tests, use [Loaded-resource boundary receipts](docs/loaded-resource-boundary.md) and the [copyable checker](examples/loaded-resource-boundary/) to prove discovered, attached, injected, readable, and skipped-resource stages. If long-lived projects keep old specs/TODOs that still match grep but are no longer authoritative, use [Temporal context receipts](docs/temporal-context-receipts.md) and the [copyable current-state example](examples/temporal-context-receipts/) to separate current authority from historical citations before an agent writes code. If AI-generated pull requests are hard to review because diff size hides operational risk, use [AI PR review receipts](docs/ai-pr-review-receipts.md), the [copyable PR template](examples/ai-pr-review-receipts/), and the [GitHub Actions receipt gate](examples/ai-pr-review-receipts/.github/workflows/ai-pr-review-receipt.yml) to review by blast radius: schema/data contracts, async paths, rollout gates, side effects, and ambiguous boundaries. If you delegate work to Codex/Claude Code/Cursor/OpenClaw-style specialist subagents, use [Subagent role receipts](docs/subagent-role-receipts.md) and the [example role definitions](examples/subagent-role-receipts/) to prove the requested role, effective role, loaded instruction source, allowed/refused capabilities, stop point, and next safe action. If you run Claude Code-style dynamic workflows, ultracode, or local LLM gateway orchestration that spawns many agents, use [Dynamic workflow run receipts](docs/dynamic-workflow-run-receipts.md) and the [copyable workflow example](examples/dynamic-workflow-run-receipts/) to prove phases, per-agent roles/models, context loaded/skipped, tool grants, token spend buckets, per-agent fuses, heartbeat, stop reasons, and known gaps. If your workflow routes Explore/Propose/Spec/Design/Tasks/Apply/Verify across OpenCode, Claude Code, Cursor, Codex, or different models, use [Phase-boundary contracts](docs/phase-boundary-contracts.md) and the [copyable Apply→Verify gate](examples/phase-boundary-contract/) to prove allowed input context, output artifact, evidence required before the next phase, dropped context, and stop conditions. If you need CI/reviewers to decide whether an agent handoff can continue, must be reviewed, or should be rejected, use the [Review primitive gate](docs/review-primitive-gate.md), its [copyable gate example](examples/review-primitive-gate/), and the [Claude Code review hook bridge](examples/claude-code-review-hook/) to validate assignment boundaries, approved scope/access changes, required checks, privacy flags, and `complete / partial / unsafe-to-resume` state from CI or Claude Code `TaskCompleted` / `PostCompact` hooks. If Claude Projects, long chats, or compaction make the last clean artifact hard to recover, use [Canonical output receipts](docs/canonical-output-receipts.md) and the [copyable index example](examples/canonical-output-receipts/) to track stable IDs, paths, versions, exact grep phrases, decisions, rejected options, and next actions. If a setup script installs MCP servers, Skills, instruction files, hooks, or plugins across multiple agents, use [Install-plan receipts](docs/install-plan-receipts.md) and the [copyable example](examples/install-plan-receipts/) to prove planned writes, backups, network behavior, and `writes_started=false` before mutation. After a Skill installer runs, use [Skill install/load receipts](docs/skill-install-receipts.md) and the [copyable checker](examples/skill-install-receipts/) to prove source ref, target agents/scopes, discovery/load status, context-cost bucket, and `safe_to_start_session` without logging raw Skill bodies. If you are pruning Skill sprawl after real sessions, use [Skill use-rate receipts](docs/skill-use-rate-receipts.md) and the [copyable checker](examples/skill-use-rate-receipts/) to separate discovered/installed/attached from invoked/acted-on and catch "installed but unused" resources. If you supervise multiple Claude Code/Cursor/Codex/OpenClaw sessions in parallel, use the [Parallel session review ledger](docs/parallel-session-review-ledger.md) and [copyable checker](examples/parallel-session-review-ledger/) to decide which sessions are complete, partial, blocked, or unsafe to resume without trusting an agent summary. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
 
 ### Usage
 
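The updated paragraph recommends the CLAUDE.md read receipt example when the question is whether a session actually reloaded the right index/topic files before editing. The sketch below shows that freshness test and treats a file as fresh only if it was read after the last compaction and no later than the next edit. It assumes a receipt with hypothetical `lastCompactionAt`, `nextEditAt`, and `reads` fields, which are not the schema of `sample-read-receipt.json`. The `docs/topics/auth.md` path is illustrative.

```javascript
// Hypothetical receipt shape: ISO timestamps for the last compaction, the
// next planned edit, and each instruction-file read.
function checkReadReceipt(receipt, requiredPaths) {
  const compactedAt = Date.parse(receipt.lastCompactionAt);
  const editAt = Date.parse(receipt.nextEditAt);
  // A read counts only if it happened after compaction and before the edit.
  const freshPaths = new Set(
    receipt.reads
      .filter((read) => {
        const readAt = Date.parse(read.readAt);
        return readAt > compactedAt && readAt <= editAt;
      })
      .map((read) => read.path),
  );
  const stale = requiredPaths.filter((path) => !freshPaths.has(path));
  return { safeToEdit: stale.length === 0, stale };
}

const receipt = {
  lastCompactionAt: "2026-06-17T10:00:00Z",
  nextEditAt: "2026-06-17T10:05:00Z",
  reads: [
    { path: "CLAUDE.md", readAt: "2026-06-17T10:01:00Z" },
    // Read before compaction, so it does not prove the file is in context now.
    { path: "docs/topics/auth.md", readAt: "2026-06-17T09:30:00Z" },
  ],
};

const verdict = checkReadReceipt(receipt, ["CLAUDE.md", "docs/topics/auth.md"]);
console.log(verdict.safeToEdit ? "safe to edit" : `stale: ${verdict.stale.join(", ")}`);
```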
@@ -407,6 +407,7 @@ If you've felt this pain, tell me about your setup. What tools do you use? How d
 - [OpenClaw Integration](docs/openclaw-integration.md) — how Pluribus generates `AGENTS.md` for OpenClaw
 - [Composable Contexts](docs/composable-contexts.md) — local/remote imports, merge behavior, and safety rules
 - [MCP Memory Handoff](docs/memory-mcp-handoff.md) — demo for keeping memory recall/store protocols aligned across tool-specific instruction files
+- [Task-scoped MCP Config Receipt](examples/task-scoped-mcp-config/) — generate a minimal `--mcp-config` plus selected/withheld server receipt for MCP context-bloat reviews
 - [MCP Tool Visibility Receipts](docs/mcp-tool-visibility-receipts.md) — checklist for debugging healthy MCP servers whose tools do not appear in the agent client catalog
 - [MCP Runtime Config Receipts](docs/mcp-runtime-config-receipts.md) — live-vs-template evidence for MCP permission/config drift review
 - [Remote Composable Context Imports](docs/remote-composable-context-imports.md) — design notes for lockfile/cache/auth hardening
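The new reference entry describes generating a minimal `--mcp-config` plus a receipt of selected and withheld servers. The sketch below shows only the selection step and rests on three assumptions. The catalog maps server names to launch settings. A task lists the server names it needs. The emitted config uses the widely used `mcpServers` JSON object shape. `selectMcpConfig`, the server names, and the `node servers/*.js` commands are hypothetical placeholders, not the contents of `select-mcp-config.mjs`, `mcp-catalog.json`, or `tasks/browser-debug.json`.

```javascript
// Hypothetical catalog: server name -> launch settings (placeholder commands).
const catalog = {
  browser: { command: "node", args: ["servers/browser.js"] },
  github: { command: "node", args: ["servers/github.js"] },
  database: { command: "node", args: ["servers/database.js"] },
};

// Hypothetical task spec listing the servers this task needs.
const task = { name: "browser-debug", servers: ["browser"] };

function selectMcpConfig(catalogEntries, taskSpec) {
  const needed = new Set(taskSpec.servers);
  const mcpServers = {};
  const receipt = { task: taskSpec.name, selected: [], withheld: [], unknown: [] };

  for (const [name, server] of Object.entries(catalogEntries)) {
    if (needed.has(name)) {
      mcpServers[name] = server;
      receipt.selected.push(name);
    } else {
      // Withheld servers are recorded by name only, never by command or env.
      receipt.withheld.push(name);
    }
  }
  // Requested servers missing from the catalog are surfaced, not silently dropped.
  receipt.unknown = taskSpec.servers.filter(
    (name) => !Object.prototype.hasOwnProperty.call(catalogEntries, name),
  );

  return { config: { mcpServers }, receipt };
}

const { config, receipt } = selectMcpConfig(catalog, task);
console.log(JSON.stringify(config, null, 2));
console.log(JSON.stringify(receipt));
```

Keeping the receipt to server names lets it be shared in a review without exposing launch commands or credentials from the withheld entries.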
package/bin/pluribus.js
CHANGED
@@ -70,6 +70,7 @@ OPTIONS (demo)
 --receipt Validate a custom demo receipt JSON file
 --input Import a custom demo input file, such as rpc-messages.jsonl
 --json Print machine-readable demo results
+--pass For context-sufficiency-trace, use the bundled passing trace
 
 EXAMPLES
 pluribus init
@@ -96,6 +97,7 @@ EXAMPLES
 pluribus demo mcp-telemetry-import --json
 pluribus demo tool-surface-diff
 pluribus demo tool-surface-diff --json
+pluribus demo context-sufficiency-trace --json
 
 DOCS
 https://github.com/caioribeiroclw-pixel/pluribus
@@ -107,7 +109,7 @@ const COMMAND_FLAGS = {
 validate: new Set(['source', 'update-imports']),
 audit: new Set(['source', 'tools', 'update-imports', 'strict', 'ci', 'json', 'output', 'github-annotations', 'fidelity-report']),
 watch: new Set(['source', 'tools', 'update-imports', 'dry-run', 'once', 'debounce']),
-demo: new Set(['receipt', 'input', 'json']),
+demo: new Set(['receipt', 'input', 'json', 'pass']),
 }
 
 function getFlagNames(argv) {
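This hunk adds `pass` to the `demo` entry of `COMMAND_FLAGS`, a per-command set of allowed flag names. The body of `getFlagNames` is not part of this diff, so the following is only a hypothetical sketch of how such an allowlist can reject unknown flags. `ALLOWED_FLAGS`, `collectFlagNames`, and `findUnknownFlags` are illustrative names. The parsing rules, which accept only long `--name` or `--name=value` flags, are assumptions.

```javascript
// Hypothetical per-command allowlist with the same shape as COMMAND_FLAGS.
const ALLOWED_FLAGS = {
  demo: new Set(["receipt", "input", "json", "pass"]),
};

// Collect long-flag names: "--json" -> "json", "--receipt=file.json" -> "receipt".
// A bare "--" (end of options) and positional arguments are ignored.
function collectFlagNames(argv) {
  return argv
    .filter((arg) => arg.startsWith("--") && arg.length > 2)
    .map((arg) => arg.slice(2).split("=")[0]);
}

// Return every flag name not allowed for the command (unknown commands allow none).
function findUnknownFlags(command, argv) {
  const allowed = Object.prototype.hasOwnProperty.call(ALLOWED_FLAGS, command)
    ? ALLOWED_FLAGS[command]
    : new Set();
  return collectFlagNames(argv).filter((name) => !allowed.has(name));
}

console.log(findUnknownFlags("demo", ["context-sufficiency-trace", "--json", "--pass"]));
console.log(findUnknownFlags("demo", ["tool-surface-diff", "--fidelity-report"]));
```

With an allowlist like this, a new flag has to be added both to the help text and to the command's set, which matches the paired edits in this file.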
@@ -0,0 +1,176 @@
|
|
|
1
|
+
# Agent surface proof chain
|
|
2
|
+
|
|
3
|
+
Agent setup is becoming a bundle, not a file: Skills, hooks, MCP servers, subagents, slash commands, permissions, profiles, plugins, and cross-model workers all get installed or synced together.
|
|
4
|
+
|
|
5
|
+
A single “receipt” is too vague for that surface. Use the smallest proof object for the boundary you are crossing, and do not let one green check imply the next boundary worked.
|
|
6
|
+
|
|
7
|
+
## Quick model
|
|
8
|
+
|
|
9
|
+
```text
|
|
10
|
+
registry publishes
|
|
11
|
+
→ installer plans writes
|
|
12
|
+
→ sync applies writes
|
|
13
|
+
→ host exposes surface
|
|
14
|
+
→ task makes tools/skills eligible
|
|
15
|
+
→ runtime calls or activates them
|
|
16
|
+
→ workers hand results back
|
|
17
|
+
→ reviewer/CI promotes state
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
Each arrow can fail independently.
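A minimal sketch of what "independently" means in practice. It assumes a hypothetical helper that is not part of the pluribus CLI. The boundary list mirrors the quick model. The `proof_type` values `surface_state` and `merge_back_evidence` are illustrative names, because this page shows no JSON for them. The helper reports the first boundary that lacks its own proof object, so a green proof upstream never stands in for a missing one downstream.

```javascript
// Hypothetical boundary order mirroring the quick model above.
// A registry sync would use post_sync_manifest at the "sync" step instead.
const CHAIN = [
  { boundary: 'installer plans writes', proof: 'install_diff' },
  { boundary: 'sync applies writes', proof: 'post_apply_ledger' },
  { boundary: 'host exposes surface', proof: 'surface_state' },
  { boundary: 'task selects and runtime calls', proof: 'selection_trace' },
  { boundary: 'workers hand results back', proof: 'handoff_envelope' },
  { boundary: 'reviewer/CI promotes state', proof: 'merge_back_evidence' },
];

// Return the first boundary with no proof object of its own, or null.
// A proof for one boundary is never counted for another.
function firstUnprovenBoundary(proofs) {
  const present = new Set(proofs.map((p) => p.proof_type));
  for (const step of CHAIN) {
    if (!present.has(step.proof)) return step.boundary;
  }
  return null;
}

// An install diff plus an apply ledger says nothing about host discovery.
const proofs = [{ proof_type: 'install_diff' }, { proof_type: 'post_apply_ledger' }];
console.log(firstUnprovenBoundary(proofs)); // host exposes surface
```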
|
|
21
|
+
|
|
22
|
+
## Boundary-specific proof objects
|
|
23
|
+
|
|
24
|
+
| Boundary | Use this proof object | Proves | Does **not** prove |
|
|
25
|
+
| --- | --- | --- | --- |
|
|
26
|
+
| Setup script or plugin is about to write files | **Install diff** | Planned files, permissions, hooks, MCP servers, Skills, commands, backups, network/env access, and `writes_started=false` before mutation | The host later loaded or used any installed surface |
|
|
27
|
+
| Registry sync says it succeeded | **Post-sync manifest** | Published asset version, target agents/scopes, written paths, content hashes, skipped targets, errors, and restart requirements | Runtime discovery or activation |
|
|
28
|
+
| Continuous config sync ran with `--apply` | **Post-apply ledger** | What was actually written, skipped, backed up, failed, or sent to manual review after apply | That Claude/Codex/Cursor followed the config |
|
|
29
|
+
| Host starts after install/sync | **Surface state** | What is visible/attached/discovered vs skipped/withheld: Skills, hooks, MCP tools, agents, slash commands, instruction files | That a specific task selected the right surface |
|
|
30
|
+
| Runtime task decides what to use | **Selection trace** | Available → eligible → called → enforced for tools/Skills/MCP, with privacy-safe reasons | That the output is correct |
|
|
31
|
+
| Model/provider may be silently degraded | **Degradation decision record** | Transport health, app-critical canaries, fallback choice, degradation confidence, and whether writes should continue | That the provider is globally healthy or that future turns will remain stable |
|
|
32
|
+
| Long session, compaction, or topic switch resumes work | **Read receipt / safe-to-edit gate** | Which index/topic files or summaries reloaded, active constraints, stale notes rejected, and whether editing is safe | That future turns will keep following the context |
|
|
33
|
+
| Debugger shows a chain of LLM calls | **Context-boundary span** | Which context inputs crossed into each node (hashes/paths by default), withheld inputs, replay reason, and downstream invalidations | Full correctness; the span is also not a raw prompt dump and is not secret-safe by itself |
|
|
34
|
+
| Claude delegates to Codex/Gemini/subagents | **Handoff envelope** | Task, parent-plan hash, allowed files/commands, passed context sources, output schema, timeout, and insufficient-context path | That worker output is trusted project state |
|
|
35
|
+
| Worker results are merged back | **Merge-back evidence** | What changed, evidence used, assumptions, invalidated downstream outputs, and reviewer/CI promotion decision | That the original worker had complete context |
|
|
36
|
+
|
|
37
|
+
## Minimal fields by proof object
|
|
38
|
+
|
|
39
|
+
### Install diff
|
|
40
|
+
|
|
41
|
+
```json
|
|
42
|
+
{
|
|
43
|
+
"proof_type": "install_diff",
|
|
44
|
+
"installer": "claude-code-setup",
|
|
45
|
+
"targets": ["claude-code", "codex"],
|
|
46
|
+
"planned_writes": [
|
|
47
|
+
{"path": ".claude/hooks/pretooluse.json", "kind": "hook", "backup": true},
|
|
48
|
+
{"path": ".mcp.json", "kind": "mcp_config", "backup": true}
|
|
49
|
+
],
|
|
50
|
+
"env_or_network_access": ["ANTHROPIC_API_KEY:required-not-recorded"],
|
|
51
|
+
"writes_started": false,
|
|
52
|
+
"review_required": true
|
|
53
|
+
}
|
|
54
|
+
```
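One way to use an install diff is as a review gate before any mutation. The sketch below is hypothetical: field names follow the example above, but the policy rules (every planned write has a backup, env/network entries use the `:required-not-recorded` suffix seen in the example instead of storing values) are assumptions, not a published Pluribus schema.

```javascript
// Hypothetical pre-mutation review for an install diff.
function reviewInstallDiff(diff) {
  const problems = [];
  if (diff.proof_type !== 'install_diff') problems.push('not an install diff');
  // The diff is only useful if it was captured before any file changed.
  if (diff.writes_started !== false) problems.push('writes already started');
  for (const write of diff.planned_writes ?? []) {
    if (write.backup !== true) problems.push(`no backup planned for ${write.path}`);
  }
  // Assumed convention: env/network entries name a requirement, never a value.
  for (const entry of diff.env_or_network_access ?? []) {
    if (!entry.endsWith(':required-not-recorded')) problems.push(`env entry may record a value: ${entry}`);
  }
  return { ok: problems.length === 0, reviewRequired: diff.review_required === true, problems };
}

const diff = {
  proof_type: 'install_diff',
  planned_writes: [
    { path: '.claude/hooks/pretooluse.json', kind: 'hook', backup: true },
    { path: '.mcp.json', kind: 'mcp_config', backup: true },
  ],
  env_or_network_access: ['ANTHROPIC_API_KEY:required-not-recorded'],
  writes_started: false,
  review_required: true,
};
console.log(reviewInstallDiff(diff)); // { ok: true, reviewRequired: true, problems: [] }
```

Passing this gate only says the plan is reviewable; it says nothing about whether the host later loads the hook or MCP server.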
|
|
55
|
+
|
|
56
|
+
### Post-sync manifest
|
|
57
|
+
|
|
58
|
+
```json
|
|
59
|
+
{
|
|
60
|
+
"proof_type": "post_sync_manifest",
|
|
61
|
+
"run_id": "skills-sync-2026-06-14T21:00Z",
|
|
62
|
+
"source": "team-skills-registry",
|
|
63
|
+
"targets": [
|
|
64
|
+
{
|
|
65
|
+
"agent": "claude-code",
|
|
66
|
+
"scope": "project",
|
|
67
|
+
"skills_dir": ".claude/skills",
|
|
68
|
+
"written": [{"name": "review-pr", "version": "1.4.2", "sha256": "..."}],
|
|
69
|
+
"skipped": []
|
|
70
|
+
}
|
|
71
|
+
],
|
|
72
|
+
"restart_required": true
|
|
73
|
+
}
|
|
74
|
+
```
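A post-sync manifest is easiest to review as a per-target summary. This is a hypothetical helper using the field names from the example above; the `sha256:abc` value is a placeholder digest. It reports what was written and skipped, and deliberately stops short of claiming the host discovered anything: the next step is always to capture surface state.

```javascript
// Hypothetical per-target summary of a post-sync manifest.
function summarizeManifest(manifest) {
  const rows = manifest.targets.map((t) => ({
    agent: t.agent,
    scope: t.scope,
    written: t.written.length,
    skipped: t.skipped.length,
    // Entries without a content hash cannot be compared on the next sync.
    unhashed: t.written.filter((w) => !w.sha256).map((w) => w.name),
  }));
  const next = manifest.restart_required
    ? 'restart the host, then capture surface state'
    : 'capture surface state';
  return { rows, next };
}

const manifest = {
  proof_type: 'post_sync_manifest',
  targets: [{
    agent: 'claude-code',
    scope: 'project',
    skills_dir: '.claude/skills',
    written: [{ name: 'review-pr', version: '1.4.2', sha256: 'sha256:abc' }], // placeholder digest
    skipped: [],
  }],
  restart_required: true,
};
console.log(summarizeManifest(manifest).next); // restart the host, then capture surface state
```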
|
|
75
|
+
|
|
76
|
+
### Post-apply ledger
|
|
77
|
+
|
|
78
|
+
```json
|
|
79
|
+
{
|
|
80
|
+
"proof_type": "post_apply_ledger",
|
|
81
|
+
"run_id": "config-sync-123",
|
|
82
|
+
"plan_hash": "sha256:...",
|
|
83
|
+
"writes_started": true,
|
|
84
|
+
"backup_root": ".agent-sync/backups/2026-06-14T21-00Z",
|
|
85
|
+
"operations": [
|
|
86
|
+
{
|
|
87
|
+
"path": "AGENTS.md",
|
|
88
|
+
"status": "written",
|
|
89
|
+
"before_hash": "sha256:old",
|
|
90
|
+
"after_hash": "sha256:new",
|
|
91
|
+
"backup_path": ".agent-sync/backups/.../AGENTS.md"
|
|
92
|
+
},
|
|
93
|
+
{
|
|
94
|
+
"path": ".codex/config.toml",
|
|
95
|
+
"status": "manual_review",
|
|
96
|
+
"reason": "permission profile changed"
|
|
97
|
+
}
|
|
98
|
+
]
|
|
99
|
+
}
|
|
100
|
+
```
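The ledger becomes actionable once you pull out the operations that still need a human. A minimal sketch, assuming a hypothetical helper: the `written` and `manual_review` statuses come from the example above, while `failed` is assumed from the boundary table's wording, and the "content-changing write needs a backup" rule is an illustrative policy.

```javascript
// Hypothetical follow-up extractor for a post-apply ledger.
function ledgerFollowUps(ledger) {
  const followUps = [];
  for (const op of ledger.operations ?? []) {
    if (op.status === 'written' && op.before_hash !== op.after_hash && !op.backup_path) {
      // A content-changing write should be restorable from a backup.
      followUps.push({ path: op.path, why: 'written without backup' });
    } else if (op.status === 'manual_review' || op.status === 'failed') {
      followUps.push({ path: op.path, why: op.reason ?? op.status });
    }
  }
  return followUps;
}

const ledger = {
  proof_type: 'post_apply_ledger',
  writes_started: true,
  operations: [
    { path: 'AGENTS.md', status: 'written', before_hash: 'sha256:old', after_hash: 'sha256:new', backup_path: '.agent-sync/backups/.../AGENTS.md' },
    { path: '.codex/config.toml', status: 'manual_review', reason: 'permission profile changed' },
  ],
};
console.log(ledgerFollowUps(ledger).map((f) => f.path)); // [ '.codex/config.toml' ]
```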
|
|
101
|
+
|
|
102
|
+
### Selection trace
|
|
103
|
+
|
|
104
|
+
```json
|
|
105
|
+
{
|
|
106
|
+
"proof_type": "selection_trace",
|
|
107
|
+
"turn_id": "turn-42",
|
|
108
|
+
"loaded_instructions": ["CLAUDE.md", ".claude/skills/memory/SKILL.md"],
|
|
109
|
+
"mcp_tools_visible": ["memory.search", "memory.write"],
|
|
110
|
+
"task_intent": "recall prior decision before editing auth flow",
|
|
111
|
+
"expected_tools": ["memory.search"],
|
|
112
|
+
"eligible_tools": ["memory.search"],
|
|
113
|
+
"called_tools": ["memory.search"],
|
|
114
|
+
"enforced_by_hook": true
|
|
115
|
+
}
|
|
116
|
+
```
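The value of a selection trace is that it separates visible, eligible, and called. A hypothetical funnel check, using the field names from the example above, makes the drop-off point explicit; the output keys are illustrative, not a Pluribus format.

```javascript
// Hypothetical funnel check for a selection trace.
// Each expected tool is checked against every stage, so a gap shows where it dropped out.
function selectionGaps(trace) {
  const visible = new Set(trace.mcp_tools_visible ?? []);
  const eligible = new Set(trace.eligible_tools ?? []);
  const called = new Set(trace.called_tools ?? []);
  const expected = trace.expected_tools ?? [];
  return {
    expectedNotVisible: expected.filter((t) => !visible.has(t)),
    expectedNotEligible: expected.filter((t) => !eligible.has(t)),
    expectedNotCalled: expected.filter((t) => !called.has(t)),
    calledButNotEligible: [...called].filter((t) => !eligible.has(t)),
    enforcedByHook: trace.enforced_by_hook === true,
  };
}

// Visible and eligible, but never called: visibility is not use.
const trace = {
  mcp_tools_visible: ['memory.search', 'memory.write'],
  expected_tools: ['memory.search'],
  eligible_tools: ['memory.search'],
  called_tools: [],
  enforced_by_hook: false,
};
console.log(selectionGaps(trace).expectedNotCalled); // [ 'memory.search' ]
```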
|
|
117
|
+
|
|
118
|
+
### Degradation decision record
|
|
119
|
+
|
|
120
|
+
```json
|
|
121
|
+
{
|
|
122
|
+
"proof_type": "degradation_decision_record",
|
|
123
|
+
"run_id": "agent-run-2026-06-15T20:02Z",
|
|
124
|
+
"provider": "anthropic",
|
|
125
|
+
"model": "claude-sonnet-4",
|
|
126
|
+
"region": "us-east-1",
|
|
127
|
+
"prompt_template_hash": "sha256:...",
|
|
128
|
+
"canary_suite_version": "coding-agent-smoke-2026-06-15",
|
|
129
|
+
"transport": {
|
|
130
|
+
"ttft_p95_ms": 1400,
|
|
131
|
+
"total_latency_p95_ms": 9200,
|
|
132
|
+
"timeout_rate": 0.01,
|
|
133
|
+
"error_rate": 0
|
|
134
|
+
},
|
|
135
|
+
"capability_canaries": [
|
|
136
|
+
{"name": "json_schema", "status": "pass", "severity": "write_blocking"},
|
|
137
|
+
{"name": "tool_choice", "status": "pass", "severity": "write_blocking"},
|
|
138
|
+
{"name": "patch_format", "status": "pass", "severity": "write_blocking"}
|
|
139
|
+
],
|
|
140
|
+
"confidence": "healthy",
|
|
141
|
+
"write_gate": "continue"
|
|
142
|
+
}
|
|
143
|
+
```
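The `write_gate` field can be derived from the canaries rather than asserted. A minimal sketch under stated assumptions: `continue` comes from the example above, while `stop_writes` and `read_only` are hypothetical gate values. Transport numbers are carried in the record but deliberately not used to allow writes, because a fast response can still fail app-critical canaries.

```javascript
// Hypothetical write gate derived from a degradation decision record.
function deriveWriteGate(record) {
  const canaries = record.capability_canaries ?? [];
  if (canaries.length === 0) {
    return { write_gate: 'read_only', reason: 'no canaries ran; degradation unknown' };
  }
  const failedBlocking = canaries.filter((c) => c.severity === 'write_blocking' && c.status !== 'pass');
  if (failedBlocking.length > 0) {
    return { write_gate: 'stop_writes', reason: `write-blocking canaries failed: ${failedBlocking.map((c) => c.name).join(', ')}` };
  }
  return { write_gate: 'continue', reason: 'all write-blocking canaries passed' };
}

// Illustrative input: healthy-looking transport, one failed write-blocking canary.
const record = {
  transport: { ttft_p95_ms: 300, total_latency_p95_ms: 2100, timeout_rate: 0, error_rate: 0 },
  capability_canaries: [
    { name: 'json_schema', status: 'pass', severity: 'write_blocking' },
    { name: 'patch_format', status: 'fail', severity: 'write_blocking' },
  ],
};
console.log(deriveWriteGate(record).write_gate); // stop_writes
```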
|
|
144
|
+
|
|
145
|
+
### Handoff envelope
|
|
146
|
+
|
|
147
|
+
```json
|
|
148
|
+
{
|
|
149
|
+
"proof_type": "handoff_envelope",
|
|
150
|
+
"from": "opus-supervisor",
|
|
151
|
+
"to": "codex-worker-2",
|
|
152
|
+
"task": "compare parser failures in imports.test.js",
|
|
153
|
+
"parent_plan_hash": "sha256:...",
|
|
154
|
+
"allowed_files": ["src/utils/imports.js", "test/imports.test.js"],
|
|
155
|
+
"allowed_commands": ["npm test -- imports"],
|
|
156
|
+
"context_sources_passed": ["spec/context-format.md#remote-imports"],
|
|
157
|
+
"expected_output_schema": "worker_result_v1",
|
|
158
|
+
"stop_condition": "one patch candidate or explicit insufficient_context"
|
|
159
|
+
}
|
|
160
|
+
```
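A handoff envelope is only useful if worker output is checked against it on the way back. The sketch below assumes a hypothetical worker report shape (`status`, `files_changed`, `commands_run`; `patch_candidate` is an illustrative status) and uses exact-string matching, so glob patterns in the allow-lists are not handled.

```javascript
// Hypothetical scope check for worker output against a handoff envelope.
function checkHandoffScope(envelope, report) {
  const files = new Set(envelope.allowed_files ?? []);
  const commands = new Set(envelope.allowed_commands ?? []);
  const extraFiles = (report.files_changed ?? []).filter((f) => !files.has(f));
  const extraCommands = (report.commands_run ?? []).filter((c) => !commands.has(c));
  return {
    // insufficient_context is an in-contract result, not a failure.
    insufficientContext: report.status === 'insufficient_context',
    inScope: extraFiles.length === 0 && extraCommands.length === 0,
    extraFiles,
    extraCommands,
  };
}

const envelope = {
  allowed_files: ['src/utils/imports.js', 'test/imports.test.js'],
  allowed_commands: ['npm test -- imports'],
};
const report = {
  status: 'patch_candidate',
  files_changed: ['src/utils/imports.js', 'package.json'],
  commands_run: ['npm test -- imports'],
};
console.log(checkHandoffScope(envelope, report).extraFiles); // [ 'package.json' ]
```

An in-scope result still does not make the output trusted project state; it only feeds the merge-back evidence.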
|
|
161
|
+
|
|
162
|
+
## Practical rules
|
|
163
|
+
|
|
164
|
+
1. **Do not promote intent as outcome.** A dry-run plan is not an apply ledger.
|
|
165
|
+
2. **Do not promote visibility as use.** A visible MCP tool or Skill is not an activated/called one.
|
|
166
|
+
3. **Do not promote latency as reliability.** A low-latency provider call can still fail app-critical canaries, and a slow call may still be safe for read-only work.
|
|
167
|
+
4. **Do not promote worker output as project truth.** Merge-back needs evidence and invalidation notes.
|
|
168
|
+
5. **Keep receipts privacy-safe by default.** Prefer paths, hashes, names, versions, statuses, and reasons; expand raw bodies only under explicit local review.
|
|
169
|
+
6. **Name skipped and withheld context.** What did not load is often the failure.
|
|
170
|
+
7. **Use the product’s own vocabulary.** Say install diff for installers, ledger for sync apply, span for debuggers, envelope for delegation, degradation decision for provider/model health, and read receipt for re-grounding.
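Rule 5 in practice: a receipt can reference a context source by path, size, and digest without carrying its body. A minimal sketch using Node's built-in `node:crypto` module; the `privacySafeRef` name and the `sha256:` prefix convention are illustrative, not a Pluribus API.

```javascript
import { createHash } from 'node:crypto';

// Hypothetical helper: describe a context source by path, byte size, and
// SHA-256 digest. The body is hashed locally and is not included in the result.
function privacySafeRef(path, body) {
  const digest = createHash('sha256').update(body, 'utf8').digest('hex');
  return { path, bytes: Buffer.byteLength(body, 'utf8'), sha256: `sha256:${digest}` };
}

const ref = privacySafeRef('CLAUDE.md', 'abc');
console.log(ref.sha256);
// sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

A digest lets a reviewer confirm two receipts saw the same file version without either receipt exposing its contents. Short or guessable bodies can still be recovered by brute force, so hashing alone is not a secrecy guarantee.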
|
|
171
|
+
|
|
172
|
+
## When Pluribus fits
|
|
173
|
+
|
|
174
|
+
Use Pluribus when you need privacy-safe evidence around agent context boundaries: generated instruction files, Skills, MCP tools, memory/RAG results, compaction, pruning, provider/model degradation decisions, plugin setup, or cross-tool handoffs.
|
|
175
|
+
|
|
176
|
+
Do not use Pluribus as a registry, memory server, agent orchestrator, or replacement for Claude/Codex/Cursor runtime diagnostics. It is the evidence layer around those systems.
|
|
@@ -90,6 +90,74 @@ I am building Acme Billing, a TypeScript service used by finance operators.
|
|
|
90
90
|
4. **Audit:** run `pluribus audit` locally or in CI to catch drift when someone edits generated files directly.
|
|
91
91
|
5. **Use memory separately:** if you also run an MCP memory server, keep the recall/store protocol in `pluribus.md` and let the memory server store durable facts.
|
|
92
92
|
|
|
93
|
+
## What usually breaks first
|
|
94
|
+
|
|
95
|
+
When developers regularly switch between Cursor, Claude Code, Codex/Copilot-style tools, Windsurf, and terminal agents, the failure is rarely “there is no context anywhere.” It is usually one of these smaller breaks:
|
|
96
|
+
|
|
97
|
+
| Break | Symptom | Fast check |
|
|
98
|
+
| --- | --- | --- |
|
|
99
|
+
| Source file stale | `AGENTS.md`, `CLAUDE.md`, or `.cursorrules` still describes old paths, commands, or architecture | Run `pluribus audit`; compare generated output to the committed `pluribus.md` source |
|
|
100
|
+
| Tool-specific layer diverged | Cursor rules say one convention, Claude/Codex instructions say another | Keep tool-specific files generated/thin; review manual edits as drift |
|
|
101
|
+
| Memory became authority | An MCP memory/Obsidian/Notion note overrides the current repo state | Put memory recall/store policy in `pluribus.md`; require code/tests/current docs to win over recalled facts |
|
|
102
|
+
| Commands or paths changed | The agent follows a dead build/test command copied from an older chat | Keep commands in `package.json`, `Makefile`, `justfile`, or scripts; link to them from context instead of duplicating shell snippets |
|
|
103
|
+
| Context was never loaded | The right file exists, but the active tool/session did not read it | Ask for a short handoff receipt before edits: which source file, generated target, memory recall, and test command were actually used |
|
|
104
|
+
|
|
105
|
+
## Copyable handoff receipt
|
|
106
|
+
|
|
107
|
+
Use this as a lightweight debugging note before a risky cross-tool handoff. It is intentionally privacy-safe: hashes, paths, and decisions instead of raw prompt transcripts or private source content.
|
|
108
|
+
|
|
109
|
+
```json
|
|
110
|
+
{
|
|
111
|
+
"handoff_id": "cursor-to-claude-2026-06-16-001",
|
|
112
|
+
"from_tool": "cursor",
|
|
113
|
+
"to_tool": "claude-code",
|
|
114
|
+
"repo_ref": {
|
|
115
|
+
"branch": "main",
|
|
116
|
+
"head_sha": "abc1234"
|
|
117
|
+
},
|
|
118
|
+
"source_of_truth": {
|
|
119
|
+
"path": "pluribus.md",
|
|
120
|
+
"sha256": "hash-of-current-project-context",
|
|
121
|
+
"validated_at": "2026-06-16T00:00:00Z"
|
|
122
|
+
},
|
|
123
|
+
"generated_context": [
|
|
124
|
+
{
|
|
125
|
+
"tool": "cursor",
|
|
126
|
+
"path": ".cursorrules",
|
|
127
|
+
"status": "in_sync"
|
|
128
|
+
},
|
|
129
|
+
{
|
|
130
|
+
"tool": "claude-code",
|
|
131
|
+
"path": "CLAUDE.md",
|
|
132
|
+
"status": "in_sync"
|
|
133
|
+
},
|
|
134
|
+
{
|
|
135
|
+
"tool": "openclaw-or-codex-style-agent",
|
|
136
|
+
"path": "AGENTS.md",
|
|
137
|
+
"status": "in_sync"
|
|
138
|
+
}
|
|
139
|
+
],
|
|
140
|
+
"memory_policy": {
|
|
141
|
+
"mcp_memory_allowed": true,
|
|
142
|
+
"current_repo_overrides_memory": true,
|
|
143
|
+
"store_new_facts_without_review": false
|
|
144
|
+
},
|
|
145
|
+
"active_task": {
|
|
146
|
+
"summary": "Implement the smallest safe patch for issue #123",
|
|
147
|
+
"allowed_paths": ["src/billing/**", "tests/billing/**"],
|
|
148
|
+
"required_checks": ["npm test -- billing"]
|
|
149
|
+
},
|
|
150
|
+
"loaded_evidence": {
|
|
151
|
+
"agent_says_it_read": ["CLAUDE.md", "package.json"],
|
|
152
|
+
"missing_or_deferred": ["old Obsidian architecture note"],
|
|
153
|
+
"stale_conflicts_found": []
|
|
154
|
+
},
|
|
155
|
+
"safe_to_continue": true
|
|
156
|
+
}
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
The receipt should be boring. If it shows stale conflicts, missing generated files, or memory overriding current repo state, stop and fix the source of truth before asking another agent to write code.
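"Boring" can be checked before the handoff. A minimal sketch, assuming a hypothetical helper that reads the receipt fields shown above; any `status` other than `in_sync` (for example a hypothetical `stale` or `missing`) counts as a blocker. An empty result means the receipt is boring enough to hand off.

```javascript
// Hypothetical pre-handoff check: return reasons to stop; [] means proceed.
function handoffBlockers(receipt) {
  const blockers = [];
  for (const target of receipt.generated_context ?? []) {
    if (target.status !== 'in_sync') blockers.push(`${target.path} is ${target.status ?? 'missing a status'}`);
  }
  if (receipt.memory_policy?.current_repo_overrides_memory !== true) {
    blockers.push('recalled memory may override current repo state');
  }
  const conflicts = receipt.loaded_evidence?.stale_conflicts_found ?? [];
  if (conflicts.length > 0) blockers.push(`stale conflicts: ${conflicts.join(', ')}`);
  if (receipt.safe_to_continue !== true) blockers.push('safe_to_continue is not true');
  return blockers;
}

const receipt = {
  generated_context: [
    { tool: 'cursor', path: '.cursorrules', status: 'in_sync' },
    { tool: 'claude-code', path: 'CLAUDE.md', status: 'stale' },
  ],
  memory_policy: { current_repo_overrides_memory: true },
  loaded_evidence: { stale_conflicts_found: [] },
  safe_to_continue: true,
};
console.log(handoffBlockers(receipt)); // [ 'CLAUDE.md is stale' ]
```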
|
|
160
|
+
|
|
93
161
|
## What this deliberately does not solve
|
|
94
162
|
|
|
95
163
|
- It does not move active chat state from Cursor to Claude Code.
|
package/docs/index.html
CHANGED
|
@@ -19,7 +19,7 @@
|
|
|
19
19
|
<p>Privacy-safe evidence for AI context boundaries: what crossed into the agent, what stayed out, and what should stop before edits or tool calls.</p>
|
|
20
20
|
<div class="card">
|
|
21
21
|
<h2>Browser playground</h2>
|
|
22
|
-
<p>Try receipt validation without installing anything or sending data to a server. Current samples cover MCP tool-surface diffs, retrieved-context attention,
|
|
22
|
+
<p>Try receipt validation without installing anything or sending data to a server. Current samples cover MCP tool-surface diffs, retrieved-context attention, CLAUDE.md rule-attention drift, skill install provenance, and retrieval adoption evidence. If you are debugging MCP context bloat before startup, use the <a href="https://github.com/caioribeiroclw-pixel/pluribus/tree/main/examples/task-scoped-mcp-config">task-scoped MCP config receipt demo</a>. If a Skill or paste-cleaning CLI claims token savings, use the <a href="https://github.com/caioribeiroclw-pixel/pluribus/tree/main/examples/semantic-anchor-receipts">semantic anchor preservation receipt demo</a> to prove it kept version/API/security anchors. If a Cursor/Claude Code workflow requires a first tool or pre-tool hook before edits, use the <a href="session-preflight-receipts.md">session preflight receipt</a> and <a href="https://github.com/caioribeiroclw-pixel/pluribus/tree/main/examples/session-preflight-receipts">copyable rule/example</a>.</p>
|
|
23
23
|
<p><a href="receipt-playground.html">Open the receipt playground →</a></p>
|
|
24
24
|
</div>
|
|
25
25
|
<div class="card">
|
|
@@ -42,6 +42,7 @@
|
|
|
42
42
|
<option value="attentionFail">Sample: context attention fail</option>
|
|
43
43
|
<option value="ruleAttention">Sample: CLAUDE.md rule attention drift</option>
|
|
44
44
|
<option value="skillInstall">Sample: Agent Skill install provenance</option>
|
|
45
|
+
<option value="retrievalAdoption">Sample: retrieval adoption receipt</option>
|
|
45
46
|
</select>
|
|
46
47
|
<button id="load">Load sample</button>
|
|
47
48
|
<button id="validate">Validate</button>
|
|
@@ -56,6 +57,7 @@
|
|
|
56
57
|
<p><span class="pill">context attention</span> proves required retrieved context was delivered, acknowledged, and cited before edits.</p>
|
|
57
58
|
<p><span class="pill">rule attention</span> shows whether project rules were re-read, cited in the pre-edit plan, and checked before changes.</p>
|
|
58
59
|
<p><span class="pill">skill install provenance</span> shows which <code>SKILL.md</code> source path won, which mirrors were ignored, and whether install metadata stayed content-safe.</p>
|
|
60
|
+
<p><span class="pill">retrieval adoption</span> shows whether an available graph/RAG tool was actually used before claims, not just configured.</p>
|
|
59
61
|
<h2>CLI equivalent</h2>
|
|
60
62
|
<p><code>npx --yes pluribus-context@latest demo tool-surface-diff --json</code></p>
|
|
61
63
|
</aside>
|
|
@@ -106,6 +108,36 @@
|
|
|
106
108
|
privacy: { raw_prompts_omitted: true, raw_claude_md_omitted: true, source_code_omitted: true, tool_outputs_omitted: true, tokens_omitted: true, customer_data_omitted: true },
|
|
107
109
|
result: { status: 'unsafe_to_edit', next_safe_action: 'stop, re-read missing rule ids, and restate the pre-edit plan with citations' }
|
|
108
110
|
},
|
|
111
|
+
retrievalAdoption: {
|
|
112
|
+
schema: 'pluribus.retrieval_adoption_receipt.v1',
|
|
113
|
+
receipt_id: 'retrieval_adoption_graph_demo',
|
|
114
|
+
task_id: 'medusa-code-audit-graph-usage',
|
|
115
|
+
agent_surface: 'claude_code',
|
|
116
|
+
retrieval_surface: { kind: 'knowledge_graph_mcp', name: 'project-graph', available: true, forced: false },
|
|
117
|
+
claim: { expected_use: 'graph_retrieval_before_code_audit', evaluation_window: 'before_final_audit_claims' },
|
|
118
|
+
availability_evidence: {
|
|
119
|
+
tool_catalog_contains: ['graph.search', 'graph.neighbors', 'graph.explain_path'],
|
|
120
|
+
graph_index: { nodes: 3863, edges: 11204, snapshot_hash: 'sha256:graph-snapshot-redacted' },
|
|
121
|
+
raw_graph_omitted: true
|
|
122
|
+
},
|
|
123
|
+
adoption_evidence: {
|
|
124
|
+
retrieval_calls: [],
|
|
125
|
+
native_escape_hatch_calls: [
|
|
126
|
+
{ kind: 'grep', count: 14 },
|
|
127
|
+
{ kind: 'read_file', count: 39 }
|
|
128
|
+
],
|
|
129
|
+
cited_retrieved_context_ids: [],
|
|
130
|
+
graph_grounded_findings: 0,
|
|
131
|
+
raw_tool_outputs_omitted: true
|
|
132
|
+
},
|
|
133
|
+
boundary_checks: {
|
|
134
|
+
retrieval_available_but_unused: true,
|
|
135
|
+
required_retrieval_gate_present: false,
|
|
136
|
+
no_retrieval_claim_allowed: true
|
|
137
|
+
},
|
|
138
|
+
privacy: { raw_prompts_omitted: true, raw_graph_nodes_omitted: true, source_code_omitted: true, tool_outputs_omitted: true, customer_data_omitted: true },
|
|
139
|
+
result: { status: 'availability_not_adoption', next_safe_action: 'treat graph/RAG as unadopted unless a gate requires or verifies retrieval before final claims' }
|
|
140
|
+
},
|
|
109
141
|
skillInstall: {
|
|
110
142
|
schema: 'pluribus.agent_skill_install_provenance_receipt.v1',
|
|
111
143
|
receipt_id: 'skill_install_openskills_demo',
|
|
@@ -160,6 +192,7 @@
|
|
|
160
192
|
if (receipt.schema === 'pluribus.context_attention_receipt.v1') return render(validateAttention(receipt))
|
|
161
193
|
if (receipt.schema === 'pluribus.claude_code_rule_attention_receipt.v1') return render(validateRuleAttention(receipt))
|
|
162
194
|
if (receipt.schema === 'pluribus.agent_skill_install_provenance_receipt.v1') return render(validateSkillInstall(receipt))
|
|
195
|
+
if (receipt.schema === 'pluribus.retrieval_adoption_receipt.v1') return render(validateRetrievalAdoption(receipt))
|
|
163
196
|
render({ ok:false, errors:[`unsupported schema: ${receipt.schema || '(missing)'}`], warnings:[] })
|
|
164
197
|
}
|
|
165
198
|
|
|
@@ -210,6 +243,27 @@
|
|
|
210
243
|
return { ok: errors.length === 0 && missing.length === 0, schema:r.schema, rules_count:rules.length, uncited_or_failing_rules:missing.map(rule => rule.rule_id), errors, warnings }
|
|
211
244
|
}
|
|
212
245
|
|
|
246
|
+
|
|
247
|
+
function validateRetrievalAdoption(r) {
|
|
248
|
+
const errors = [], warnings = []
|
|
249
|
+
const catalogTools = Array.isArray(r.availability_evidence?.tool_catalog_contains) ? r.availability_evidence.tool_catalog_contains : []
|
|
250
|
+
const retrievalCalls = Array.isArray(r.adoption_evidence?.retrieval_calls) ? r.adoption_evidence.retrieval_calls : []
|
|
251
|
+
const escapeCalls = Array.isArray(r.adoption_evidence?.native_escape_hatch_calls) ? r.adoption_evidence.native_escape_hatch_calls : []
|
|
252
|
+
const cited = Array.isArray(r.adoption_evidence?.cited_retrieved_context_ids) ? r.adoption_evidence.cited_retrieved_context_ids : []
|
|
253
|
+
if (!r.retrieval_surface?.kind) errors.push('retrieval_surface.kind is required')
|
|
254
|
+
if (r.retrieval_surface?.available !== true) errors.push('retrieval_surface.available must be true before adoption can be measured')
|
|
255
|
+
if (!catalogTools.length) errors.push('availability_evidence.tool_catalog_contains must list at least one retrieval tool')
|
|
256
|
+
if (r.availability_evidence?.raw_graph_omitted !== true) errors.push('availability_evidence.raw_graph_omitted must be true')
|
|
257
|
+
if (r.adoption_evidence?.raw_tool_outputs_omitted !== true) errors.push('adoption_evidence.raw_tool_outputs_omitted must be true')
|
|
258
|
+
for (const field of ['raw_prompts_omitted','raw_graph_nodes_omitted','source_code_omitted','tool_outputs_omitted','customer_data_omitted']) if (r.privacy?.[field] !== true) errors.push(`privacy.${field} must be true`)
|
|
259
|
+
const graphGroundedFindings = Number(r.adoption_evidence?.graph_grounded_findings || 0)
|
|
260
|
+
const usedRetrieval = retrievalCalls.length > 0 || cited.length > 0 || graphGroundedFindings > 0
|
|
261
|
+
if (r.boundary_checks?.retrieval_available_but_unused === true && usedRetrieval) errors.push('retrieval_available_but_unused cannot be true when retrieval calls/citations/findings exist')
|
|
262
|
+
if (!usedRetrieval && r.boundary_checks?.no_retrieval_claim_allowed !== true) errors.push('no_retrieval_claim_allowed must be true when retrieval was available but no adoption evidence exists')
|
|
263
|
+
if (!usedRetrieval && !escapeCalls.length) warnings.push('no retrieval adoption and no native escape-hatch calls recorded; receipt may be missing usage telemetry')
|
|
264
|
+
return { ok: errors.length === 0, schema:r.schema, retrieval_available:r.retrieval_surface?.available === true, retrieval_call_count:retrievalCalls.length, cited_context_count:cited.length, graph_grounded_findings:graphGroundedFindings, native_escape_hatch_calls:escapeCalls, status: usedRetrieval ? 'retrieval_adopted' : 'availability_not_adoption', errors, warnings }
|
|
265
|
+
}
|
|
266
|
+
|
|
213
267
|
function validateSkillInstall(r) {
|
|
214
268
|
const errors = [], warnings = []
|
|
215
269
|
const selected = Array.isArray(r.selected_skills) ? r.selected_skills : []
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# Session preflight receipts
|
|
2
|
+
|
|
3
|
+
Cursor Forum users are asking for a "required first tool" or pre-tool hook so an agent cannot skip required project context before Grep, Shell, Write, or MCP calls. A behavioral rule is not enough: the first useful question is whether the session actually initialized its context before work began.
|
|
4
|
+
|
|
5
|
+
A **session preflight receipt** is a small, privacy-safe record produced before any side-effecting agent work. It does not log raw memory, prompts, secrets, or file contents. It records which required context sources were checked, which tool surfaces were available, and whether the agent is allowed to proceed.
|
|
6
|
+
|
|
7
|
+
Use this when rules like "read `MEMORY.md` first" or "call `session_init` before work" are important enough to audit.
|
|
8
|
+
|
|
9
|
+
## Minimal receipt
|
|
10
|
+
|
|
11
|
+
```json
|
|
12
|
+
{
|
|
13
|
+
"schema": "pluribus.session_preflight_receipt.v1",
|
|
14
|
+
"session_id": "local-2026-06-17T11:00Z",
|
|
15
|
+
"client": "cursor",
|
|
16
|
+
"required_first_step": {
|
|
17
|
+
"kind": "mcp_tool",
|
|
18
|
+
"name": "session_guard.session_init",
|
|
19
|
+
"enforcement": "behavioral_rule_only"
|
|
20
|
+
},
|
|
21
|
+
"required_context": [
|
|
22
|
+
{
|
|
23
|
+
"id": "project-memory",
|
|
24
|
+
"path": "MEMORY.md",
|
|
25
|
+
"status": "loaded",
|
|
26
|
+
"fingerprint": "sha256:example-non-secret-digest"
|
|
27
|
+
}
|
|
28
|
+
],
|
|
29
|
+
"tool_surface": {
|
|
30
|
+
"mcp_servers_seen": ["session-guard", "playwright"],
|
|
31
|
+
"side_effecting_tools_blocked_until_preflight": ["Shell", "Write"]
|
|
32
|
+
},
|
|
33
|
+
"decision": {
|
|
34
|
+
"allowed_to_start_work": true,
|
|
35
|
+
"mode": "read_then_patch",
|
|
36
|
+
"reason": "required context was checked before tool use"
|
|
37
|
+
}
|
|
38
|
+
}
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
## What this proves
|
|
42
|
+
|
|
43
|
+
- The agent had an explicit preflight gate before work.
|
|
44
|
+
- Required context sources were checked or clearly missing.
|
|
45
|
+
- Side-effecting tools were blocked, downgraded, or allowed for a reason.
|
|
46
|
+
- A reviewer can tell whether the run began from known project context or from blind rediscovery.
|
|
47
|
+
|
|
48
|
+
## What this does not prove
|
|
49
|
+
|
|
50
|
+
- It is not an authorization decision.
|
|
51
|
+
- It does not prove the model fully understood the context.
|
|
52
|
+
- It does not replace Cursor, Claude Code, or MCP-native enforcement.
|
|
53
|
+
- It does not require logging private memory content.
|
|
54
|
+
|
|
55
|
+
## Failure modes to record
|
|
56
|
+
|
|
57
|
+
```json
|
|
58
|
+
{
|
|
59
|
+
"schema": "pluribus.session_preflight_receipt.v1",
|
|
60
|
+
"decision": {
|
|
61
|
+
"allowed_to_start_work": false,
|
|
62
|
+
"mode": "read_only",
|
|
63
|
+
"reason": "required context missing or session_init skipped"
|
|
64
|
+
},
|
|
65
|
+
"failures": [
|
|
66
|
+
{
|
|
67
|
+
"kind": "required_context_missing",
|
|
68
|
+
"id": "project-memory",
|
|
69
|
+
"path": "MEMORY.md"
|
|
70
|
+
}
|
|
71
|
+
]
|
|
72
|
+
}
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
The key distinction: **a required first tool is the enforcement hook; the receipt is the evidence that the hook actually ran before work.**
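To make that split concrete, here is a hypothetical hook-side check that consults the receipt before each tool call. It is not a Cursor, Claude Code, or MCP hook API; it would have to be wired into whatever pre-tool mechanism the client offers. It assumes the receipt is `null` until preflight has run. The required first tool always stays callable, since otherwise preflight could never run.

```javascript
// Hypothetical pre-tool check driven by a session preflight receipt.
function preflightAllows(receipt, toolName, firstStepTool) {
  if (toolName === firstStepTool) return { allow: true, reason: 'required first step' };
  // Conservative: before preflight runs, nothing else is allowed.
  if (!receipt) return { allow: false, reason: 'preflight has not run yet' };
  const gated = receipt.tool_surface?.side_effecting_tools_blocked_until_preflight ?? [];
  if (!gated.includes(toolName)) return { allow: true, reason: 'tool is not gated by preflight' };
  if (receipt.decision?.allowed_to_start_work !== true) {
    return { allow: false, reason: receipt.decision?.reason ?? 'preflight denied work' };
  }
  const missing = (receipt.required_context ?? []).filter((c) => c.status !== 'loaded');
  if (missing.length > 0) {
    return { allow: false, reason: `required context not loaded: ${missing.map((c) => c.id).join(', ')}` };
  }
  return { allow: true, reason: 'preflight passed' };
}

const receipt = {
  required_first_step: { kind: 'mcp_tool', name: 'session_guard.session_init' },
  required_context: [{ id: 'project-memory', path: 'MEMORY.md', status: 'loaded' }],
  tool_surface: { side_effecting_tools_blocked_until_preflight: ['Shell', 'Write'] },
  decision: { allowed_to_start_work: true, mode: 'read_then_patch' },
};
console.log(preflightAllows(null, 'Write', 'session_guard.session_init').allow); // false
console.log(preflightAllows(receipt, 'Write', 'session_guard.session_init').allow); // true
```

With the failing receipt above (`allowed_to_start_work: false`), ungated tools still pass while `Shell` and `Write` are refused, which matches the `read_only` mode.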
|
|
76
|
+
|
|
77
|
+
See `examples/session-preflight-receipts/` for a copyable Cursor-style rule and JSON fixture.
|
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
# CLAUDE.md read receipt
|
|
2
|
+
|
|
3
|
+
A tiny, manual-first receipt for Claude Code sessions where `CLAUDE.md` is read at startup but trust erodes after a long session, compaction, `/clear`, or a topic switch.
|
|
4
|
+
|
|
5
|
+
Use it before asking the agent to edit again. The agent does **not** need to dump raw prompts or file contents. It should name the routing/index files it reloaded, the constraints it is carrying forward, and the relevant files it intentionally did not load.
|
|
6
|
+
|
|
7
|
+
## Prompt pattern
|
|
8
|
+
|
|
9
|
+
```text
|
|
10
|
+
Before continuing, give me a context read receipt:
|
|
11
|
+
- current task/topic
|
|
12
|
+
- session state: fresh, compacted, topic_switched, or resumed
|
|
13
|
+
- indexed files you reloaded and why
|
|
14
|
+
- 3-5 active constraints carried forward
|
|
15
|
+
- relevant files you did not load
|
|
16
|
+
- stale/historical notes you are not treating as current authority
|
|
17
|
+
- whether it is safe to edit now
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
If the agent cannot name what it reloaded, it has probably not re-grounded itself in the project context.
|
|
21
|
+
|
|
22
|
+
## Run the sample checker
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
cd examples/claude-md-read-receipts
|
|
26
|
+
node check-read-receipt.mjs --receipt sample-read-receipt.json
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
The bundled passing receipt models a topic switch from payments work to upload API work. It keeps `CLAUDE.md` as the router, reloads the topic doc and migration notes, lists active constraints, and marks the session safe to edit.
|
|
30
|
+
|
|
31
|
+
A stale receipt should fail:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
node check-read-receipt.mjs --receipt stale-read-receipt.json
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
## Receipt shape
|
|
38
|
+
|
|
39
|
+
```json
|
|
40
|
+
{
|
|
41
|
+
"schema": "pluribus.claude_md_read_receipt.v1",
|
|
42
|
+
"session_state": "topic_switched",
|
|
43
|
+
"current_task": "Update upload API retry handling",
|
|
44
|
+
"reloaded_files": [
|
|
45
|
+
{ "path": "CLAUDE.md", "role": "router", "why": "Selects topic-specific docs" }
|
|
46
|
+
],
|
|
47
|
+
"active_constraints": [
|
|
48
|
+
"Do not log raw file contents",
|
|
49
|
+
"Preserve max 3 retries with exponential backoff",
|
|
50
|
+
"Treat v2.4.0 migration notes as current authority"
|
|
51
|
+
],
|
|
52
|
+
"not_loaded_files": [
|
|
53
|
+
{ "path": "docs/payments.md", "why": "Previous topic only" }
|
|
54
|
+
],
|
|
55
|
+
"safe_to_edit": true
|
|
56
|
+
}
|
|
57
|
+
```
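A minimal sketch of the kind of shape check a receipt like this supports. It is a hypothetical function, not the logic of the bundled `check-read-receipt.mjs`, whose contents are not shown here. Treating "3-5 active constraints" as a warning rather than an error is an assumption. `safe_to_edit` is only honored when the rest of the receipt passes.

```javascript
// Hypothetical shape check for a CLAUDE.md read receipt.
const SESSION_STATES = ['fresh', 'compacted', 'topic_switched', 'resumed'];

function checkReadReceipt(r) {
  const errors = [];
  const warnings = [];
  if (r.schema !== 'pluribus.claude_md_read_receipt.v1') errors.push('unexpected schema');
  if (!SESSION_STATES.includes(r.session_state)) errors.push(`unknown session_state: ${r.session_state}`);
  if (!Array.isArray(r.reloaded_files) || r.reloaded_files.length === 0) errors.push('no reloaded files named');
  const n = (r.active_constraints ?? []).length;
  if (n < 3 || n > 5) warnings.push(`expected 3-5 active constraints, got ${n}`);
  if (typeof r.safe_to_edit !== 'boolean') errors.push('safe_to_edit must be a boolean');
  const ok = errors.length === 0;
  // A claimed safe_to_edit only counts if the receipt itself is well-formed.
  return { ok, safe_to_edit: ok && r.safe_to_edit === true, errors, warnings };
}

const receipt = {
  schema: 'pluribus.claude_md_read_receipt.v1',
  session_state: 'topic_switched',
  current_task: 'Update upload API retry handling',
  reloaded_files: [{ path: 'CLAUDE.md', role: 'router', why: 'Selects topic-specific docs' }],
  active_constraints: [
    'Do not log raw file contents',
    'Preserve max 3 retries with exponential backoff',
    'Treat v2.4.0 migration notes as current authority',
  ],
  not_loaded_files: [{ path: 'docs/payments.md', why: 'Previous topic only' }],
  safe_to_edit: true,
};
console.log(checkReadReceipt(receipt)); // { ok: true, safe_to_edit: true, errors: [], warnings: [] }
```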
|
|
58
|
+
|
|
59
|
+
## Why this exists
|
|
60
|
+
|
|
61
|
+
The market signal from Claude Code users is practical: people already maintain lean `CLAUDE.md` index files and topic memory docs, but they still have to ask whether the agent actually re-read the right material after a topic switch or compaction.
|
|
62
|
+
|
|
63
|
+
This receipt keeps the ritual lightweight and falsifiable:
|
|
64
|
+
|
|
65
|
+
- `CLAUDE.md` can stay small and act as a router.
|
|
66
|
+
- Topic docs can be reloaded only when relevant.
|
|
67
|
+
- Current authority is separated from stale/historical notes.
|
|
68
|
+
- `safe_to_edit=false` is allowed when the session cannot prove it is grounded.
|
|
69
|
+
|
|
70
|
+
Pair this with Pluribus' sync/audit workflow when file drift is the problem. Use this receipt when runtime grounding is the problem.
|