npm - pluribus-context - Versions diffs - 0.3.40 → 0.3.42 - Mend

pluribus-context 0.3.40 → 0.3.42

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (46)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,14 @@
 All notable changes to Pluribus are documented here.
+## 0.3.42 - 2026-06-17
+- Added `pluribus demo context-sufficiency-trace`, a tiny npm-runnable pass/fail demo for checking whether a compressed context bundle preserved the task's required files before editing began.
+## 0.3.41 - 2026-06-10
+- Added npm Agent Skills discovery keywords (`agent-skill`, `skillpm`, and `agent-skills-registry`) so `pluribus-context` can be indexed by npm-backed skill package managers while continuing to ship the existing low-authority `skills/*/SKILL.md` recipes.
 ## 0.3.40 - 2026-06-09
 - Added `pluribus demo tool-surface-diff`, a tiny npm-runnable MCP dynamic-discovery receipt demo for proving discovered, activated, withheld, and blocked runtime tool-surface changes without logging raw schemas, prompts, or results.

package/README.md CHANGED

@@ -14,7 +14,7 @@ The original sync workflow is still useful: Pluribus can keep project instructio
 It is **not** a persistent memory layer, retrieval system, agent orchestrator, enterprise ContextOps platform, or agent-merging framework. Think evidence for context boundaries: `CLAUDE.md`, `.cursorrules`, `copilot-instructions.md`, `AGENTS.md`, MCP Tool Search, Agent Skills, RAG/code-search, pruning, and compaction — with privacy-safe receipts instead of raw content dumps.
-**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and disposable 60-second smoke tests. If you only run one command for the cross-tool audit, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, load evidence, duplicate-load selection evidence, manual activation requirements, effective context scope, and semantic differences. For the agent-observability wedge, start with [context-budget receipts](docs/context-budget-receipts.md): privacy-safe evidence for what MCP schemas, skills, memory, subagents, CLI help, retrieval chunks, pruning runs, or compaction summaries crossed an agent boundary. It now explicitly covers the "Tool Search fixed MCP bloat" objection: the receipt proves which lane stayed deferred, which tool was expanded, and whether schemas leaked through `messages`/bootstrap anyway. For a 60-second runtime-discovery proof, run `npx --yes pluribus-context@latest demo tool-surface-diff --json`; it validates a receipt for discovered → activated → withheld/blocked MCP tools without raw schemas/prompts/results. If you want the same idea as a copyable skill, use the [context-receipts Agent Skill recipe](skills/context-receipts/). npm `latest` is currently aligned with the GitHub release; the review packet also documents a GitHub-release smoke fallback for future release-lag windows.
+**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and disposable 60-second smoke tests. If you are comparing plugins, Skills registries, config-sync tools, MCP setups, or Claude→Codex worker flows, start with the [Agent surface proof chain](docs/agent-surface-proof-chain.md) to separate install diffs, sync manifests, apply ledgers, surface state, selection traces, context-boundary spans, and handoff envelopes. If you only run one command for the cross-tool audit, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, load evidence, duplicate-load selection evidence, manual activation requirements, effective context scope, and semantic differences. For the agent-observability wedge, start with [context-budget receipts](docs/context-budget-receipts.md): privacy-safe evidence for what MCP schemas, skills, memory, subagents, CLI help, retrieval chunks, pruning runs, or compaction summaries crossed an agent boundary. It now explicitly covers the "Tool Search fixed MCP bloat" objection: the receipt proves which lane stayed deferred, which tool was expanded, and whether schemas leaked through `messages`/bootstrap anyway. For a 60-second runtime-discovery proof, run `npx --yes pluribus-context@latest demo tool-surface-diff --json`; it validates a receipt for discovered → activated → withheld/blocked MCP tools without raw schemas/prompts/results. If you are coming from Claude Code, GraphRAG, or memory tooling where retrieval succeeds but the agent ignores it, try the [context attention receipt example](examples/context-attention-receipts/) to prove required context was delivered, acknowledged, and cited before edits. 
If MCP server catalogs are burning context before the task needs them, try the [task-scoped MCP config receipt demo](examples/task-scoped-mcp-config/) to generate a minimal `--mcp-config` plus a receipt for selected vs withheld servers. If a Claude Code Skill or paste-cleaning CLI claims big token savings, try the [semantic anchor preservation receipt demo](examples/semantic-anchor-receipts/) to prove the cleaned paste kept headings, API signatures, version notes, and security constraints. If a long Claude Code session, compaction, or topic switch makes `CLAUDE.md` feel stale, try the [CLAUDE.md read receipt example](examples/claude-md-read-receipts/) to prove which index/topic files were reloaded before the next edit. If Claude Code, Codex, or an API-backed agent starts timing out, drifting on tool choice, or producing bad patch formats while the status page is unclear, try the [provider degradation canary receipt example](examples/provider-degradation-canaries/) to decide whether writes should continue, fallback, or pause. If you want the same idea as a copyable skill, use the [context-receipts Agent Skill recipe](skills/context-receipts/). npm `latest` is currently aligned with the GitHub release; the review packet also documents a GitHub-release smoke fallback for future release-lag windows.
 ---
@@ -161,7 +161,7 @@ npx --yes pluribus-context@latest sync --dry-run
 If the preview looks right, run `npx --yes pluribus-context@latest sync` to write the tool-specific files.
-For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md) and its [example source file](examples/context-handoff/pluribus.md). If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you evaluate code-search, MCP retrieval, RAG-over-notes, or agent memory tools, use the [Orchestration-layer Search Receipts](docs/orchestration-search-receipts.md) sketch to measure retrieved context from the harness layer without asking retrieval tools to inspect whole transcripts. If you are adding agent observability, traces, or OpenTelemetry-style events, start with [Context Receipts for Agent Observability](docs/context-receipts-for-agent-observability.md), then use the [Context Input Evidence](docs/context-input-evidence.md) sketch and its [executable demos](examples/context-input-evidence/) to separate source bytes, canonical text, delivered hashes, post-hoc session-log receipts, skill/plugin invocation receipts, shared-memory retrieval receipts, self-remediating brain/doctor receipts, and OpenTelemetry-style SpanEvents. 
If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If your shared-memory or knowledge-graph setup lets agents write durable facts, use [Memory write policy receipts](docs/memory-write-policy-receipts.md) and the [copyable gate](examples/memory-write-policy/) to require proposed diffs, scope, lifecycle, visibility, approval, and privacy checks before one run can teach every harness. If hooks, local gateways, or agent firewalls block risky tool calls, use [Agent firewall denial/audit receipts](docs/agent-firewall-denial-audit.md) and the [copyable checker](examples/agent-firewall-denial-audit/) to split model-visible denial from private operator audit evidence. If you are turning Claude Code/OpenClaw/Cursor into role-based “AI employee” agents with Skills and memory folders, use the [Controlled learning queue](docs/controlled-learning-queue.md) and [copyable example](examples/controlled-learning-queue/) to let agents propose durable memory changes without silently rewriting shared ICP, pricing, compliance, or process assumptions. 
If `PreCompact` / `PostCompact` or `SessionStart(compact)` workflows decide whether an agent may continue after summarization, use [Compaction resume receipts](docs/compaction-resume-receipts.md) and the [copyable gate](examples/compaction-resume-receipts/) to prove what was summarized, which instruction sources reloaded, what state was lost/kept, and whether `safe_to_resume` is actually true. If an MCP server is healthy but tools are missing in Claude Code/Cursor/Codex, use the [MCP tool visibility receipts](docs/mcp-tool-visibility-receipts.md) checklist to separate launch, handshake, `tools/list`, client catalog, and first invocation failures. If a Claude Code/OpenClaw-style Skill states a hard rule but the run still violates it, use the [Skill policy receipts](docs/skill-policy-receipts.md) guide and [copyable Skill recipe](skills/skill-policy-receipts/) to turn target decisions, refusals, and post-write guards into privacy-safe evidence. If a Skill, plugin resource, MCP instruction, or custom-agent file exists but disappears in ACP/Zed/CLI/chat parity tests, use [Loaded-resource boundary receipts](docs/loaded-resource-boundary.md) and the [copyable checker](examples/loaded-resource-boundary/) to prove discovered, attached, injected, readable, and skipped-resource stages. If long-lived projects keep old specs/TODOs that still match grep but are no longer authoritative, use [Temporal context receipts](docs/temporal-context-receipts.md) and the [copyable current-state example](examples/temporal-context-receipts/) to separate current authority from historical citations before an agent writes code. 
If AI-generated pull requests are hard to review because diff size hides operational risk, use [AI PR review receipts](docs/ai-pr-review-receipts.md), the [copyable PR template](examples/ai-pr-review-receipts/), and the [GitHub Actions receipt gate](examples/ai-pr-review-receipts/.github/workflows/ai-pr-review-receipt.yml) to review by blast radius: schema/data contracts, async paths, rollout gates, side effects, and ambiguous boundaries. If you delegate work to Codex/Claude Code/Cursor/OpenClaw-style specialist subagents, use [Subagent role receipts](docs/subagent-role-receipts.md) and the [example role definitions](examples/subagent-role-receipts/) to prove the requested role, effective role, loaded instruction source, allowed/refused capabilities, stop point, and next safe action. If you run Claude Code-style dynamic workflows, ultracode, or local LLM gateway orchestration that spawns many agents, use [Dynamic workflow run receipts](docs/dynamic-workflow-run-receipts.md) and the [copyable workflow example](examples/dynamic-workflow-run-receipts/) to prove phases, per-agent roles/models, context loaded/skipped, tool grants, token spend buckets, per-agent fuses, heartbeat, stop reasons, and known gaps. If your workflow routes Explore/Propose/Spec/Design/Tasks/Apply/Verify across OpenCode, Claude Code, Cursor, Codex, or different models, use [Phase-boundary contracts](docs/phase-boundary-contracts.md) and the [copyable Apply→Verify gate](examples/phase-boundary-contract/) to prove allowed input context, output artifact, evidence required before the next phase, dropped context, and stop conditions. 
If you need CI/reviewers to decide whether an agent handoff can continue, must be reviewed, or should be rejected, use the [Review primitive gate](docs/review-primitive-gate.md), its [copyable gate example](examples/review-primitive-gate/), and the [Claude Code review hook bridge](examples/claude-code-review-hook/) to validate assignment boundaries, approved scope/access changes, required checks, privacy flags, and `complete / partial / unsafe-to-resume` state from CI or Claude Code `TaskCompleted` / `PostCompact` hooks. If Claude Projects, long chats, or compaction make the last clean artifact hard to recover, use [Canonical output receipts](docs/canonical-output-receipts.md) and the [copyable index example](examples/canonical-output-receipts/) to track stable IDs, paths, versions, exact grep phrases, decisions, rejected options, and next actions. If a setup script installs MCP servers, Skills, instruction files, hooks, or plugins across multiple agents, use [Install-plan receipts](docs/install-plan-receipts.md) and the [copyable example](examples/install-plan-receipts/) to prove planned writes, backups, network behavior, and `writes_started=false` before mutation. After a Skill installer runs, use [Skill install/load receipts](docs/skill-install-receipts.md) and the [copyable checker](examples/skill-install-receipts/) to prove source ref, target agents/scopes, discovery/load status, context-cost bucket, and `safe_to_start_session` without logging raw Skill bodies. If you are pruning Skill sprawl after real sessions, use [Skill use-rate receipts](docs/skill-use-rate-receipts.md) and the [copyable checker](examples/skill-use-rate-receipts/) to separate discovered/installed/attached from invoked/acted-on and catch "installed but unused" resources. 
If you supervise multiple Claude Code/Cursor/Codex/OpenClaw sessions in parallel, use the [Parallel session review ledger](docs/parallel-session-review-ledger.md) and [copyable checker](examples/parallel-session-review-ledger/) to decide which sessions are complete, partial, blocked, or unsafe to resume without trusting an agent summary. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
+For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md), its [example source file](examples/context-handoff/pluribus.md), and the copyable handoff receipt for checking stale source files, diverged tool rules, wrong memories, dead commands, and not-loaded context before another agent writes code. If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you evaluate code-search, MCP retrieval, RAG-over-notes, or agent memory tools, use the [Orchestration-layer Search Receipts](docs/orchestration-search-receipts.md) sketch to measure retrieved context from the harness layer without asking retrieval tools to inspect whole transcripts. 
If you are adding agent observability, traces, or OpenTelemetry-style events, start with [Context Receipts for Agent Observability](docs/context-receipts-for-agent-observability.md), then use the [Context Input Evidence](docs/context-input-evidence.md) sketch and its [executable demos](examples/context-input-evidence/) to separate source bytes, canonical text, delivered hashes, post-hoc session-log receipts, skill/plugin invocation receipts, shared-memory retrieval receipts, self-remediating brain/doctor receipts, and OpenTelemetry-style SpanEvents. If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift, then use the [CLAUDE.md read receipt example](examples/claude-md-read-receipts/) when the practical question is whether a session actually reloaded the right index/topic files before editing. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If a provider/model may be silently degraded, use the [provider degradation canary receipt example](examples/provider-degradation-canaries/) to record transport health, capability canaries, fallback choice, and the write gate before side effects. 
If your shared-memory or knowledge-graph setup lets agents write durable facts, use [Memory write policy receipts](docs/memory-write-policy-receipts.md) and the [copyable gate](examples/memory-write-policy/) to require proposed diffs, scope, lifecycle, visibility, approval, and privacy checks before one run can teach every harness. If hooks, local gateways, or agent firewalls block risky tool calls, use [Agent firewall denial/audit receipts](docs/agent-firewall-denial-audit.md) and the [copyable checker](examples/agent-firewall-denial-audit/) to split model-visible denial from private operator audit evidence. If you are turning Claude Code/OpenClaw/Cursor into role-based “AI employee” agents with Skills and memory folders, use the [Controlled learning queue](docs/controlled-learning-queue.md) and [copyable example](examples/controlled-learning-queue/) to let agents propose durable memory changes without silently rewriting shared ICP, pricing, compliance, or process assumptions. If `PreCompact` / `PostCompact` or `SessionStart(compact)` workflows decide whether an agent may continue after summarization, use [Compaction resume receipts](docs/compaction-resume-receipts.md) and the [copyable gate](examples/compaction-resume-receipts/) to prove what was summarized, which instruction sources reloaded, what state was lost/kept, and whether `safe_to_resume` is actually true. If MCP tools consume context before a task needs them, use the [Task-scoped MCP config receipt demo](examples/task-scoped-mcp-config/) to generate a minimal `--mcp-config` and prove which servers were selected or withheld before startup. If an MCP server is healthy but tools are missing in Claude Code/Cursor/Codex, use the [MCP tool visibility receipts](docs/mcp-tool-visibility-receipts.md) checklist to separate launch, handshake, `tools/list`, client catalog, and first invocation failures. 
If a Claude Code/OpenClaw-style Skill states a hard rule but the run still violates it, use the [Skill policy receipts](docs/skill-policy-receipts.md) guide and [copyable Skill recipe](skills/skill-policy-receipts/) to turn target decisions, refusals, and post-write guards into privacy-safe evidence. If a Skill, plugin resource, MCP instruction, or custom-agent file exists but disappears in ACP/Zed/CLI/chat parity tests, use [Loaded-resource boundary receipts](docs/loaded-resource-boundary.md) and the [copyable checker](examples/loaded-resource-boundary/) to prove discovered, attached, injected, readable, and skipped-resource stages. If long-lived projects keep old specs/TODOs that still match grep but are no longer authoritative, use [Temporal context receipts](docs/temporal-context-receipts.md) and the [copyable current-state example](examples/temporal-context-receipts/) to separate current authority from historical citations before an agent writes code. If AI-generated pull requests are hard to review because diff size hides operational risk, use [AI PR review receipts](docs/ai-pr-review-receipts.md), the [copyable PR template](examples/ai-pr-review-receipts/), and the [GitHub Actions receipt gate](examples/ai-pr-review-receipts/.github/workflows/ai-pr-review-receipt.yml) to review by blast radius: schema/data contracts, async paths, rollout gates, side effects, and ambiguous boundaries. If you delegate work to Codex/Claude Code/Cursor/OpenClaw-style specialist subagents, use [Subagent role receipts](docs/subagent-role-receipts.md) and the [example role definitions](examples/subagent-role-receipts/) to prove the requested role, effective role, loaded instruction source, allowed/refused capabilities, stop point, and next safe action. 
If you run Claude Code-style dynamic workflows, ultracode, or local LLM gateway orchestration that spawns many agents, use [Dynamic workflow run receipts](docs/dynamic-workflow-run-receipts.md) and the [copyable workflow example](examples/dynamic-workflow-run-receipts/) to prove phases, per-agent roles/models, context loaded/skipped, tool grants, token spend buckets, per-agent fuses, heartbeat, stop reasons, and known gaps. If your workflow routes Explore/Propose/Spec/Design/Tasks/Apply/Verify across OpenCode, Claude Code, Cursor, Codex, or different models, use [Phase-boundary contracts](docs/phase-boundary-contracts.md) and the [copyable Apply→Verify gate](examples/phase-boundary-contract/) to prove allowed input context, output artifact, evidence required before the next phase, dropped context, and stop conditions. If you need CI/reviewers to decide whether an agent handoff can continue, must be reviewed, or should be rejected, use the [Review primitive gate](docs/review-primitive-gate.md), its [copyable gate example](examples/review-primitive-gate/), and the [Claude Code review hook bridge](examples/claude-code-review-hook/) to validate assignment boundaries, approved scope/access changes, required checks, privacy flags, and `complete / partial / unsafe-to-resume` state from CI or Claude Code `TaskCompleted` / `PostCompact` hooks. If Claude Projects, long chats, or compaction make the last clean artifact hard to recover, use [Canonical output receipts](docs/canonical-output-receipts.md) and the [copyable index example](examples/canonical-output-receipts/) to track stable IDs, paths, versions, exact grep phrases, decisions, rejected options, and next actions. 
If a setup script installs MCP servers, Skills, instruction files, hooks, or plugins across multiple agents, use [Install-plan receipts](docs/install-plan-receipts.md) and the [copyable example](examples/install-plan-receipts/) to prove planned writes, backups, network behavior, and `writes_started=false` before mutation. After a Skill installer runs, use [Skill install/load receipts](docs/skill-install-receipts.md) and the [copyable checker](examples/skill-install-receipts/) to prove source ref, target agents/scopes, discovery/load status, context-cost bucket, and `safe_to_start_session` without logging raw Skill bodies. If you are pruning Skill sprawl after real sessions, use [Skill use-rate receipts](docs/skill-use-rate-receipts.md) and the [copyable checker](examples/skill-use-rate-receipts/) to separate discovered/installed/attached from invoked/acted-on and catch "installed but unused" resources. If you supervise multiple Claude Code/Cursor/Codex/OpenClaw sessions in parallel, use the [Parallel session review ledger](docs/parallel-session-review-ledger.md) and [copyable checker](examples/parallel-session-review-ledger/) to decide which sessions are complete, partial, blocked, or unsafe to resume without trusting an agent summary. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
 ### Usage
@@ -407,6 +407,7 @@ If you've felt this pain, tell me about your setup. What tools do you use? How d
 - [OpenClaw Integration](docs/openclaw-integration.md) — how Pluribus generates `AGENTS.md` for OpenClaw
 - [Composable Contexts](docs/composable-contexts.md) — local/remote imports, merge behavior, and safety rules
 - [MCP Memory Handoff](docs/memory-mcp-handoff.md) — demo for keeping memory recall/store protocols aligned across tool-specific instruction files
+- [Task-scoped MCP Config Receipt](examples/task-scoped-mcp-config/) — generate a minimal `--mcp-config` plus selected/withheld server receipt for MCP context-bloat reviews
 - [MCP Tool Visibility Receipts](docs/mcp-tool-visibility-receipts.md) — checklist for debugging healthy MCP servers whose tools do not appear in the agent client catalog
 - [MCP Runtime Config Receipts](docs/mcp-runtime-config-receipts.md) — live-vs-template evidence for MCP permission/config drift review
 - [Remote Composable Context Imports](docs/remote-composable-context-imports.md) — design notes for lockfile/cache/auth hardening

package/bin/pluribus.js CHANGED

@@ -70,6 +70,7 @@ OPTIONS (demo)
   --receipt       Validate a custom demo receipt JSON file
   --input         Import a custom demo input file, such as rpc-messages.jsonl
   --json          Print machine-readable demo results
+  --pass          For context-sufficiency-trace, use the bundled passing trace
 EXAMPLES
   pluribus init
@@ -96,6 +97,7 @@ EXAMPLES
   pluribus demo mcp-telemetry-import --json
   pluribus demo tool-surface-diff
   pluribus demo tool-surface-diff --json
+  pluribus demo context-sufficiency-trace --json
 DOCS
   https://github.com/caioribeiroclw-pixel/pluribus
@@ -107,7 +109,7 @@ const COMMAND_FLAGS = {
   validate: new Set(['source', 'update-imports']),
   audit: new Set(['source', 'tools', 'update-imports', 'strict', 'ci', 'json', 'output', 'github-annotations', 'fidelity-report']),
   watch: new Set(['source', 'tools', 'update-imports', 'dry-run', 'once', 'debounce']),
-  demo: new Set(['receipt', 'input', 'json']),
+  demo: new Set(['receipt', 'input', 'json', 'pass']),
 }
 function getFlagNames(argv) {
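
Reviewer note: the hunk above only adds `pass` to the `demo` allow-list, and the body of `getFlagNames` is not part of this diff. As a rough sketch of how a per-command allow-list of this shape could reject unknown flags, assuming long `--name` and `--name=value` flags only, something like the following would work. `sketchFlagNames` and `findUnknownFlags` are hypothetical names, not functions from the package.

```javascript
// Hypothetical sketch, not the package's implementation.
// Only the `demo` entry is copied from the diff above.
const COMMAND_FLAGS = {
  demo: new Set(['receipt', 'input', 'json', 'pass']),
}

// Assumption: flags look like "--json" or "--input=file"; the real parser may differ.
function sketchFlagNames(argv) {
  return argv
    .filter((arg) => arg.startsWith('--') && arg.length > 2)
    .map((arg) => arg.slice(2).split('=')[0])
}

// Return every flag name that the given command does not allow.
function findUnknownFlags(command, argv) {
  const allowed = COMMAND_FLAGS[command] ?? new Set()
  return sketchFlagNames(argv).filter((name) => !allowed.has(name))
}
```

Under this sketch, `pluribus demo context-sufficiency-trace --json --pass` yields no unknown flags once `pass` is in the set, while a flag such as `--strict` would still be reported for `demo`.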

package/docs/.nojekyll ADDED

File without changes

package/docs/.well-known/agent-skills/context-receipts/SKILL.md ADDED

@@ -0,0 +1,206 @@
+---
+name: context-receipts
+description: Emit privacy-safe receipts for context selection, deferral, hydration, compaction, pruning, delegation, usage attribution, and boundary handoffs.
+---
+# Context Receipts
+Use this skill when an agent workflow claims to save context by selecting, deferring, hydrating, summarizing, compacting, pruning, delegating, attributing usage, or isolating context.
+The job is not to log the private content. The job is to emit a small receipt that lets a reviewer answer:
+> what crossed the context boundary, what stayed out, and what audit gap remains?
+## Privacy defaults
+Never include raw prompts, raw tool schemas, raw tool arguments, raw tool results, raw skill bodies, memory bodies, secrets, customer names, or full transcripts in the receipt.
+Prefer:
+- stable ids or hashed ids;
+- counts and token/line buckets;
+- categorical reasons;
+- explicit booleans for raw content copied/not copied;
+- before/after context budget buckets;
+- an `audit_gap` field when the receipt proves routing but not semantic correctness.
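
Reviewer note: the privacy defaults listed above could be applied when building a receipt roughly as follows. This is a minimal sketch, assuming Node.js and its built-in `node:crypto` module. The helper names (`hashId`, `tokenBucket`, `buildSearchReceipt`), the bucket boundaries, and the `gte_4k` label are illustrative assumptions and are not defined by the skill.

```javascript
const { createHash } = require('node:crypto')

// Replace a raw identifier or query with a stable hashed id.
function hashId(raw) {
  return 'sha256:' + createHash('sha256').update(raw, 'utf8').digest('hex')
}

// Map an exact token count to a coarse bucket label.
// Assumption: these boundaries and the "gte_4k" label are illustrative only.
function tokenBucket(tokens) {
  if (tokens < 1000) return 'lt_1k'
  if (tokens < 2000) return '1k_2k'
  if (tokens < 4000) return '2k_4k'
  return 'gte_4k'
}

// Build a receipt that carries a hash, a category, counts, and an explicit
// "raw content not copied" boolean, but never the raw query itself.
function buildSearchReceipt({ query, category, candidateCount, selectedToolId }) {
  return {
    event: 'mcp.tool_search.performed',
    query_hash: hashId(query),
    query_category: category,
    candidate_tool_count: candidateCount,
    selected_tool_id: selectedToolId,
    raw_query_copied: false,
    audit_gap: 'proves routing, not semantic correctness',
  }
}
```

The design choice here is that the caller passes the raw query in, but only its hash and a category leave the function, so a reviewer can correlate events without seeing private text.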
+## 60-second Tool Search smoke
+For MCP Tool Search, lazy tool loading, or progressive disclosure, emit enough evidence to answer these seven checks:
+1. **Index-only startup:** did the session load a compact tool/server index instead of all full schemas?
+2. **Search/routing:** what hashed query/category or routing reason selected candidate tools?
+3. **Hydration:** which full tool definition was loaded, why, and how many definitions stayed suppressed?
+4. **Call:** which server/tool id was invoked, with argument/result redaction status and success/error status?
+5. **Boundary:** if a manager subagent or child agent was used, did raw child output return to the parent?
+6. **Budget:** what were the startup and post-hydration context-token buckets?
+7. **Audit gap:** what is not proven, such as whether the selected tool was semantically optimal?
+Minimal JSONL event names:
+```jsonl
+{"event":"mcp.tool_index.loaded","loaded_server_count":12,"loaded_tool_index_count":84,"full_schema_count":0,"suppressed_tool_count":84,"raw_schema_copied":false,"startup_token_bucket":"lt_1k"}
+{"event":"mcp.tool_search.performed","query_hash":"sha256:...","query_category":"repo_search","candidate_tool_count":5,"selected_tool_id":"github.search_code","raw_query_copied":false}
+{"event":"mcp.tool_definition.loaded","tool_id":"github.search_code","hydrate_reason":"selected_after_tool_search","suppressed_tool_count":83,"definition_token_bucket":"1k_2k","raw_schema_copied":false}
+{"event":"mcp.tool_call.completed","tool_id":"github.search_code","args_hash":"sha256:...","result_token_bucket":"2k_4k","raw_args_copied":false,"raw_result_copied":false,"status":"ok"}
+```
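The checks above can be partly automated. The following Python sketch reads the four example events (abbreviated to the fields it inspects) and reports any violation of three checks: no `raw_*_copied` flag is true, startup loaded no full schemas, and the called tool id matches the hydrated tool id. The function name `check_tool_search_receipt` is hypothetical, and the sketch assumes at most one event per event name.

```python
import json

# The four example events from this section, abbreviated to the fields checked below.
RECEIPT_JSONL = """\
{"event":"mcp.tool_index.loaded","loaded_tool_index_count":84,"full_schema_count":0,"suppressed_tool_count":84,"raw_schema_copied":false}
{"event":"mcp.tool_search.performed","selected_tool_id":"github.search_code","raw_query_copied":false}
{"event":"mcp.tool_definition.loaded","tool_id":"github.search_code","suppressed_tool_count":83,"raw_schema_copied":false}
{"event":"mcp.tool_call.completed","tool_id":"github.search_code","raw_args_copied":false,"raw_result_copied":false,"status":"ok"}
"""


def check_tool_search_receipt(jsonl_text: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    by_name = {e["event"]: e for e in events}  # assumes one event per name
    problems = []

    # Privacy: every raw_*_copied flag must be exactly false.
    for e in events:
        for key, value in e.items():
            if key.startswith("raw_") and key.endswith("_copied") and value is not False:
                problems.append(f"{e['event']}: {key} is not false")

    # Index-only startup: the index event must report zero full schemas.
    index = by_name.get("mcp.tool_index.loaded")
    if index is None:
        problems.append("missing mcp.tool_index.loaded")
    elif index.get("full_schema_count") != 0:
        problems.append("startup loaded full schemas")

    # Hydration vs call: the called tool id must match the hydrated tool id.
    hydrated = by_name.get("mcp.tool_definition.loaded", {}).get("tool_id")
    called = by_name.get("mcp.tool_call.completed", {}).get("tool_id")
    if called is not None and called != hydrated:
        problems.append("tool was called without a matching hydration event")

    return problems


problems = check_tool_search_receipt(RECEIPT_JSONL)
print(problems or "ok")
```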
+## Skill / prompt context smoke
+For skills, rules, AGENTS.md overlays, or instruction files, answer:
+- which index/listing entered the session;
+- which full skill/rule/instruction body was selected;
+- which candidates were suppressed and why;
+- whether the body was loaded at session start, after a search, or after an explicit command;
+- source hash, delivered hash, and canonical form when available;
+- whether the skill/instruction text was copied into the receipt.
+Minimal event names:
+- `context.skill.registry.index.loaded`
+- `context.skill.registry.skill.read`
+- `context.skill.registry.skill.injected`
+- `context.input.loaded`
+- `context.input.candidate_suppressed`
+## Per-agent MCP injection smoke
+For role-specific subagents or per-agent MCP configs, prove the policy boundary before debugging model quality:
+- which subagent role/session requested tools;
+- which MCP servers were available to that role;
+- which servers were explicitly excluded before boot;
+- whether startup loaded full schemas or only a compact index;
+- how many tool definitions stayed deferred/suppressed; and
+- the startup token bucket after policy was applied.
+Minimal JSONL event names:
+```jsonl
+{"event":"subagent.mcp_policy.applied","subagent_role":"testing","available_server_count":2,"available_servers_hash":"sha256:...","excluded_server_count":5,"excluded_servers_hash":"sha256:...","policy_source":"role_config","raw_server_names_copied":false}
+{"event":"subagent.context_boot.evaluated","subagent_role":"testing","loaded_tool_definition_count":0,"deferred_tool_definition_count":48,"startup_token_bucket":"50k_75k","raw_schema_copied":false,"audit_gap":"proves injection boundary, not tool relevance"}
+```
+## ToolSearch propagation smoke
+For subagents that should inherit MCP through `ToolSearch`, distinguish policy, declaration, and runtime filtering:
+- did the parent/orchestrator intend to expose MCP or exclude it for this subagent?
+- was the subagent spawned immediately or after parent tool calls/orchestration work?
+- was the `tools:` declaration wildcard, explicit include, or exclusion style?
+- was `ToolSearch` declared and was it actually exposed in the subagent tool surface?
+- did MCP servers/tool definitions stay deferred, or did the channel collapse to zero?
+- was the agent registry loaded at session boot, making newly added agent files invisible until restart?
+Minimal JSONL event names:
+```jsonl
+{"event":"subagent.toolsearch.propagation.evaluated","spawn_path":"Task","tools_declaration_shape":"enumerated_include","toolsearch_declared":false,"toolsearch_exposed":false,"mcp_servers_available_bucket":"0","deferred_tool_definitions_bucket":"0","filtered_by":"frontmatter_tools_policy_or_runtime_filter","raw_tool_schemas_copied":false}
+{"event":"subagent.toolsearch.matrix.completed","tested_axis":"tools_frontmatter_shape","audit_gap":"proves ToolSearch exposure, not semantic tool relevance or runtime call success"}
+```
+## Retrieval / code-search smoke
+For semantic code search, repo RAG, or MCP tools such as Claude Context, separate "search returned" from "agent context loaded":
+- which index snapshot/version was used, without raw local codebase paths;
+- what query/category/filter identity selected the candidates, without raw query text;
+- which result ids/chunk hashes were returned, with rank, score bucket, stale flag, duplicate marker, path hash/extension, and range bucket;
+- which returned chunks were actually loaded into the agent context;
+- which chunks were suppressed as duplicate, stale, clipped, policy-blocked, or over budget;
+- whether raw code, raw prompts, raw paths, customer names, URLs, secrets, and ticket text stayed out of the receipt;
+- the audit gap: this proves retrieval/loading boundaries, not semantic answer quality.
+Minimal JSONL event names:
+```jsonl
+{"event":"code.index.snapshot.used","snapshot_id_hash":"sha256:...","codebase_path_hash":"sha256:...","indexed_chunk_count_bucket":"over_1k","raw_codebase_path_copied":false}
+{"event":"code.search.performed","query_hash":"sha256:...","query_category":"auth_debug","candidate_count_bucket":"over_1k","raw_query_copied":false}
+{"event":"code.search.result.returned","rank":1,"chunk_id_hash":"sha256:...","chunk_text_hash":"sha256:...","path_hash":"sha256:...","score_bucket":"high","stale":false,"raw_code_copied":false}
+{"event":"context.input.loaded","kind":"retrieved_code_chunks","loaded_chunk_count":3,"suppressed_chunk_count":2,"suppression_reasons":["duplicate","stale_snapshot_chunk"],"raw_code_copied":false}
+```
+## Usage attribution smoke
+For `/usage`, `/context`, `/doctor`, or other context-budget breakdowns, map each displayed category to evidence that can be reviewed without exposing private content:
+- what measurement window was used;
+- which categories were attributed, such as skills, subagents, plugins, MCP servers, rules, memory, or project files;
+- which components were loaded, deferred, hydrated, suppressed, pruned, or rolled back;
+- before/after or current token/cost buckets by category;
+- whether raw skill bodies, prompts, MCP schemas, tool outputs, and file paths were excluded;
+- the remaining audit gap, such as not proving semantic usefulness of a high-cost component.
+Minimal JSONL event names:
+```jsonl
+{"event":"context.usage.window.measured","window":"current_session","total_token_bucket":"100k_150k","raw_prompts_copied":false}
+{"event":"context.usage.category.attributed","category":"mcp_server","component_hash":"sha256:...","loaded_token_bucket":"10k_25k","deferred_definition_count":42,"hydrated_definition_count":3,"raw_schema_copied":false}
+{"event":"context.usage.breakdown.completed","categories":["skills","subagents","plugins","mcp_server"],"audit_gap":"proves attribution buckets, not whether each component was necessary"}
+```
+## Pruning / compaction smoke
+For context-cleaning, pruning, compaction, or doctor/guard tools, answer:
+- what prescription/trigger started the run;
+- which strategies changed context and which candidates were protected;
+- before/after token and byte buckets;
+- whether summaries, behavioral digests, team messages, and backups were preserved;
+- whether private transcript text, raw tool output, file paths, secrets, and customer text were excluded from the receipt;
+- the remaining audit gap, such as not proving semantic quality of the pruned text.
+Minimal JSONL event names:
+```jsonl
+{"event":"context.prune.started","prescription":"balanced","trigger":"manual_dry_run","before_token_bucket":"150k_200k","raw_transcript_copied":false}
+{"event":"context.prune.strategy.evaluated","strategy":"tool-output-trim","candidate_bucket":"10_25","changed_bucket":"5_10","protected_bucket":"1_5","raw_tool_output_copied":false}
+{"event":"context.prune.completed","after_token_bucket":"75k_100k","backup_verified":true,"protected_summary_count":2,"raw_text_copied":false,"audit_gap":"proves pruning/protection counts, not semantic disposability"}
+```
+For failed compaction, also prove transaction safety:
+- did the summary call succeed, fail, or time out;
+- was a candidate summary validated before any swap;
+- did the harness commit a context swap or preserve the original context;
+- were deferred-tool registries and system-reminder queues restored on rollback;
+- did stale system reminders/tool results replay as fresh state;
+- was post-token metadata recorded as success even though the summary failed.
+Minimal JSONL event names:
+```jsonl
+{"event":"context.compaction.summary.attempted","summary_call_status":"failed_rate_limited","candidate_summary_available":false,"raw_error_copied":false}
+{"event":"context.compaction.rollback.completed","swap_committed":false,"original_context_preserved":true,"deferred_tool_registry_restored":true,"system_reminder_queue_restored":true,"replayed_system_reminder_count":0}
+{"event":"context.compaction.transaction.completed","status":"rolled_back","authoritative_state":"pre_compaction_context","post_tokens_recorded_as_success":false,"raw_context_copied":false}
+```
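The transaction-safety questions above translate into a few invariants over the three example events. The Python sketch below is one possible reading, not the package's own validator: the function name `check_failed_compaction` is hypothetical, and treating a missing rollback event as a failure is an assumption.

```python
# The three example events from this section, as Python dicts.
EVENTS = [
    {"event": "context.compaction.summary.attempted",
     "summary_call_status": "failed_rate_limited",
     "candidate_summary_available": False, "raw_error_copied": False},
    {"event": "context.compaction.rollback.completed",
     "swap_committed": False, "original_context_preserved": True,
     "deferred_tool_registry_restored": True,
     "system_reminder_queue_restored": True,
     "replayed_system_reminder_count": 0},
    {"event": "context.compaction.transaction.completed",
     "status": "rolled_back", "authoritative_state": "pre_compaction_context",
     "post_tokens_recorded_as_success": False, "raw_context_copied": False},
]


def check_failed_compaction(events: list) -> list:
    """Return violated invariants; an empty list means the rollback looks safe."""
    by_name = {e["event"]: e for e in events}
    attempt = by_name.get("context.compaction.summary.attempted", {})
    rollback = by_name.get("context.compaction.rollback.completed", {})
    txn = by_name.get("context.compaction.transaction.completed", {})
    problems = []

    # Without a candidate summary, nothing may be swapped or recorded as success.
    if not attempt.get("candidate_summary_available", False):
        if rollback.get("swap_committed") is not False:
            problems.append("context swap committed without a validated summary")
        if txn.get("post_tokens_recorded_as_success") is not False:
            problems.append("post-token metadata recorded as success")

    # If the swap was not committed, the original state must be fully restored.
    if rollback.get("swap_committed") is False:
        for flag in ("original_context_preserved",
                     "deferred_tool_registry_restored",
                     "system_reminder_queue_restored"):
            if rollback.get(flag) is not True:
                problems.append(f"{flag} is not true after rollback")
        if rollback.get("replayed_system_reminder_count", 0) != 0:
            problems.append("stale system reminders replayed")

    return problems


problems = check_failed_compaction(EVENTS)
print(problems or "ok")
```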
+## Subagent / manager boundary smoke
+For subagents, manager agents, or child workers, answer:
+- what task was delegated, by category and hashed objective;
+- what large output was captured by the child, as line/token buckets;
+- what bounded summary returned to the parent;
+- whether raw child output, tool results, or MCP schemas entered the parent context;
+- the remaining audit gap.
+Minimal event names:
+- `subagent.delegation.requested`
+- `subagent.tool_output.captured`
+- `subagent.summary.returned`
+- `parent.context_budget.evaluated`
+## Good receipt test
+A receipt is useful if a maintainer can debug one of these failures without seeing private content:
+- the agent never found the right tool/skill;
+- the full definition loaded too early;
+- too many definitions stayed in context;
+- a child/subagent saved no budget because raw output returned to the parent;
+- compaction/pruning happened but no one can prove what was changed, protected, backed up, summarized, or dropped.
+A receipt is not enough if it only says “Tool Search enabled” or “used subagent”. It must prove the boundary behavior.

package/docs/.well-known/agent-skills/index.json ADDED

@@ -0,0 +1,19 @@
+{
+  "$schema": "https://schemas.agentskills.io/discovery/0.2.0/schema.json",
+  "skills": [
+    {
+      "name": "context-receipts",
+      "type": "skill-md",
+      "description": "Emit privacy-safe receipts for context selection, deferral, hydration, compaction, pruning, delegation, usage attribution, and boundary handoffs.",
+      "url": "context-receipts/SKILL.md",
+      "digest": "sha256:6d268a5ceac4afa87308ff4c79f01483e2785a8e9bb54e5d3147477e0bf724a1"
+    },
+    {
+      "name": "skill-policy-receipts",
+      "type": "skill-md",
+      "description": "Use when a task must obey a hard project policy, such as \"do not generate tests for internal services\", \"do not call production APIs\", or \"do not edit generated files\". Emits a privacy-safe receipt before writes and after guard checks.",
+      "url": "skill-policy-receipts/SKILL.md",
+      "digest": "sha256:633b4da097b56da8805476a69d0498602979b7f8c03c57996f886bb9e56ccca8"
+    }
+  ]
+}

package/docs/.well-known/agent-skills/skill-policy-receipts/SKILL.md ADDED

@@ -0,0 +1,77 @@
+---
+name: skill-policy-receipts
+description: Use when a task must obey a hard project policy, such as "do not generate tests for internal services", "do not call production APIs", or "do not edit generated files". Emits a privacy-safe receipt before writes and after guard checks.
+---
+# Skill Policy Receipts
+This Skill turns natural-language guardrails into an inspectable policy receipt.
+## Preflight: decide before writing
+Before creating or editing files:
+1. List intended targets using coarse paths or globs.
+2. For each target, decide `allowed` or `refused`.
+3. Give a short reason.
+4. If any target is refused, stop before writing.
+5. Emit a receipt with `write_started=false` and `stopped_at="policy_refused"`.
+Receipt shape:
+```json
+{
+  "receipt_type": "skill.policy.v1",
+  "skill": "skill-policy-receipts",
+  "policy_scope": "<short policy name>",
+  "targets": [
+    {
+      "target": "<coarse path or glob>",
+      "decision": "allowed|refused",
+      "reason": "<short reason>"
+    }
+  ],
+  "write_started": false,
+  "post_write_guard": "not_run",
+  "stopped_at": "policy_refused|all_targets_allowed"
+}
+```
+Do not include raw prompts, code, secrets, customer data, stack traces, or full tool output.
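The five preflight steps can be sketched in Python as follows. This is an illustration under assumptions: the function name `preflight_receipt`, the `refused_globs` parameter, and the reason strings are hypothetical, and glob matching stands in for whatever policy logic a real Skill would apply.

```python
import fnmatch
import json


def preflight_receipt(policy_scope: str, targets: list, refused_globs: list) -> dict:
    """Decide every target before any write and return the preflight receipt."""
    decided = []
    for target in targets:
        # fnmatchcase keeps matching case-sensitive on every platform.
        hit = next((g for g in refused_globs if fnmatch.fnmatchcase(target, g)), None)
        decided.append({
            "target": target,
            "decision": "refused" if hit else "allowed",
            "reason": f"matches refused glob {hit}" if hit else "no refused glob matched",
        })
    any_refused = any(t["decision"] == "refused" for t in decided)
    return {
        "receipt_type": "skill.policy.v1",
        "skill": "skill-policy-receipts",
        "policy_scope": policy_scope,
        "targets": decided,
        "write_started": False,  # preflight never writes
        "post_write_guard": "not_run",
        "stopped_at": "policy_refused" if any_refused else "all_targets_allowed",
    }


receipt = preflight_receipt(
    "no-internal-service-tests",
    ["src/utils/date.test.ts", "internal/billing/service.test.ts"],
    ["internal/*"],
)
print(json.dumps(receipt, indent=2))
```

Because one target is refused, the receipt stops at `policy_refused` and no write should follow.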
+## Write only after all targets are allowed
+If every target is allowed:
+1. Emit or state `stopped_at="all_targets_allowed"`.
+2. Perform the write.
+3. Run the configured post-write guard.
+4. Emit whether the guard passed or failed.
+Post-write receipt shape:
+```json
+{
+  "receipt_type": "skill.policy.v1",
+  "skill": "skill-policy-receipts",
+  "policy_scope": "<short policy name>",
+  "write_started": true,
+  "post_write_guard": "passed|failed|not_configured",
+  "stopped_at": "guard_passed|guard_failed"
+}
+```
+## Example policy: no internal-service unit tests
+Policy:
+> Do not generate unit tests for internal services. If the requested test imports `internal/`, `@/internal`, or a known private service module, refuse before writing and explain the safer target.
+Example guard:
+```bash
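+# Search only test/spec files under the current directory; a match (exit status 0) means the guard failed.
+grep -R --include='*test.*' --include='*spec.*' \
+  "from ['\"]\.\./\.\./internal\|from ['\"]@/internal\|require(['\"]@/internal" .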
+```
+If the grep finds a match in generated tests, stop and report `post_write_guard="failed"`.
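For environments without `grep`, the same guard can be approximated in Python and wired directly into the post-write receipt shape. This is a sketch under assumptions: the function name `post_write_receipt` is hypothetical, the regular expression mirrors the three import patterns from the grep guard, and files are passed in as a name-to-text mapping so the example does not touch the filesystem.

```python
import fnmatch
import re

# Mirrors the three patterns from the grep guard above.
FORBIDDEN_IMPORT = re.compile(
    r"""from ['"]\.\./\.\./internal|from ['"]@/internal|require\(['"]@/internal"""
)


def is_test_file(name: str) -> bool:
    """Match the same filename globs the grep guard targets."""
    return fnmatch.fnmatchcase(name, "*test.*") or fnmatch.fnmatchcase(name, "*spec.*")


def post_write_receipt(policy_scope: str, written_files: dict) -> dict:
    """Scan written test files and report whether the guard passed or failed."""
    failed = any(
        FORBIDDEN_IMPORT.search(text)
        for name, text in written_files.items()
        if is_test_file(name)
    )
    return {
        "receipt_type": "skill.policy.v1",
        "skill": "skill-policy-receipts",
        "policy_scope": policy_scope,
        "write_started": True,
        "post_write_guard": "failed" if failed else "passed",
        "stopped_at": "guard_failed" if failed else "guard_passed",
    }


receipt = post_write_receipt(
    "no-internal-service-tests",
    {"src/billing.test.ts": "import { charge } from '@/internal/billing'"},
)
print(receipt["post_write_guard"])
```

Note that the receipt records only the guard outcome, not the offending file contents.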

package/docs/agent-surface-proof-chain.md ADDED

@@ -0,0 +1,176 @@
+# Agent surface proof chain
+Agent setup is becoming a bundle, not a file: Skills, hooks, MCP servers, subagents, slash commands, permissions, profiles, plugins, and cross-model workers all get installed or synced together.
+A single “receipt” is too vague for that surface. Use the smallest proof object for the boundary you are crossing, and do not let one green check imply the next boundary worked.
+## Quick model
+```text
+registry publishes
+  → installer plans writes
+  → sync applies writes
+  → host exposes surface
+  → task makes tools/skills eligible
+  → runtime calls or activates them
+  → workers hand results back
+  → reviewer/CI promotes state
+```
+Each arrow can fail independently.
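One way to make "each arrow can fail independently" concrete is to walk the chain in order and report the first boundary that lacks proof, even when later boundaries look green. In this Python sketch the boundary identifiers and the function name `first_unproven` are hypothetical labels derived from the quick model, not names used by Pluribus.

```python
from typing import Optional

# Hypothetical identifiers for the eight stages of the quick model, in order.
BOUNDARIES = [
    "registry_published",
    "install_planned",
    "sync_applied",
    "host_exposed",
    "task_eligible",
    "runtime_called",
    "worker_handed_back",
    "promoted",
]


def first_unproven(proofs: dict) -> Optional[str]:
    """Return the earliest boundary without proof, or None if all are proven.

    A green check on a later boundary never counts as proof of an earlier one.
    """
    for boundary in BOUNDARIES:
        if not proofs.get(boundary, False):
            return boundary
    return None


proofs = {
    "registry_published": True,
    "install_planned": True,
    "sync_applied": True,
    "runtime_called": True,  # green, but host exposure was never proven
}
print(first_unproven(proofs))
```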
+## Boundary-specific proof objects
+| Boundary | Use this proof object | Proves | Does **not** prove |
+| --- | --- | --- | --- |
+| Setup script or plugin is about to write files | **Install diff** | Planned files, permissions, hooks, MCP servers, Skills, commands, backups, network/env access, and `writes_started=false` before mutation | The host later loaded or used any installed surface |
+| Registry sync says it succeeded | **Post-sync manifest** | Published asset version, target agents/scopes, written paths, content hashes, skipped targets, errors, and restart requirements | Runtime discovery or activation |
+| Continuous config sync ran with `--apply` | **Post-apply ledger** | What was actually written, skipped, backed up, failed, or sent to manual review after apply | That Claude/Codex/Cursor followed the config |
+| Host starts after install/sync | **Surface state** | What is visible/attached/discovered vs skipped/withheld: Skills, hooks, MCP tools, agents, slash commands, instruction files | That a specific task selected the right surface |
+| Runtime task decides what to use | **Selection trace** | Available → eligible → called → enforced for tools/Skills/MCP, with privacy-safe reasons | That the output is correct |
+| Model/provider may be silently degraded | **Degradation decision record** | Transport health, app-critical canaries, fallback choice, degradation confidence, and whether writes should continue | That the provider is globally healthy or that future turns will remain stable |
+| Long session, compaction, or topic switch resumes work | **Read receipt / safe-to-edit gate** | Which index/topic files or summaries reloaded, active constraints, stale notes rejected, and whether editing is safe | That future turns will keep following the context |
+| Debugger shows a chain of LLM calls | **Context-boundary span** | Which context inputs crossed into each node, hashes/paths by default, withheld inputs, replay reason, downstream invalidations | Full correctness; the span is not a raw prompt dump and is not secret-safe by itself |
+| Claude delegates to Codex/Gemini/subagents | **Handoff envelope** | Task, parent-plan hash, allowed files/commands, passed context sources, output schema, timeout, and insufficient-context path | That worker output is trusted project state |
+| Worker results are merged back | **Merge-back evidence** | What changed, evidence used, assumptions, invalidated downstream outputs, and reviewer/CI promotion decision | That the original worker had complete context |
+## Minimal fields by proof object
+### Install diff
+```json
+{
+  "proof_type": "install_diff",
+  "installer": "claude-code-setup",
+  "targets": ["claude-code", "codex"],
+  "planned_writes": [
+    {"path": ".claude/hooks/pretooluse.json", "kind": "hook", "backup": true},
+    {"path": ".mcp.json", "kind": "mcp_config", "backup": true}
+  ],
+  "env_or_network_access": ["ANTHROPIC_API_KEY:required-not-recorded"],
+  "writes_started": false,
+  "review_required": true
+}
+```
+### Post-sync manifest
+```json
+{
+  "proof_type": "post_sync_manifest",
+  "run_id": "skills-sync-2026-06-14T21:00Z",
+  "source": "team-skills-registry",
+  "targets": [
+    {
+      "agent": "claude-code",
+      "scope": "project",
+      "skills_dir": ".claude/skills",
+      "written": [{"name": "review-pr", "version": "1.4.2", "sha256": "..."}],
+      "skipped": []
+    }
+  ],
+  "restart_required": true
+}
+```
+### Post-apply ledger
+```json
+{
+  "proof_type": "post_apply_ledger",
+  "run_id": "config-sync-123",
+  "plan_hash": "sha256:...",
+  "writes_started": true,
+  "backup_root": ".agent-sync/backups/2026-06-14T21-00Z",
+  "operations": [
+    {
+      "path": "AGENTS.md",
+      "status": "written",
+      "before_hash": "sha256:old",
+      "after_hash": "sha256:new",
+      "backup_path": ".agent-sync/backups/.../AGENTS.md"
+    },
+    {
+      "path": ".codex/config.toml",
+      "status": "manual_review",
+      "reason": "permission profile changed"
+    }
+  ]
+}
+```
+### Selection trace
+```json
+{
+  "proof_type": "selection_trace",
+  "turn_id": "turn-42",
+  "loaded_instructions": ["CLAUDE.md", ".claude/skills/memory/SKILL.md"],
+  "mcp_tools_visible": ["memory.search", "memory.write"],
+  "task_intent": "recall prior decision before editing auth flow",
+  "expected_tools": ["memory.search"],
+  "eligible_tools": ["memory.search"],
+  "called_tools": ["memory.search"],
+  "enforced_by_hook": true
+}
+```
+### Degradation decision record
+```json
+{
+  "proof_type": "degradation_decision_record",
+  "run_id": "agent-run-2026-06-15T20:02Z",
+  "provider": "anthropic",
+  "model": "claude-sonnet-4",
+  "region": "us-east-1",
+  "prompt_template_hash": "sha256:...",
+  "canary_suite_version": "coding-agent-smoke-2026-06-15",
+  "transport": {
+    "ttft_p95_ms": 1400,
+    "total_latency_p95_ms": 9200,
+    "timeout_rate": 0.01,
+    "error_rate": 0
+  },
+  "capability_canaries": [
+    {"name": "json_schema", "status": "pass", "severity": "write_blocking"},
+    {"name": "tool_choice", "status": "pass", "severity": "write_blocking"},
+    {"name": "patch_format", "status": "pass", "severity": "write_blocking"}
+  ],
+  "confidence": "healthy",
+  "write_gate": "continue"
+}
+```
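The record above separates transport health from capability canaries. A minimal Python sketch of how a write gate could be derived from it follows; the function name `write_gate` and the `"stop"` value are assumptions (the example record only shows `"continue"`), and the rule that any failing `write_blocking` canary stops writes is one plausible policy rather than the package's documented behavior.

```python
def write_gate(record: dict) -> str:
    """Return "continue" only when every write-blocking canary passes.

    Transport latency is deliberately ignored: a fast call can still fail
    an app-critical canary.
    """
    blocking_failures = [
        canary["name"]
        for canary in record["capability_canaries"]
        if canary["severity"] == "write_blocking" and canary["status"] != "pass"
    ]
    return "stop" if blocking_failures else "continue"  # "stop" is hypothetical


record = {
    "transport": {"ttft_p95_ms": 300, "timeout_rate": 0.0},  # low latency
    "capability_canaries": [
        {"name": "json_schema", "status": "pass", "severity": "write_blocking"},
        {"name": "tool_choice", "status": "pass", "severity": "write_blocking"},
        {"name": "patch_format", "status": "fail", "severity": "write_blocking"},
    ],
}
print(write_gate(record))
```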
+### Handoff envelope
+```json
+{
+  "proof_type": "handoff_envelope",
+  "from": "opus-supervisor",
+  "to": "codex-worker-2",
+  "task": "compare parser failures in imports.test.js",
+  "parent_plan_hash": "sha256:...",
+  "allowed_files": ["src/utils/imports.js", "test/imports.test.js"],
+  "allowed_commands": ["npm test -- imports"],
+  "context_sources_passed": ["spec/context-format.md#remote-imports"],
+  "expected_output_schema": "worker_result_v1",
+  "stop_condition": "one patch candidate or explicit insufficient_context"
+}
+```
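A handoff envelope is only useful if the merge-back step checks the worker's result against it. The Python sketch below compares the files a worker changed with the envelope's `allowed_files`; the function name `out_of_scope_files` and the `changed_files` input are hypothetical, since the envelope example does not define a worker result format.

```python
def out_of_scope_files(envelope: dict, changed_files: list) -> list:
    """Return changed files that the envelope did not allow, sorted for stable output."""
    allowed = set(envelope["allowed_files"])
    return sorted(path for path in changed_files if path not in allowed)


envelope = {
    "proof_type": "handoff_envelope",
    "allowed_files": ["src/utils/imports.js", "test/imports.test.js"],
}
violations = out_of_scope_files(
    envelope, ["src/utils/imports.js", "package.json"]
)
print(violations)
```

A non-empty result is a reason to hold the worker output for review instead of promoting it as project state.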
+## Practical rules
+1. **Do not promote intent as outcome.** A dry-run plan is not an apply ledger.
+2. **Do not promote visibility as use.** A visible MCP tool or Skill is not an activated/called one.
+3. **Do not promote latency as reliability.** A low-latency provider call can still fail app-critical canaries, and a slow call may still be safe for read-only work.
+4. **Do not promote worker output as project truth.** Merge-back needs evidence and invalidation notes.
+5. **Keep receipts privacy-safe by default.** Prefer paths, hashes, names, versions, statuses, and reasons; expand raw bodies only under explicit local review.
+6. **Name skipped and withheld context.** What did not load is often the failure.
+7. **Use the product’s own vocabulary.** Say install diff for installers, ledger for sync apply, span for debuggers, envelope for delegation, degradation decision for provider/model health, and read receipt for re-grounding.
+## When Pluribus fits
+Use Pluribus when you need privacy-safe evidence around agent context boundaries: generated instruction files, Skills, MCP tools, memory/RAG results, compaction, pruning, provider/model degradation decisions, plugin setup, or cross-tool handoffs.
+Do not use Pluribus as a registry, memory server, agent orchestrator, or replacement for Claude/Codex/Cursor runtime diagnostics. It is the evidence layer around those systems.