npm - pluribus-context - Versions diffs - 0.3.21 → 0.3.26 - Mend

pluribus-context 0.3.21 → 0.3.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (93)

package/CHANGELOG.md CHANGED

@@ -1,5 +1,36 @@
 # Changelog
+## [Unreleased]
+All notable changes to Pluribus are documented here.
+- Added an executable subagent delegation receipt demo proving that large child command/tool output stayed isolated and only a bounded summary crossed back into the parent context.
+## 0.3.25 - 2026-05-23
+- Re-aligned npm package discovery metadata so the public package description preserves core search terms (`AI context`, `rules sync`, `Claude.md`, `Claude Code`, `Cursor`, and `Copilot instructions`) while retaining the new privacy-safe context receipts positioning.
+- Prepared the npm distribution catch-up from GitHub/local `0.3.24` to public `0.3.25`, so users reached through external PRs can install the current receipt demos and docs via `pluribus-context@latest`.
+## 0.3.24 - 2026-05-22
+- Added the `Context receipts for agent observability` guide, positioning privacy-safe receipts as evidence for what crossed an agent context boundary alongside OpenTelemetry spans and agent-run traces.
+- Added executable context receipt demos for context input loading, post-hoc session-log conversion, skill/plugin invocation, AGENTS.md overlays, shared-memory retrieval, memory consolidation, governance delete/forget, self-remediating brain/doctor runs, context compaction, MCP Tool Search, and GitHub MCP secret scanning.
+- Added OpenTelemetry-style trace fixtures that preserve hashes, counts, buckets, lifecycle state, selection/suppression decisions, and audit gaps without exporting raw prompts, secrets, tool outputs, memory bodies, transcripts, or private paths.
+- Extended discovery smoke tracking for external context-observability, MCP, memory, skills, and directory/list feedback channels so artifact-first distribution attempts remain measurable.
+- Updated README/docs links so users and reviewers can find context receipts from the package page and run the shipped examples from the npm tarball.
+## 0.3.23 - 2026-05-19
+- Extended `pluribus audit --fidelity-report` with `duplicateLoadEvidence` so reviewers can see the generated candidate, content hash, inferred selected load, and the explicit fact that runtime scanner roots/caches/plugin duplicates are not yet inspected.
+- Added `duplicate-load-selection-not-proven` warnings and semantic markers to avoid treating multi-root skill/rule discovery as safe without a selection/suppression receipt.
+- Updated the audit JSON schema and fidelity docs for duplicate skill/context load receipts inspired by Cursor/Claude duplicate-context reports.
+## 0.3.22 - 2026-05-19
+- Extended `pluribus audit --fidelity-report` with `loadEvidence` receipts so reviewers can see whether generated context is expected to enter through native file discovery or a generic agent fallback.
+- Added explicit runtime dedupe uncertainty via `load-dedupe-not-proven` warnings and `runtime-load-dedupe-not-proven` semantic markers, making native-vs-hook-vs-manual duplication an evidence question.
+- Updated the audit JSON schema, community review packet, README shortcut, and portability fidelity docs with `loadedBy`, `effectiveSource`, hook/session-start flags, resume behavior, and dedupe risk.
 ## 0.3.21 - 2026-05-19
 - Extended `pluribus audit --fidelity-report` with `effectiveContext` evidence so monorepo reviewers can see that built-in targets currently prove repo-root context only, not root→subpath inheritance or path isolation.
@@ -11,9 +42,6 @@
 - Added a 60-second native-vs-fallback smoke to the community review packet so directory/list reviewers can see Bob native `.bob/rules/*.md` differ from OpenClaw/AGENTS.md generic fallback.
 - Extended `npm run review:smoke` to execute that fidelity demo against the published npm package and assert `nativeDiscoverySurface`, `genericFallback`, and `manualActivationRequired` fields.
-## [Unreleased]
-All notable changes to Pluribus are documented here.
 ## 0.3.19 — Bob native discovery target

package/README.md CHANGED

@@ -14,7 +14,7 @@ It shows where instructions keep their semantics, where they are downgraded to a
 It is **not** a persistent memory layer, retrieval system, agent orchestrator, or agent-merging framework. Think `CLAUDE.md`, `.cursorrules`, `copilot-instructions.md`, `AGENTS.md` — one intentional context, multiple generated outputs.
-**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and a disposable 60-second smoke test. If you only run one command, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, manual activation requirements, and semantic differences.
+**Reviewer shortcut:** evaluating Pluribus for a list, newsletter, package roundup, or tool directory? Use the [Community Review Packet](docs/community-review-packet.md) for copy-paste directory submission fields, safety/removability notes, feedback links, and a disposable 60-second smoke test. If you only run one command, try `npx --yes pluribus-context@latest audit --json --fidelity-report` to see native discovery surfaces, generic fallbacks, load evidence, duplicate-load selection evidence, manual activation requirements, effective context scope, and semantic differences. For the newer agent-observability wedge, start with [context-budget receipts](docs/context-budget-receipts.md): privacy-safe evidence for what MCP schemas, skills, memory, subagents, CLI help, or summaries actually crossed an agent boundary.
 ---
@@ -154,7 +154,7 @@ npx --yes pluribus-context@latest sync --dry-run
 If the preview looks right, run `npx --yes pluribus-context@latest sync` to write the tool-specific files.
-For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md) and its [example source file](examples/context-handoff/pluribus.md). If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
+For a fuller walkthrough, see the [Quickstart](docs/quickstart.md). To enforce generated context files in pull requests, use the [CI audit example](docs/ci-audit-example.md); to catch drift before commits leave your machine, use the [Pre-commit Audit Hook](docs/pre-commit-audit.md). If your repo already has `CLAUDE.md`, `.cursorrules`, Copilot instructions, or `AGENTS.md`, run a [Context Drift Audit](docs/context-drift-audit.md) first, try the intentionally drifted [audit example](examples/context-drift-audit/), then follow [Migrate Existing AI Context Files](docs/migrate-existing-context.md). If you switch between Cursor, Claude Code, Copilot, and terminal agents, try the [Cursor ↔ Claude Code context handoff guide](docs/cursor-claude-context-handoff.md) and its [example source file](examples/context-handoff/pluribus.md). If you run multiple AI sessions on the same project, try the [Coordination Contract guide](docs/coordination-contract.md) and its [example source file](examples/coordination-contract/pluribus.md) to keep event-log/scratchpad protocol rules aligned without turning Pluribus into an orchestrator. If you evaluate code-search, MCP retrieval, RAG-over-notes, or agent memory tools, use the [Orchestration-layer Search Receipts](docs/orchestration-search-receipts.md) sketch to measure retrieved context from the harness layer without asking retrieval tools to inspect whole transcripts. If you are adding agent observability, traces, or OpenTelemetry-style events, start with [Context Receipts for Agent Observability](docs/context-receipts-for-agent-observability.md), then use the [Context Input Evidence](docs/context-input-evidence.md) sketch and its [executable demos](examples/context-input-evidence/) to separate source bytes, canonical text, delivered hashes, post-hoc session-log receipts, skill/plugin invocation receipts, shared-memory retrieval receipts, self-remediating brain/doctor receipts, and OpenTelemetry-style SpanEvents. If you publish AI rules, skills, or instruction bundles as "portable", use the [Portability Fidelity Report](docs/portability-fidelity-report.md) and its [example source file](examples/portability-fidelity/pluribus.md) to make compatibility claims evidence-based instead of self-attested. Before committing shared or generated AI instructions, use the [Context File Review Checklist](docs/context-file-review.md). If you're deciding between Pluribus and a one-way rules converter, see [When to use Pluribus](docs/when-to-use-pluribus.md). If you are debugging "context drift" after compaction or long sessions, start with the [Context Drift Taxonomy](docs/context-drift-taxonomy.md) to separate file drift from runtime precedence drift. If you use MCP memory or knowledge-graph tools, try the [MCP memory handoff demo](docs/memory-mcp-handoff.md) to keep recall/store protocols aligned across AI coding tools without turning Pluribus into a memory server. If you are reviewing Pluribus for a list, newsletter, or tool directory, use the [Community Review Packet](docs/community-review-packet.md) for directory submission fields, a one-line description, safety notes, and a disposable 60-second smoke test. Maintainers can track package/repo discovery with the [Discovery Smoke Checks](docs/discovery-smoke.md).
 ### Usage

package/docs/community-review-packet.md CHANGED

@@ -100,7 +100,7 @@ mkdir pluribus-fidelity && cd pluribus-fidelity
 npx --yes pluribus-context@latest init --name "Fidelity review" --description "Native vs fallback smoke" --tools bob,openclaw
 npx --yes pluribus-context@latest sync
 npx --yes pluribus-context@latest audit --json --fidelity-report --output fidelity.json
-node -e "const r=require('./fidelity.json'); console.log(r.fidelityReport.targets.map(t => ({ toolId: t.toolId, file: t.files[0], nativeDiscoverySurface: t.nativeDiscoverySurface, genericFallback: t.genericFallback, manualActivationRequired: t.manualActivationRequired, effectiveContextScope: t.effectiveContext?.scope })))"
+node -e "const r=require('./fidelity.json'); console.log(r.fidelityReport.targets.map(t => ({ toolId: t.toolId, file: t.files[0], nativeDiscoverySurface: t.nativeDiscoverySurface, genericFallback: t.genericFallback, manualActivationRequired: t.manualActivationRequired, effectiveContextScope: t.effectiveContext?.scope, loadedBy: t.loadEvidence?.loadedBy, dedupeRisk: t.loadEvidence?.dedupeRisk, duplicateRisk: t.duplicateLoadEvidence?.duplicateRisk, selectedLoad: t.duplicateLoadEvidence?.selectedLoad?.path })))"
 ```
 Expected result:
@@ -108,7 +108,9 @@ Expected result:
 - Bob writes `.bob/rules/pluribus.md` and reports `nativeDiscoverySurface: ".bob/rules/*.md"`, `genericFallback: false`, `manualActivationRequired: false`.
 - OpenClaw writes `AGENTS.md` and reports `nativeDiscoverySurface: "AGENTS.md"`, `genericFallback: true`, `manualActivationRequired: false`.
 - Both targets report `effectiveContext.scope: "repo-root"` and `pathScoped: false`; for monorepos this is a warning that subdirectory inheritance/isolation still needs a separate smoke.
-- This is the core Pluribus distinction for reviewers: generated file exists is not enough; the report should show whether the target uses native discovery or a generic fallback, and what effective context scope has actually been proven.
+- Both targets include `loadEvidence`: Bob is `loadedBy: "native-file-discovery"`; OpenClaw is `loadedBy: "generic-agent-file"`; both currently report `dedupeRisk: "unknown"` because Pluribus does not prove runtime deduplication across native files, hooks, generated imports, or manual injection.
+- Both targets include `duplicateLoadEvidence`: the Pluribus generated file is the only candidate Pluribus can name, with a `contentIdentity` hash and `selectedLoad`, but `duplicateRisk: "unknown"` because runtime scanner roots, caches, plugin directories, and sibling tool skill/rule folders are not inspected or suppressed by this smoke.
+- This is the core Pluribus distinction for reviewers: generated file exists is not enough; the report should show whether the target uses native discovery or a generic fallback, how the context is expected to be loaded, and what effective context scope has actually been proven.
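Reviewer note (not part of the package diff): the expected results listed above could be checked mechanically rather than by eye. The following is a minimal sketch, assuming the report has the shape the `node -e` one-liner reads (`fidelityReport.targets[]` with the fields named in the bullets). The `sampleReport` object, the `checkFidelityReport` helper, the `toolId` values, and the `selectedLoad.path` value are hypothetical illustrations, not captured CLI output.

```javascript
// Hypothetical sample shaped like the fields the one-liner reads from fidelity.json.
// Values mirror the "Expected result" bullets; this is NOT real CLI output.
const sampleReport = {
  fidelityReport: {
    targets: [
      {
        toolId: 'bob', // assumption: toolId matches the id passed to --tools
        files: ['.bob/rules/pluribus.md'],
        nativeDiscoverySurface: '.bob/rules/*.md',
        genericFallback: false,
        manualActivationRequired: false,
        effectiveContext: { scope: 'repo-root' },
        loadEvidence: { loadedBy: 'native-file-discovery', dedupeRisk: 'unknown' },
        duplicateLoadEvidence: {
          duplicateRisk: 'unknown',
          selectedLoad: { path: '.bob/rules/pluribus.md' }, // assumed value
        },
      },
      {
        toolId: 'openclaw', // assumption, as above
        files: ['AGENTS.md'],
        nativeDiscoverySurface: 'AGENTS.md',
        genericFallback: true,
        manualActivationRequired: false,
        effectiveContext: { scope: 'repo-root' },
        loadEvidence: { loadedBy: 'generic-agent-file', dedupeRisk: 'unknown' },
        duplicateLoadEvidence: {
          duplicateRisk: 'unknown',
          selectedLoad: { path: 'AGENTS.md' }, // assumed value
        },
      },
    ],
  },
};

// Per-tool expectations taken from the bullets above.
const expectedByTool = {
  bob: { nativeDiscoverySurface: '.bob/rules/*.md', genericFallback: false, loadedBy: 'native-file-discovery' },
  openclaw: { nativeDiscoverySurface: 'AGENTS.md', genericFallback: true, loadedBy: 'generic-agent-file' },
};

// Returns a list of human-readable mismatches; an empty list means the smoke passed.
function checkFidelityReport(report) {
  const problems = [];
  const targets = report?.fidelityReport?.targets ?? [];
  for (const [toolId, want] of Object.entries(expectedByTool)) {
    const target = targets.find((t) => t.toolId === toolId);
    if (!target) {
      problems.push(`${toolId}: target missing`);
      continue;
    }
    // Flatten the fields that differ per tool plus the ones shared by both targets.
    const got = {
      nativeDiscoverySurface: target.nativeDiscoverySurface,
      genericFallback: target.genericFallback,
      loadedBy: target.loadEvidence?.loadedBy,
      manualActivationRequired: target.manualActivationRequired,
      scope: target.effectiveContext?.scope,
      dedupeRisk: target.loadEvidence?.dedupeRisk,
      duplicateRisk: target.duplicateLoadEvidence?.duplicateRisk,
    };
    const wantAll = {
      ...want,
      manualActivationRequired: false,
      scope: 'repo-root',
      dedupeRisk: 'unknown',
      duplicateRisk: 'unknown',
    };
    for (const key of Object.keys(wantAll)) {
      if (got[key] !== wantAll[key]) {
        problems.push(`${toolId}: ${key} is ${JSON.stringify(got[key])}, expected ${JSON.stringify(wantAll[key])}`);
      }
    }
  }
  return problems;
}

console.log(checkFidelityReport(sampleReport));
```

To try this against a real run, one would replace `sampleReport` with `require('./fidelity.json')`. Whether the real report matches these field locations exactly is an assumption worth verifying first.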
 ## Useful links

package/docs/context-budget-receipts.md ADDED

@@ -0,0 +1,150 @@
+# Context-budget receipts
+Privacy-safe receipts for answering a narrow operational question:
+> What ate the agent's context before or after the task?
+This is different from generic token accounting. A context-budget receipt should prove which context surfaces were available, which ones crossed the boundary, which ones stayed deferred or suppressed, and how much budget remained — without exporting raw prompts, tool schemas, tool outputs, memory bodies, file paths, ticket text, secrets, or customer data.
+## When to use this receipt
+Use a context-budget receipt when a coding agent looks lazy, fails with `prompt is too long`, or returns a tiny summary after a subagent/tool-heavy step and you need to distinguish:
+- the user prompt was too large;
+- MCP/tool schemas were eagerly materialized;
+- a skill or rule listing consumed startup budget;
+- memory/search results hydrated too much context;
+- a manager subagent isolated heavy tools correctly;
+- a child subagent pasted raw tool output back into the parent; or
+- a CLI/MCP gateway used progressive disclosure correctly.
+## Minimum contract
+A useful receipt starts small:
+```json
+{
+  "event.name": "context.budget.evaluated",
+  "component": "subagent_boot | mcp_gateway | cli | mcp_manager | delegation",
+  "candidate_count": 566,
+  "loaded_count": 2,
+  "suppressed_count": 564,
+  "delivered_hash_count": 2,
+  "startup_token_bucket": "100k-200k",
+  "remaining_token_bucket": "0-10k",
+  "privacy.raw_prompt_included": false,
+  "privacy.raw_schema_included": false,
+  "privacy.raw_tool_output_included": false,
+  "audit_gap": "proves context boundary, not semantic quality"
+}
+```
+Keep exact counts when they are not sensitive. Bucket token counts and sizes when exact values could reveal private workload shape.
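Reviewer note (not part of the package diff): a consumer of such receipts might want a small structural check before trusting one. The sketch below assumes only the field names shown in the contract above. The `checkReceipt` helper is hypothetical, and the rule that loaded plus suppressed entries cannot exceed the candidates is an assumption inferred from the example numbers (2 + 564 = 566), not something the document states.

```javascript
// Component values listed as alternatives in the contract above.
const COMPONENTS = ['subagent_boot', 'mcp_gateway', 'cli', 'mcp_manager', 'delegation'];

// Privacy flags named in the contract; each must be exactly false.
const PRIVACY_FLAGS = [
  'privacy.raw_prompt_included',
  'privacy.raw_schema_included',
  'privacy.raw_tool_output_included',
];

const COUNT_FIELDS = ['candidate_count', 'loaded_count', 'suppressed_count', 'delivered_hash_count'];

// Returns a list of problems; an empty list means the receipt is structurally plausible.
function checkReceipt(receipt) {
  const problems = [];
  if (receipt['event.name'] !== 'context.budget.evaluated') {
    problems.push('event.name must be "context.budget.evaluated"');
  }
  if (!COMPONENTS.includes(receipt.component)) {
    problems.push('component must be exactly one of the listed values');
  }
  for (const key of COUNT_FIELDS) {
    if (!Number.isInteger(receipt[key]) || receipt[key] < 0) {
      problems.push(`${key} must be a non-negative integer`);
    }
  }
  // Assumption (not stated in the source): loaded + suppressed never exceeds candidates.
  if (receipt.loaded_count + receipt.suppressed_count > receipt.candidate_count) {
    problems.push('loaded_count + suppressed_count exceeds candidate_count');
  }
  for (const flag of PRIVACY_FLAGS) {
    if (receipt[flag] !== false) problems.push(`${flag} must be false`);
  }
  if (typeof receipt.audit_gap !== 'string' || receipt.audit_gap.trim() === '') {
    problems.push('audit_gap must be a non-empty string');
  }
  return problems;
}

// The contract example with one concrete component chosen.
const receipt = {
  'event.name': 'context.budget.evaluated',
  component: 'subagent_boot',
  candidate_count: 566,
  loaded_count: 2,
  suppressed_count: 564,
  delivered_hash_count: 2,
  startup_token_bucket: '100k-200k',
  remaining_token_bucket: '0-10k',
  'privacy.raw_prompt_included': false,
  'privacy.raw_schema_included': false,
  'privacy.raw_tool_output_included': false,
  audit_gap: 'proves context boundary, not semantic quality',
};

console.log(checkReceipt(receipt));
```

The check is deliberately structural. In line with the `audit_gap` field, it says nothing about whether the loaded context was the right context.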
+## Subagent boot budget
+Subagents can fail before task #1 if they inherit every MCP schema, skill listing, rule, or memory index from the parent. The receipt should separate:
+- `available` — what could have been loaded;
+- `loaded` — what actually entered the subagent prompt/context;
+- `suppressed` or `deferred` — what stayed out;
+- `remaining` — coarse budget after bootstrap; and
+- `failure_or_headroom` — whether the subagent had room for tool results.
+Runnable fixture:
+```bash
+node examples/context-input-evidence/convert-subagent-context-budget-log.mjs
+```
+Public trace:
+- `examples/context-input-evidence/subagent-context-budget-otel-trace.json`
+## Delegation boundary
+A subagent can save parent context at boot and still lose the benefit if raw child output is pasted back into the parent. The receipt should prove:
+- delegation happened;
+- child output size stayed in the child/subagent store;
+- parent received a bounded summary, not raw output;
+- raw child output was not copied into the receipt; and
+- the audit gap remains explicit: the receipt proves the boundary, not summary correctness.
+Runnable fixture:
+```bash
+node examples/context-input-evidence/convert-subagent-delegation-log.mjs
+```
+Public trace:
+- `examples/context-input-evidence/subagent-delegation-otel-trace.json`
+## MCP manager isolation
+When a manager subagent owns hundreds of MCP tools, the parent should see a small request/summary surface, not the whole tool catalog. The receipt should prove:
+- parent full schemas were not loaded;
+- the manager booted with the tool catalog;
+- one or a small set of tools was selected;
+- unselected schemas stayed suppressed; and
+- only a bounded parent summary returned.
+Runnable fixture:
+```bash
+node examples/context-input-evidence/convert-claudekit-mcp-manager-log.mjs
+```
+Public trace:
+- `examples/context-input-evidence/claudekit-mcp-manager-otel-trace.json`
+## Progressive disclosure: MCP gateway or CLI
+If a gateway or CLI avoids context bloat by showing an index/prompt first and expanding one schema/help page later, the receipt should prove the disclosure boundary:
+- small agent prompt or meta-tool/index loaded at startup;
+- full schemas/specs were not loaded at startup;
+- one command/schema was hydrated on demand;
+- raw args/results stayed out of the receipt; and
+- selected/suppressed counts are visible enough for debugging.
+Runnable fixtures:
+```bash
+node examples/context-input-evidence/convert-agentgateway-progressive-disclosure-log.mjs
+node examples/context-input-evidence/convert-cli-progressive-disclosure-log.mjs
+node examples/context-input-evidence/convert-mcp-tool-search-log.mjs
+```
+Public traces:
+- `examples/context-input-evidence/agentgateway-progressive-disclosure-otel-trace.json`
+- `examples/context-input-evidence/cli-progressive-disclosure-otel-trace.json`
+- `examples/context-input-evidence/mcp-tool-search-otel-trace.json`
+## Privacy defaults
+For shareable receipts:
+- hash or HMAC stable identifiers; prefer HMAC for predictable IDs, paths, user IDs, and audit IDs;
+- fail closed or omit identifier hashes when the HMAC key is missing;
+- bucket large sizes and token counts;
+- never export raw schemas, raw memory, raw prompt, raw tool output, paths, tickets, emails, secrets, or customer-specific strings;
+- include explicit `raw_*_included=false` flags; and
+- include `audit_gap` so readers do not confuse boundary evidence with semantic correctness.
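Reviewer note (not part of the package diff): two of the defaults above, bucketing and fail-closed HMAC identifiers, can be sketched with Node's built-in `crypto` module. This is an illustration under stated assumptions. The bucket edges are invented to be consistent with the labels shown earlier (`0-10k`, `100k-200k`), and the helper names, the `RECEIPT_HMAC_KEY` environment variable, and the `session_id_hmac` field are hypothetical, not Pluribus APIs.

```javascript
const { createHmac } = require('node:crypto');

// Assumed bucket edges; the document shows labels only, not the boundaries.
const BUCKET_EDGES = [0, 10000, 50000, 100000, 200000];

// Map an exact token count to a coarse label so workload shape is not revealed.
function tokenBucket(tokens) {
  if (!Number.isInteger(tokens) || tokens < 0) {
    throw new RangeError('tokens must be a non-negative integer');
  }
  const label = (n) => (n === 0 ? '0' : `${n / 1000}k`);
  for (let i = 1; i < BUCKET_EDGES.length; i++) {
    if (tokens < BUCKET_EDGES[i]) {
      return `${label(BUCKET_EDGES[i - 1])}-${label(BUCKET_EDGES[i])}`;
    }
  }
  return `${label(BUCKET_EDGES[BUCKET_EDGES.length - 1])}+`;
}

// HMAC a predictable identifier. Fail closed: with no key, return undefined
// so the caller omits the field instead of emitting a guessable plain hash.
function hmacId(value, key) {
  if (!key) return undefined;
  return createHmac('sha256', key).update(value).digest('hex');
}

// Build the privacy-relevant part of a receipt from raw measurements.
function privacyFields({ startupTokens, remainingTokens, sessionId, hmacKey }) {
  const fields = {
    startup_token_bucket: tokenBucket(startupTokens),
    remaining_token_bucket: tokenBucket(remainingTokens),
    'privacy.raw_prompt_included': false,
    'privacy.raw_schema_included': false,
    'privacy.raw_tool_output_included': false,
  };
  const id = hmacId(sessionId, hmacKey);
  if (id !== undefined) fields.session_id_hmac = id; // hypothetical field name
  return fields;
}

console.log(
  privacyFields({
    startupTokens: 150000,
    remainingTokens: 4200,
    sessionId: 'session-123', // made-up identifier
    hmacKey: process.env.RECEIPT_HMAC_KEY, // hypothetical variable; field is omitted if unset
  })
);
```

A keyed HMAC is used rather than a bare hash because predictable inputs such as paths or user IDs can be recovered from an unkeyed hash by enumeration, which is the concern the bullet on predictable IDs raises.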
+## What to ask in a bug report
+Instead of “why is my subagent bad?”, ask for a receipt or debug JSON that can answer:
+1. How many tools/skills/rules/memory entries were available?
+2. How many were loaded into the parent?
+3. How many were loaded into the subagent?
+4. How many were suppressed/deferred?
+5. What token bucket remained before the first tool call?
+6. Did raw child output return to the parent, or only a bounded summary?
+That is the narrow wedge for Pluribus: context-budget evidence across agent boundaries, not another memory store or tool router.