npm - pluribus-context - Versions diffs - 0.3.22 → 0.3.27 - Mend

pluribus-context 0.3.22 → 0.3.27


Files changed (95)

package/docs/orchestration-search-receipts.md ADDED

@@ -0,0 +1,102 @@
+# Orchestration-layer search receipts
+Search tools can report strong local savings while the whole agent task still gets more expensive: extra turns, repeated queries, insufficient snippets, or fallback full-file reads can erase the gain.
+The receipt should live at the **orchestration / harness layer**, not inside a retrieval tool. The harness can see the task boundary, tool calls, follow-up reads, and whether the agent looped. A retrieval tool should not need to inspect the user's full task transcript or other tool traffic just to prove its value.
+## Audience
+Use this when evaluating code-search, MCP retrieval, RAG-over-notes, or agent memory tools inside Claude Code, Cursor, Codex, Copilot, Windsurf, OpenClaw, Continue, or custom harnesses.
+## What to prove
+A useful search receipt should answer:
+- which task or session this evidence belongs to;
+- which search/retrieval calls happened;
+- how much context was returned to the agent;
+- which files or notes were pointed to;
+- whether the agent needed follow-up full reads;
+- whether insufficient snippets caused repeated queries or extra loops;
+- what data was intentionally omitted for privacy.
+## Minimal JSON shape
+```json
+{
+  "schema": "pluribus.orchestrationSearchReceipt.v0",
+  "taskId": "optional-local-task-id",
+  "createdAt": "2026-05-20T15:00:00Z",
+  "orchestrator": {
+    "name": "local-harness",
+    "version": "0.1.0"
+  },
+  "privacy": {
+    "storesPrompts": false,
+    "storesResultText": false,
+    "redaction": "paths-and-counts-only by default"
+  },
+  "retrievalTool": {
+    "name": "example-search-tool",
+    "mode": "mcp-or-cli"
+  },
+  "queries": [
+    {
+      "queryId": "q1",
+      "queryHash": "sha256:...",
+      "mode": "hybrid",
+      "topK": 5,
+      "returnedChunks": 5,
+      "returnedChars": 4479,
+      "estimatedTokensReturned": 1120,
+      "resultFiles": ["src/auth.ts", "src/session.ts"],
+      "candidateFileChars": 85000,
+      "followUpFullReads": ["src/auth.ts"],
+      "fallbackRecommended": false,
+      "resultSufficiency": "partial"
+    }
+  ],
+  "totals": {
+    "searchCalls": 3,
+    "returnedChars": 12000,
+    "estimatedTokensReturned": 3000,
+    "estimatedTokensAvoidedVsFullFiles": 42000,
+    "followUpFullReadCount": 1,
+    "repeatedQueryCount": 1
+  },
+  "limits": [
+    "Does not prove end-to-end task success by itself.",
+    "Pair with a transcript or eval harness when comparing agent-level quality/cost."
+  ]
+}
+```
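A minimal sketch of how a harness might derive the `totals` object from the per-query entries above. The function names, the ratio of 4 characters per token (chosen only because it is consistent with the example numbers), and the rule that a reappearing `queryHash` counts as a repeated query are illustrative assumptions, not part of Pluribus.

```python
from collections import Counter

# Assumption: rough chars-per-token ratio consistent with the example values.
CHARS_PER_TOKEN = 4


def estimate_tokens(chars: int) -> int:
    """Rough token estimate from a character count."""
    return round(chars / CHARS_PER_TOKEN)


def summarize_queries(queries: list[dict]) -> dict:
    """Build a `totals` object from per-query receipt entries."""
    hash_counts = Counter(q["queryHash"] for q in queries)
    returned_chars = sum(q["returnedChars"] for q in queries)
    candidate_chars = sum(q.get("candidateFileChars", 0) for q in queries)
    follow_ups = sum(len(q.get("followUpFullReads", [])) for q in queries)
    avoided = estimate_tokens(candidate_chars) - estimate_tokens(returned_chars)
    return {
        "searchCalls": len(queries),
        "returnedChars": returned_chars,
        "estimatedTokensReturned": estimate_tokens(returned_chars),
        # Follow-up full reads are NOT subtracted here (file sizes unknown).
        "estimatedTokensAvoidedVsFullFiles": max(0, avoided),
        "followUpFullReadCount": follow_ups,
        # Each reappearance of the same hash counts as one repeated query.
        "repeatedQueryCount": sum(n - 1 for n in hash_counts.values()),
    }
```

The avoided-token figure in this sketch is a gross number. A harness that wants a net figure would also need the size of every file the agent re-read in full.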
+## Privacy defaults
+Default to counts, hashes, paths, and tool-call metadata. Do **not** store prompts, raw result text, repository secrets, memory contents, or full transcripts unless the user explicitly opts in.
+Good defaults:
+- hash query text instead of storing it when the query may contain private details;
+- keep paths relative and redact ignored/private directories;
+- store byte/token counts instead of snippet bodies;
+- store fallback/full-read counts, not file contents;
+- make receipt output a local artifact unless the user chooses to publish it.
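The first two defaults above could be sketched as follows. The helper names, the placeholder strings, and the list of private directory names are hypothetical; only the `sha256:` prefix mirrors the `queryHash` value in the JSON shape.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical examples of directories a project may treat as private.
DEFAULT_PRIVATE_DIRS = ("secrets", "node_modules", ".git")


def hash_query(text: str) -> str:
    """Return a `sha256:<hex>` digest so the raw query stays out of the receipt."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact_paths(paths, private_dirs=DEFAULT_PRIVATE_DIRS):
    """Keep repo-relative paths; replace private or outside-repo paths."""
    redacted = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.is_absolute() or ".." in path.parts:
            redacted.append("[redacted:outside-repo]")
        elif any(part in private_dirs for part in path.parts):
            redacted.append("[redacted:private-dir]")
        else:
            redacted.append(str(path))
    return redacted
```

An unsalted hash of a short or guessable query can still be matched by someone who tries candidate strings, so a harness handling sensitive queries may prefer a keyed hash.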
+## Why this belongs above the search tool
+A search tool can usually know `returnedChars`, `topK`, and result paths. It usually cannot know whether the agent then:
+- asked the same question again;
+- read entire files anyway;
+- solved the task;
+- paid more in turns than it saved in retrieval tokens;
+- mixed results with other memory/context sources.
+That evidence is visible to the orchestrator, agent harness, CLI wrapper, CI eval, or transcript analyzer. Keeping the receipt there avoids asking retrieval tools to become invasive observers of the whole workflow.
+## How this relates to Pluribus
+Pluribus already emits context-file fidelity/load evidence for upfront context files. This document extends the same principle to **retrieved context**: the useful claim is not "search returned fewer tokens" in isolation, but "this task received enough context with fewer full reads / fewer loops / acceptable privacy boundaries."
+This is a sketch, not a stable schema. If an agent harness or search tool has a better receipt shape, open an issue or discussion with a concrete example.

package/docs/portability-fidelity-report.md CHANGED

@@ -77,7 +77,7 @@ Pluribus is intentionally narrower than a skill registry or memory layer:
 - `pluribus.md` keeps the claim in one reviewed source of truth.
 - `sync --dry-run` previews target-specific outputs before writing files.
 - generated files carry a warning header so manual edits are visible.
-- `audit --json --fidelity-report` gives CI/reviewers a machine-readable check for missing/drifted outputs plus target-by-target section loss, activation shape, native discovery surface, resolution anchor, generic fallback status, load evidence, effective context scope, and portability warnings.
+- `audit --json --fidelity-report` gives CI/reviewers a machine-readable check for missing/drifted outputs plus target-by-target section loss, activation shape, native discovery surface, resolution anchor, generic fallback status, load evidence, duplicate-load selection evidence, effective context scope, and portability warnings.
 - remote imports are opt-in, locked, cached, and digest-checked before becoming shared context.
 That does **not** prove runtime behavior. You still need tool-specific smoke tests for load order, path/glob activation, available tools, MCP servers, and permission semantics.
@@ -91,6 +91,7 @@ For each selected target, the JSON report includes:
 - `genericFallback` — whether the output is a broad agent fallback surface rather than a target-specific native surface.
 - `manualActivationRequired` — whether Pluribus knows the output requires manual activation after generation. Built-in project-wide targets are currently `false`; future scoped/skill targets may differ.
 - `loadEvidence` — how the generated context is expected to enter the agent session. Built-in targets currently report `loadedBy` (`native-file-discovery` or `generic-agent-file`), `effectiveSource`, `deliveryMechanism`, `hookInstalled: false`, `injectedOnSessionStart: false`, `resumeBehavior: not-proven`, and `dedupeRisk: unknown`; this makes native-vs-hook-vs-manual duplication an explicit evidence question instead of an assumption.
+- `duplicateLoadEvidence` — which generated candidate Pluribus can identify for duplicate-load review. Built-in targets currently report a `contentIdentity` hash, one `candidateLoads[]` item, a matching `selectedLoad`, empty `suppressedLoads`, `crossRootScanMode: not-inspected`, and `duplicateRisk: unknown`; this is a receipt shape for Cursor/Claude-style duplicate skill or duplicate `CLAUDE.md` loads, not proof of runtime suppression.
 - `effectiveContext` — what Pluribus can prove about the context a target receives. Built-in targets currently report `scope: repo-root`, `pathScoped: false`, `inheritance: none-modeled`, `overrideBehavior: none-modeled`, plus the inferred `loadedBy` and `effectiveSource`; this is explicit evidence that monorepo path inheritance/isolation still needs a separate smoke.
 - `semanticDifference` — a compact list such as `section-loss`, `project-wide-only`, `no-path-scope-evidence`, `generic-agent-file`, or `runtime-load-dedupe-not-proven` so reviewers can distinguish “file exists” from “same behavior is preserved.”
@@ -103,7 +104,8 @@ These fields are intentionally boring. They help reviewers catch cases like “i
 3. Keep target-native instructions when a semantic cannot be represented everywhere.
 4. Commit a small audit artifact (`pluribus audit --json --fidelity-report --output reports/pluribus-audit.json`) when you want CI/review evidence.
 5. For hook/native/manual mixes, treat `loadEvidence.dedupeRisk: unknown` as a warning, not proof. Add a target-specific receipt for `loadedBy`, `hookInstalled`, `injectedOnSessionStart`, and resume behavior before claiming deduplication.
-6. For monorepos, treat `effectiveContext.scope: repo-root` as a warning, not proof. Add a target-specific smoke for the path you care about, for example `apps/client/` should load root + client context but not `apps/auth/` rules.
+6. For multi-root skill/rule scans, treat `duplicateLoadEvidence.duplicateRisk: unknown` as a warning, not proof. Add a target-specific receipt for `candidateLoads`, `selectedLoad`, `suppressedLoads`, discovery roots, and content hashes before claiming duplicate suppression.
+7. For monorepos, treat `effectiveContext.scope: repo-root` as a warning, not proof. Add a target-specific smoke for the path you care about, for example `apps/client/` should load root + client context but not `apps/auth/` rules.
 8. Update the claim whenever a new target is added, a tool changes capability names, a subdirectory context is introduced, a hook/manual injection path changes, or a permission/security default changes.
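The "warning, not proof" items in the list above could be turned into a small CI check along these lines. This is a sketch under a loud assumption: the excerpt names the per-target fields but does not show how the audit JSON nests them, so the function takes a plain mapping of target name to per-target fields and leaves extraction from the real report to the caller. The function name and message wording are hypothetical.

```python
def fidelity_warnings(targets: dict[str, dict]) -> list[str]:
    """Flag per-target evidence that should be read as a warning, not proof.

    `targets` maps a target name to its per-target report fields. How the
    real audit JSON keys its targets is an assumption left to the caller.
    """
    warnings = []
    for name, report in targets.items():
        load = report.get("loadEvidence", {})
        duplicate = report.get("duplicateLoadEvidence", {})
        context = report.get("effectiveContext", {})
        if load.get("dedupeRisk") == "unknown":
            warnings.append(f"{name}: loadEvidence.dedupeRisk is unknown")
        if duplicate.get("duplicateRisk") == "unknown":
            warnings.append(f"{name}: duplicateLoadEvidence.duplicateRisk is unknown")
        if context.get("scope") == "repo-root":
            warnings.append(f"{name}: effectiveContext.scope is repo-root only")
    return warnings
```

Printing these messages in CI without failing the build keeps them in the spirit of the list: a prompt to add a target-specific receipt or smoke test, not a verdict.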
 ## Feedback wanted

package/examples/agent-skills/context-receipts/README.md ADDED

@@ -0,0 +1,22 @@
+# Context receipts Agent Skill recipe
+This is a small, copyable Agent Skill recipe for context-engineering users who are adopting Tool Search, lazy MCP loading, skills, memory, compaction, or subagents and need to verify what actually crossed the context boundary.
+It is intentionally markdown-only so it can be copied into a local skills directory such as:
+- `.claude/skills/context-receipts/SKILL.md`
+- `.opencode/skills/context-receipts/SKILL.md`
+- `.agents/skills/context-receipts/SKILL.md`
+## Quick smoke
+Ask an agent or harness using the skill to emit a receipt for one workflow and verify these constraints:
+```bash
+grep -E 'mcp\.tool_index\.loaded|context\.skill\.registry\.index\.loaded|subagent\.delegation\.requested' receipt.jsonl
+grep -E 'raw_(schema|query|args|result|output)_copied":false|raw.*CopiedToReceipt":false' receipt.jsonl
+```
+Then manually check that the receipt contains counts, hashes, ids, buckets, and `audit_gap`, but does **not** contain private prompts, raw schemas, tool args/results, skill bodies, memory bodies, customer names, secrets, or transcript text.
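Part of that manual check can be automated. The sketch below flags any "copied" flag that is not `false` and any key from a deny-list. The function name and the deny-list are hypothetical, only top-level keys are inspected, and a clean result does not show that values are free of private text.

```python
import json
import re

# Matches the flag names targeted by the grep patterns above.
COPIED_FLAG = re.compile(r"^raw_.*_copied$|^raw.*CopiedToReceipt$")
# Hypothetical deny-list; adjust to the fields your harness could leak.
FORBIDDEN_KEYS = {"prompt", "raw_schema", "raw_args", "raw_result",
                  "skill_body", "memory_body", "transcript"}


def check_receipt_lines(lines):
    """Return a list of problems found in JSONL receipt lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        for key, value in event.items():
            if COPIED_FLAG.search(key) and value is not False:
                problems.append(f"line {number}: {key} is {value!r}, expected false")
            if key in FORBIDDEN_KEYS:
                problems.append(f"line {number}: forbidden key {key!r} present")
    return problems
```

To run it against a file, pass an open file handle, for example `check_receipt_lines(open("receipt.jsonl"))`, and treat a non-empty result as a failed smoke.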
+For executable fixture examples, see [`../../context-input-evidence/`](../../context-input-evidence/).

package/examples/agent-skills/context-receipts/SKILL.md ADDED

@@ -0,0 +1,89 @@
+# Context Receipts
+Use this skill when an agent workflow claims to save context by selecting, deferring, hydrating, summarizing, compacting, delegating, or isolating context.
+The job is not to log the private content. The job is to emit a small receipt that lets a reviewer answer:
+> what crossed the context boundary, what stayed out, and what audit gap remains?
+## Privacy defaults
+Never include raw prompts, raw tool schemas, raw tool arguments, raw tool results, raw skill bodies, memory bodies, secrets, customer names, or full transcripts in the receipt.
+Prefer:
+- stable ids or hashed ids;
+- counts and token/line buckets;
+- categorical reasons;
+- explicit booleans for raw content copied/not copied;
+- before/after context budget buckets;
+- an `audit_gap` field when the receipt proves routing but not semantic correctness.
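One way to produce token buckets instead of exact counts is sketched here. The labels `lt_1k`, `1k_2k`, and `2k_4k` appear in this skill's event examples; the doubling rule for larger buckets, the reading of `k` as 1000, and the function name are assumptions.

```python
def token_bucket(tokens: int) -> str:
    """Map an exact token count to a coarse bucket label.

    Buckets double in width: lt_1k, 1k_2k, 2k_4k, 4k_8k, and so on.
    """
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    if tokens < 1000:
        return "lt_1k"
    lower = 1000
    # Grow the lower bound until the count fits in [lower, 2 * lower).
    while tokens >= lower * 2:
        lower *= 2
    return f"{lower // 1000}k_{lower * 2 // 1000}k"
```

Coarse buckets keep a receipt useful for budget debugging while revealing less about the exact content size than a precise count would.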
+## 60-second Tool Search smoke
+For MCP Tool Search, lazy tool loading, or progressive disclosure, emit enough evidence to answer these seven checks:
+1. **Index-only startup:** did the session load a compact tool/server index instead of all full schemas?
+2. **Search/routing:** what hashed query/category or routing reason selected candidate tools?
+3. **Hydration:** which full tool definition was loaded, why, and how many definitions stayed suppressed?
+4. **Call:** which server/tool id was invoked, with argument/result redaction status and success/error status?
+5. **Boundary:** if a manager subagent or child agent was used, did raw child output return to the parent?
+6. **Budget:** what were the startup and post-hydration context-token buckets?
+7. **Audit gap:** what is not proven, such as whether the selected tool was semantically optimal?
+Minimal JSONL event names:
+```jsonl
+{"event":"mcp.tool_index.loaded","loaded_server_count":12,"loaded_tool_index_count":84,"full_schema_count":0,"suppressed_tool_count":84,"raw_schema_copied":false,"startup_token_bucket":"lt_1k"}
+{"event":"mcp.tool_search.performed","query_hash":"sha256:...","query_category":"repo_search","candidate_tool_count":5,"selected_tool_id":"github.search_code","raw_query_copied":false}
+{"event":"mcp.tool_definition.loaded","tool_id":"github.search_code","hydrate_reason":"selected_after_tool_search","suppressed_tool_count":83,"definition_token_bucket":"1k_2k","raw_schema_copied":false}
+{"event":"mcp.tool_call.completed","tool_id":"github.search_code","args_hash":"sha256:...","result_token_bucket":"2k_4k","raw_args_copied":false,"raw_result_copied":false,"status":"ok"}
+```
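A reviewer could sanity-check the ordering of these events with a short script. The event and field names come from the example above; the ordering rules (index first with zero full schemas, hydration only after a search selected the tool, a call only after hydration) are one reading of the seven checks, and the function name is hypothetical.

```python
import json


def check_tool_search_flow(lines):
    """Return ordering problems found in a Tool Search JSONL receipt."""
    problems = []
    index_seen = False
    searched, hydrated = set(), set()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("event")
        if kind == "mcp.tool_index.loaded":
            index_seen = True
            if event.get("full_schema_count", 0) != 0:
                problems.append("startup loaded full schemas, not index-only")
        elif kind == "mcp.tool_search.performed":
            searched.add(event.get("selected_tool_id"))
        elif kind == "mcp.tool_definition.loaded":
            tool = event.get("tool_id")
            if tool not in searched:
                problems.append(f"{tool}: hydrated without a prior search")
            hydrated.add(tool)
        elif kind == "mcp.tool_call.completed":
            tool = event.get("tool_id")
            if tool not in hydrated:
                problems.append(f"{tool}: called before its definition loaded")
    if not index_seen:
        problems.append("no mcp.tool_index.loaded event")
    return problems
```

An empty result only shows that the routing sequence is plausible. Whether the selected tool was the right one remains an audit gap.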
+## Skill / prompt context smoke
+For skills, rules, AGENTS.md overlays, or instruction files, answer:
+- which index/listing entered the session;
+- which full skill/rule/instruction body was selected;
+- which candidates were suppressed and why;
+- whether the body was loaded at session start, after a search, or after an explicit command;
+- source hash, delivered hash, and canonical form when available;
+- whether the skill/instruction text was copied into the receipt.
+Minimal event names:
+- `context.skill.registry.index.loaded`
+- `context.skill.registry.skill.read`
+- `context.skill.registry.skill.injected`
+- `context.input.loaded`
+- `context.input.candidate_suppressed`
+## Subagent / manager boundary smoke
+For subagents, manager agents, or child workers, answer:
+- what task was delegated, by category and hashed objective;
+- what large output was captured by the child, as line/token buckets;
+- what bounded summary returned to the parent;
+- whether raw child output, tool results, or MCP schemas entered the parent context;
+- the remaining audit gap.
+Minimal event names:
+- `subagent.delegation.requested`
+- `subagent.tool_output.captured`
+- `subagent.summary.returned`
+- `parent.context_budget.evaluated`
+## Good receipt test
+A receipt is useful if a maintainer can debug one of these failures without seeing private content:
+- the agent never found the right tool/skill;
+- the full definition loaded too early;
+- too many definitions stayed in context;
+- a child/subagent saved no budget because raw output returned to the parent;
+- compaction happened but no one can prove what was preserved, summarized, or dropped.
+A receipt is not enough if it only says “Tool Search enabled” or “used subagent”. It must prove the boundary behavior.

package/examples/context-input-evidence/AGENTS.md ADDED

@@ -0,0 +1,12 @@
+---
+scope: repo
+applies_to: [claude-code, codex, cursor]
+why_loaded: shared invariant guidance
+expected_benefit: align agent behavior with repository conventions
+---
+# Agent instructions
+Work from {{repo_root}}.
+Prefer small patches, run the narrowest meaningful check, and explain any context file changes in review.

package/examples/context-input-evidence/agent-overlay-log.jsonl ADDED

@@ -0,0 +1,4 @@
+{"type":"session.start","time":"2026-05-21T19:00:00.000Z","session_id":"demo-session-agent-overlays","conversation_id":"demo-conversation-agent-overlays","agent":"cursor","workspace":"example/portable-agent-context"}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:01.000Z","source_path":"AGENTS.md","source_role":"base","target_agent":"all","load_order":1,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"native-file-discovery","why_loaded":"shared repository invariants","expected_benefit":"keep cross-agent behavior aligned while avoiding duplicated per-agent rules","source_text":"# Project rules\n- Prefer small reviewable changes.\n- Run tests before claiming success.\n- Never commit secrets.","delivered_text":"# Project rules\n- Prefer small reviewable changes.\n- Run tests before claiming success.\n- Never commit secrets."}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:02.000Z","source_path":"AGENTS.cursor.md","source_role":"agent_overlay","target_agent":"cursor","load_order":2,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"native-file-discovery","why_loaded":"Cursor-specific rule semantics and workspace hints","expected_benefit":"preserve agent-specific activation semantics without forking the shared repository guidance","source_text":"# Cursor overlay\n- Put Cursor-specific workspace rule hints here.\n- Do not duplicate base safety rules.","delivered_text":"# Cursor overlay\n- Put Cursor-specific workspace rule hints here.\n- Do not duplicate base safety rules."}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:03.000Z","source_path":"AGENTS.codex.md","source_role":"agent_overlay_candidate","target_agent":"codex","load_order":0,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"not_loaded_wrong_agent","why_loaded":"candidate discovered but not applicable to Cursor session","expected_benefit":"avoid mixing Codex-only instructions into Cursor context","source_text":"# Codex overlay\n- Codex-specific sandbox notes live here, not in the Cursor session.","delivered_text":""}
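A fixture log like the one above could be reduced to a receipt-style summary that keeps hashes instead of text. This is a sketch: the input field names are taken from the fixture, while the function name and the output keys (`source_hash`, `delivered_hash`, `delivered`, `matches_source`) are hypothetical.

```python
import hashlib
import json


def _sha256(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def summarize_context_loads(lines):
    """Summarize agent_context.loaded events without copying their text."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != "agent_context.loaded":
            continue  # skip session.start and any other event types
        source = event.get("source_text", "")
        delivered = event.get("delivered_text", "")
        rows.append({
            "source_path": event.get("source_path"),
            "load_order": event.get("load_order"),
            "activation": event.get("activation"),
            "source_hash": _sha256(source),
            "delivered_hash": _sha256(delivered),
            "delivered": delivered != "",
            "matches_source": delivered == source,
        })
    return rows
```

For the fixture above, a summary of this kind would be expected to show the base file and the Cursor overlay delivered unchanged, and the Codex overlay discovered but not delivered.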