
pluribus-context 0.3.22 → 0.3.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (93)

package/docs/orchestration-search-receipts.md ADDED

@@ -0,0 +1,102 @@
+# Orchestration-layer search receipts
+Search tools can report strong local savings while the whole agent task still gets more expensive: extra turns, repeated queries, insufficient snippets, or fallback full-file reads can erase the gain.
+The receipt should live at the **orchestration / harness layer**, not inside a retrieval tool. The harness can see the task boundary, tool calls, follow-up reads, and whether the agent looped. A retrieval tool should not need to inspect the user's full task transcript or other tool traffic just to prove its value.
+## Audience
+Use this when evaluating code-search, MCP retrieval, RAG-over-notes, or agent memory tools inside Claude Code, Cursor, Codex, Copilot, Windsurf, OpenClaw, Continue, or custom harnesses.
+## What to prove
+A useful search receipt should answer:
+- which task or session this evidence belongs to;
+- which search/retrieval calls happened;
+- how much context was returned to the agent;
+- which files or notes were pointed to;
+- whether the agent needed follow-up full reads;
+- whether insufficient snippets caused repeated queries or extra loops;
+- what data was intentionally omitted for privacy.
+## Minimal JSON shape
+```json
+{
+  "schema": "pluribus.orchestrationSearchReceipt.v0",
+  "taskId": "optional-local-task-id",
+  "createdAt": "2026-05-20T15:00:00Z",
+  "orchestrator": {
+    "name": "local-harness",
+    "version": "0.1.0"
+  },
+  "privacy": {
+    "storesPrompts": false,
+    "storesResultText": false,
+    "redaction": "paths-and-counts-only by default"
+  },
+  "retrievalTool": {
+    "name": "example-search-tool",
+    "mode": "mcp-or-cli"
+  },
+  "queries": [
+    {
+      "queryId": "q1",
+      "queryHash": "sha256:...",
+      "mode": "hybrid",
+      "topK": 5,
+      "returnedChunks": 5,
+      "returnedChars": 4479,
+      "estimatedTokensReturned": 1120,
+      "resultFiles": ["src/auth.ts", "src/session.ts"],
+      "candidateFileChars": 85000,
+      "followUpFullReads": ["src/auth.ts"],
+      "fallbackRecommended": false,
+      "resultSufficiency": "partial"
+    }
+  ],
+  "totals": {
+    "searchCalls": 3,
+    "returnedChars": 12000,
+    "estimatedTokensReturned": 3000,
+    "estimatedTokensAvoidedVsFullFiles": 42000,
+    "followUpFullReadCount": 1,
+    "repeatedQueryCount": 1
+  },
+  "limits": [
+    "Does not prove end-to-end task success by itself.",
+    "Pair with a transcript or eval harness when comparing agent-level quality/cost."
+  ]
+}
+```
+## Privacy defaults
+Default to counts, hashes, paths, and tool-call metadata. Do **not** store prompts, raw result text, repository secrets, memory contents, or full transcripts unless the user explicitly opts in.
+Good defaults:
+- hash query text instead of storing it when the query may contain private details;
+- keep paths relative and redact ignored/private directories;
+- store byte/token counts instead of snippet bodies;
+- store fallback/full-read counts, not file contents;
+- make receipt output a local artifact unless the user chooses to publish it.
+## Why this belongs above the search tool
+A search tool can usually know `returnedChars`, `topK`, and result paths. It usually cannot know whether the agent then:
+- asked the same question again;
+- read entire files anyway;
+- solved the task;
+- paid more in turns than it saved in retrieval tokens;
+- mixed results with other memory/context sources.
+That evidence is visible to the orchestrator, agent harness, CLI wrapper, CI eval, or transcript analyzer. Keeping the receipt there avoids asking retrieval tools to become invasive observers of the whole workflow.
+## How this relates to Pluribus
+Pluribus already emits context-file fidelity/load evidence for upfront context files. This document extends the same principle to **retrieved context**: the useful claim is not "search returned fewer tokens" in isolation, but "this task received enough context with fewer full reads / fewer loops / acceptable privacy boundaries."
+This is a sketch, not a stable schema. If an agent harness or search tool has a better receipt shape, open an issue or discussion with a concrete example.
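
The `totals` block in the receipt above can be derived from the per-query entries, which keeps the two from drifting apart. The following is a minimal sketch and not part of the package: the helper names (`hash_query`, `estimate_tokens`, `build_totals`) are hypothetical, the four-characters-per-token ratio is an assumed heuristic that happens to match the example numbers (4479 chars to 1120 tokens), and summing `candidateFileChars` across queries is a naive choice that over-counts when several queries point at the same files.

```python
import hashlib

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def hash_query(text: str) -> str:
    """Return a digest so the receipt never stores raw query text."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def estimate_tokens(chars: int) -> int:
    """Convert a character count to an approximate token count."""
    return round(chars / CHARS_PER_TOKEN)


def build_totals(queries):
    """Aggregate per-query receipt entries into a `totals` object."""
    returned_chars = sum(q["returnedChars"] for q in queries)
    candidate_chars = sum(q["candidateFileChars"] for q in queries)
    # Count each fully read file once, even if several queries led to it.
    full_reads = {path for q in queries for path in q["followUpFullReads"]}
    hashes = [q["queryHash"] for q in queries]
    return {
        "searchCalls": len(queries),
        "returnedChars": returned_chars,
        "estimatedTokensReturned": estimate_tokens(returned_chars),
        "estimatedTokensAvoidedVsFullFiles": estimate_tokens(
            max(candidate_chars - returned_chars, 0)
        ),
        "followUpFullReadCount": len(full_reads),
        # A query counts as repeated when its hash was already seen.
        "repeatedQueryCount": len(hashes) - len(set(hashes)),
    }


# Hypothetical session: the agent asks the same question twice.
queries = [
    {"queryId": "q1", "queryHash": hash_query("where is the session validated"),
     "returnedChars": 4479, "candidateFileChars": 85000,
     "followUpFullReads": ["src/auth.ts"]},
    {"queryId": "q2", "queryHash": hash_query("where is the session validated"),
     "returnedChars": 3521, "candidateFileChars": 85000,
     "followUpFullReads": []},
]
totals = build_totals(queries)
print(totals)
```

Hashing the query before it enters the receipt follows the privacy defaults in the document: a repeated query can still be detected by comparing digests without the text itself being stored.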

package/docs/portability-fidelity-report.md CHANGED

@@ -77,7 +77,7 @@ Pluribus is intentionally narrower than a skill registry or memory layer:
 - `pluribus.md` keeps the claim in one reviewed source of truth.
 - `sync --dry-run` previews target-specific outputs before writing files.
 - generated files carry a warning header so manual edits are visible.
-- `audit --json --fidelity-report` gives CI/reviewers a machine-readable check for missing/drifted outputs plus target-by-target section loss, activation shape, native discovery surface, resolution anchor, generic fallback status, load evidence, effective context scope, and portability warnings.
+- `audit --json --fidelity-report` gives CI/reviewers a machine-readable check for missing/drifted outputs plus target-by-target section loss, activation shape, native discovery surface, resolution anchor, generic fallback status, load evidence, duplicate-load selection evidence, effective context scope, and portability warnings.
 - remote imports are opt-in, locked, cached, and digest-checked before becoming shared context.
 That does **not** prove runtime behavior. You still need tool-specific smoke tests for load order, path/glob activation, available tools, MCP servers, and permission semantics.
@@ -91,6 +91,7 @@ For each selected target, the JSON report includes:
 - `genericFallback` — whether the output is a broad agent fallback surface rather than a target-specific native surface.
 - `manualActivationRequired` — whether Pluribus knows the output requires manual activation after generation. Built-in project-wide targets are currently `false`; future scoped/skill targets may differ.
 - `loadEvidence` — how the generated context is expected to enter the agent session. Built-in targets currently report `loadedBy` (`native-file-discovery` or `generic-agent-file`), `effectiveSource`, `deliveryMechanism`, `hookInstalled: false`, `injectedOnSessionStart: false`, `resumeBehavior: not-proven`, and `dedupeRisk: unknown`; this makes native-vs-hook-vs-manual duplication an explicit evidence question instead of an assumption.
+- `duplicateLoadEvidence` — which generated candidate Pluribus can identify for duplicate-load review. Built-in targets currently report a `contentIdentity` hash, one `candidateLoads[]` item, a matching `selectedLoad`, empty `suppressedLoads`, `crossRootScanMode: not-inspected`, and `duplicateRisk: unknown`; this is a receipt shape for Cursor/Claude-style duplicate skill or duplicate `CLAUDE.md` loads, not proof of runtime suppression.
 - `effectiveContext` — what Pluribus can prove about the context a target receives. Built-in targets currently report `scope: repo-root`, `pathScoped: false`, `inheritance: none-modeled`, `overrideBehavior: none-modeled`, plus the inferred `loadedBy` and `effectiveSource`; this is explicit evidence that monorepo path inheritance/isolation still needs a separate smoke.
 - `semanticDifference` — a compact list such as `section-loss`, `project-wide-only`, `no-path-scope-evidence`, `generic-agent-file`, or `runtime-load-dedupe-not-proven` so reviewers can distinguish “file exists” from “same behavior is preserved.”
@@ -103,7 +104,8 @@ These fields are intentionally boring. They help reviewers catch cases like “i
 3. Keep target-native instructions when a semantic cannot be represented everywhere.
 4. Commit a small audit artifact (`pluribus audit --json --fidelity-report --output reports/pluribus-audit.json`) when you want CI/review evidence.
 5. For hook/native/manual mixes, treat `loadEvidence.dedupeRisk: unknown` as a warning, not proof. Add a target-specific receipt for `loadedBy`, `hookInstalled`, `injectedOnSessionStart`, and resume behavior before claiming deduplication.
-6. For monorepos, treat `effectiveContext.scope: repo-root` as a warning, not proof. Add a target-specific smoke for the path you care about, for example `apps/client/` should load root + client context but not `apps/auth/` rules.
+6. For multi-root skill/rule scans, treat `duplicateLoadEvidence.duplicateRisk: unknown` as a warning, not proof. Add a target-specific receipt for `candidateLoads`, `selectedLoad`, `suppressedLoads`, discovery roots, and content hashes before claiming duplicate suppression.
+7. For monorepos, treat `effectiveContext.scope: repo-root` as a warning, not proof. Add a target-specific smoke for the path you care about, for example `apps/client/` should load root + client context but not `apps/auth/` rules.
 7. Update the claim whenever a new target is added, a tool changes capability names, a subdirectory context is introduced, a hook/manual injection path changes, or a permission/security default changes.
 ## Feedback wanted
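
The "warning, not proof" guidance in this hunk could be enforced in CI with a small check over each target's report. The sketch below is an illustration under assumptions, not Pluribus code: the function name `fidelity_warnings` is hypothetical, and it takes a single per-target object because the outer layout of the `audit --json --fidelity-report` output is not shown in this diff. The field names and the "unproven" values are the ones the changed document lists for built-in targets.

```python
# Fields and values the document says built-in targets report today.
# Each of these means "not yet proven", so a reviewer should see a warning.
UNPROVEN_DEFAULTS = [
    ("loadEvidence", "dedupeRisk", "unknown"),
    ("loadEvidence", "resumeBehavior", "not-proven"),
    ("duplicateLoadEvidence", "duplicateRisk", "unknown"),
    ("duplicateLoadEvidence", "crossRootScanMode", "not-inspected"),
    ("effectiveContext", "scope", "repo-root"),
]


def fidelity_warnings(target_report: dict) -> list:
    """List the evidence fields that are missing or still unproven."""
    warnings = []
    for section, field, unproven_value in UNPROVEN_DEFAULTS:
        value = target_report.get(section, {}).get(field)
        if value is None:
            warnings.append(f"{section}.{field}: missing")
        elif value == unproven_value:
            warnings.append(f"{section}.{field}: {value} (warning, not proof)")
    return warnings


# Hypothetical per-target object using the built-in defaults described above.
builtin_target = {
    "loadEvidence": {"dedupeRisk": "unknown", "resumeBehavior": "not-proven"},
    "duplicateLoadEvidence": {
        "duplicateRisk": "unknown",
        "crossRootScanMode": "not-inspected",
    },
    "effectiveContext": {"scope": "repo-root"},
}

for line in fidelity_warnings(builtin_target):
    print(line)
```

Reporting the fields rather than failing the build matches the document's framing: these values flag where a target-specific receipt or smoke test is still needed, and say nothing about whether duplication actually occurs at runtime.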

package/examples/context-input-evidence/AGENTS.md ADDED

@@ -0,0 +1,12 @@
+---
+scope: repo
+applies_to: [claude-code, codex, cursor]
+why_loaded: shared invariant guidance
+expected_benefit: align agent behavior with repository conventions
+---
+# Agent instructions
+Work from {{repo_root}}.
+Prefer small patches, run the narrowest meaningful check, and explain any context file changes in review.

package/examples/context-input-evidence/agent-overlay-log.jsonl ADDED

@@ -0,0 +1,4 @@
+{"type":"session.start","time":"2026-05-21T19:00:00.000Z","session_id":"demo-session-agent-overlays","conversation_id":"demo-conversation-agent-overlays","agent":"cursor","workspace":"example/portable-agent-context"}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:01.000Z","source_path":"AGENTS.md","source_role":"base","target_agent":"all","load_order":1,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"native-file-discovery","why_loaded":"shared repository invariants","expected_benefit":"keep cross-agent behavior aligned while avoiding duplicated per-agent rules","source_text":"# Project rules\n- Prefer small reviewable changes.\n- Run tests before claiming success.\n- Never commit secrets.","delivered_text":"# Project rules\n- Prefer small reviewable changes.\n- Run tests before claiming success.\n- Never commit secrets."}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:02.000Z","source_path":"AGENTS.cursor.md","source_role":"agent_overlay","target_agent":"cursor","load_order":2,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"native-file-discovery","why_loaded":"Cursor-specific rule semantics and workspace hints","expected_benefit":"preserve agent-specific activation semantics without forking the shared repository guidance","source_text":"# Cursor overlay\n- Put Cursor-specific workspace rule hints here.\n- Do not duplicate base safety rules.","delivered_text":"# Cursor overlay\n- Put Cursor-specific workspace rule hints here.\n- Do not duplicate base safety rules."}
+{"type":"agent_context.loaded","time":"2026-05-21T19:00:03.000Z","source_path":"AGENTS.codex.md","source_role":"agent_overlay_candidate","target_agent":"codex","load_order":0,"composition_policy":"base_then_agent_overlay","fallback_policy":"base_only_when_overlay_missing","activation":"not_loaded_wrong_agent","why_loaded":"candidate discovered but not applicable to Cursor session","expected_benefit":"avoid mixing Codex-only instructions into Cursor context","source_text":"# Codex overlay\n- Codex-specific sandbox notes live here, not in the Cursor session.","delivered_text":""}
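
A log like the one above can be reduced to a short summary of what the session actually received. The following sketch is an assumption about how such a log might be read, not a documented Pluribus tool: the `summarize` function is hypothetical, the embedded events are abbreviated copies of the four lines above (text fields shortened, several fields dropped), and treating any `activation` value starting with `not_loaded` as "discovered but skipped" is an interpretation of the sample data.

```python
import json

# Abbreviated copy of the four events; text fields are shortened here.
LOG = """\
{"type":"session.start","agent":"cursor"}
{"type":"agent_context.loaded","source_path":"AGENTS.md","source_role":"base","load_order":1,"activation":"native-file-discovery","source_text":"# Project rules","delivered_text":"# Project rules"}
{"type":"agent_context.loaded","source_path":"AGENTS.cursor.md","source_role":"agent_overlay","load_order":2,"activation":"native-file-discovery","source_text":"# Cursor overlay","delivered_text":"# Cursor overlay"}
{"type":"agent_context.loaded","source_path":"AGENTS.codex.md","source_role":"agent_overlay_candidate","load_order":0,"activation":"not_loaded_wrong_agent","source_text":"# Codex overlay","delivered_text":""}
"""


def summarize(log_text: str) -> dict:
    """Summarize which context files were delivered, skipped, or altered."""
    events = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    contexts = [e for e in events if e["type"] == "agent_context.loaded"]

    def was_skipped(event):
        # Assumption: a "not_loaded..." activation marks a rejected candidate.
        return event["activation"].startswith("not_loaded")

    loaded = sorted(
        (e for e in contexts if not was_skipped(e)),
        key=lambda e: e["load_order"],
    )
    return {
        "agent": next(
            (e["agent"] for e in events if e["type"] == "session.start"), None
        ),
        "loadedInOrder": [e["source_path"] for e in loaded],
        "skipped": [e["source_path"] for e in contexts if was_skipped(e)],
        # Fidelity check: delivered text should equal the source text.
        "alteredInDelivery": [
            e["source_path"]
            for e in loaded
            if e["delivered_text"] != e["source_text"]
        ],
    }


summary = summarize(LOG)
print(json.dumps(summary, indent=2))
```

For this sample the summary shows the base file followed by the Cursor overlay, the Codex overlay skipped, and no delivered text that differs from its source, which is the `base_then_agent_overlay` composition the log describes.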