npm - pluribus-context - Versions diffs - 0.3.34 → 0.3.36 - Mend

pluribus-context 0.3.34 → 0.3.36

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (85) hide show

package/docs/review-primitive-gate.md ADDED Viewed

@@ -0,0 +1,109 @@
+# Review primitive gate for agent handoffs
+Use this when a parallel-agent run, Claude Code hook/workflow, Codex/OpenClaw handoff, or local control-plane wrapper needs to prove more than "the agent said it was done".
+The market question is not just what to log after a run. It is whether a reviewer or CI job can make a decision:
+- **continue** because the assignment stayed inside approved scope and required checks passed;
+- **review first** because the run is partial or has explicit unverified assumptions;
+- **reject / stop** because scope changed without approval, required checks were skipped or failed, or the run is unsafe to resume.
+Pluribus should not be the execution control plane. Worktrees, VMs, hooks, masks, and vendor guardrails can enforce parts of the run. The useful Pluribus layer is a small, privacy-safe receipt that turns those controls into reviewable evidence across tools.
+## Receipt shape
+Attach this receipt to a PR body, CI artifact, run summary, or handoff packet.
+```json
+{
+  "type": "agent.review_primitive_receipt.v1",
+  "assignment_id": "agent-auth-audit-42",
+  "run_id": "run-2026-05-31T17-00Z",
+  "agent": {
+    "tool": "claude-code",
+    "role": "auth-reviewer"
+  },
+  "approved_boundaries": {
+    "read": ["src/auth/**", "tests/auth/**"],
+    "write": ["tests/auth/**"],
+    "network": false
+  },
+  "scope_access_changes": [
+    {
+      "change": "read docs/security/**",
+      "reason": "needed policy wording for test fixture",
+      "approved": true,
+      "approved_by": "human-reviewer"
+    }
+  ],
+  "commands_and_checks": [
+    {
+      "name": "npm test -- tests/auth",
+      "kind": "required_test",
+      "status": "passed",
+      "evidence": "ci://job/123#auth-tests"
+    },
+    {
+      "name": "npm run lint",
+      "kind": "required_check",
+      "status": "passed",
+      "evidence": "ci://job/123#lint"
+    }
+  ],
+  "refused_operations": [
+    {
+      "operation": "write src/auth/session.ts",
+      "reason": "outside approved write boundary"
+    }
+  ],
+  "handoff": {
+    "changed_files_bucket": "under_5",
+    "evidence_path": "artifacts/agent-auth-audit-42.json",
+    "next_safe_action": "review tests/auth/session.test.ts before merge"
+  },
+  "resume_state": "complete",
+  "privacy": {
+    "raw_prompts_logged": false,
+    "raw_tool_output_logged": false,
+    "source_code_logged": false,
+    "secrets_logged": false
+  }
+}
+```
+## Minimal gate
+The copyable demo in [`examples/review-primitive-gate/`](../examples/review-primitive-gate/) turns the receipt into a CI/reviewer decision.
+If you use Claude Code hooks, the [`examples/claude-code-review-hook/`](../examples/claude-code-review-hook/) bridge shows how to run the same gate from `TaskCompleted`, `PostCompact`, or `SessionEnd` without logging raw prompts, transcripts, tool output, source code, or secrets.
+If you review AI-authored pull requests, the [`examples/ai-pr-review-receipts/`](../examples/ai-pr-review-receipts/) recipe shows the same gate as a GitHub Actions merge/check primitive for PR blast-radius evidence.
+```bash
+node examples/review-primitive-gate/check-review-receipt.mjs \
+  examples/review-primitive-gate/pass-review-receipt.json
+node examples/review-primitive-gate/check-review-receipt.mjs \
+  examples/review-primitive-gate/fail-review-receipt.json
+```
+The gate passes only when:
+- `type` is `agent.review_primitive_receipt.v1`;
+- `assignment_id` and `run_id` exist;
+- approved read/write boundaries are present;
+- every scope/access change is explicitly approved;
+- every required check/test passed;
+- `resume_state` is `complete`.
+The gate fails when a run is `partial` or `unsafe-to-resume`, when a required check is skipped/failed, or when scope changed without approval. That is intentional: partial work can be valuable, but it should not silently pass a merge gate.
+## What to keep out
+Do not put raw prompts, full transcripts, source code, exact proprietary paths, secrets, customer data, or raw tool output in the receipt. Use coarse globs, hashes, CI URLs, artifact IDs, pass/fail states, and human-readable next safe actions.
+## Why this is different from a receipt field list
+A field list says what happened. A review primitive says what the next system is allowed to do with that evidence.
+If the artifact cannot reject a PR, pause a handoff, or force review when the run became partial/unsafe, it is probably just a nicer `plan.md`.

package/docs/skill-install-receipts.md ADDED Viewed

@@ -0,0 +1,102 @@
+# Skill install/load receipts
+Privacy-safe receipts for answering a narrow setup question:
+> After a skill installer ran, which agent targets can actually discover and load the installed skill, and what context budget did the install create?
+This is not a skill marketplace, package manager, or telemetry backend. Use this receipt next to tools such as `npx skills add`, team setup scripts, Claude Code plugins, Codex/Cursor/OpenClaw skill folders, or internal bootstrap scripts when the risky part is crossing several boundaries at once:
+1. the installer selected a source package/ref;
+2. files were written into project or global skill roots;
+3. each target agent discovered the installed manifest/resource;
+4. the runtime either injected/read the skill on activation or deferred/skipped it; and
+5. the new always-loaded or advertised context cost did not make the first session unsafe.
+The receipt should stay reviewable without raw skill bodies, private paths, prompts, transcripts, environment dumps, secrets, tool outputs, or customer data.
+## When to use it
+Use a skill install/load receipt when:
+- a setup command installs the same Skill into Claude Code, Codex, Cursor, OpenClaw, Zed/ACP, or another agent client;
+- a plugin/installer claims "cross-agent" support but you need proof per target;
+- a user says the Skill exists on disk but the runtime does not use it;
+- an installer adds MCP, hooks, rules, commands, or skill folders and could increase startup context cost; or
+- CI/review needs a compact proof that install succeeded without dumping the installed content.
+For mutation planning before any writes begin, use [install-plan receipts](install-plan-receipts.md). For runtime-only debugging where the file already exists but disappears in ACP/Zed/CLI/chat, use [loaded-resource boundary receipts](loaded-resource-boundary.md). This receipt sits between them: installer result + per-target discovery/load/budget proof.
+## Minimum contract
+```json
+{
+  "receipt_type": "agent.skill_install_receipt.v1",
+  "run_id": "skill-install-demo-001",
+  "installer": {
+    "name": "skills-cli",
+    "command_class": "skill_package_install",
+    "source": {
+      "kind": "git_ref",
+      "package": "vercel-labs/skills/context-budget-preflight",
+      "ref": "sha256:source-package-hash"
+    }
+  },
+  "mode_effective": "post_install_check",
+  "writes_completed": true,
+  "targets": [
+    {
+      "agent": "claude-code",
+      "scope": "project",
+      "required": true,
+      "install_status": "installed",
+      "discovery_status": "discovered",
+      "load_status": "activation_required",
+      "activation": "on_demand_skill_description",
+      "context_cost_bucket": "0-1k",
+      "safe_to_start_session": true
+    }
+  ],
+  "overall_safe_to_start_session": true,
+  "privacy_exclusions": ["raw_skill_body", "raw_prompt", "transcript", "secrets", "env_dump", "private_absolute_path"]
+}
+```
+## Fields that matter most
+- `installer.source` — package/ref/hash identity without embedded credentials.
+- `targets[].agent` and `targets[].scope` — which client and project/global root were targeted.
+- `targets[].install_status` — `installed`, `skipped`, or `failed`.
+- `targets[].discovery_status` — whether the target client can discover the installed manifest/resource.
+- `targets[].load_status` — `injected`, `readable`, `activation_required`, `deferred`, `not_tested`, or `failed`.
+- `targets[].context_cost_bucket` — coarse estimate such as `0-1k`, `1k-5k`, `5k-20k`, `over_budget`, or `unknown`; do not log raw schemas or skill text.
+- `targets[].safe_to_start_session` — false if a required target failed install/discovery, the runtime load test failed, or budget is already over cap.
+- `overall_safe_to_start_session` — false unless every required target is safe.
+## Copyable smoke test
+```bash
+node examples/skill-install-receipts/check-skill-install-receipt.mjs \
+  examples/skill-install-receipts/skill-install-receipt.json
+```
+Expected output:
+```text
+skill install receipt ok: 3 targets checked
+```
+## What this proves / does not prove
+Proves:
+- the installer wrote or skipped the intended targets;
+- each required target has an explicit discovery/load status;
+- context cost is bucketed before a session starts; and
+- raw skill/source content stayed out of the receipt.
+Does not prove:
+- the Skill is semantically good;
+- the agent will choose the Skill for every matching task;
+- the source package is trustworthy beyond the pinned ref/hash; or
+- runtime behavior in clients not listed in `targets`.

package/docs/skill-policy-receipts.md ADDED Viewed

@@ -0,0 +1,87 @@
+# Skill policy receipts
+Use this when an Agent Skill, `CLAUDE.md`, hook, or project rule says "do not touch X" but the agent can still drift into the forbidden path.
+The goal is not to log prompts or source code. The goal is a tiny, privacy-safe receipt that proves the run checked the policy boundary before writing code and again after writing code.
+This was prompted by a live `r/ClaudeCode` thread where a Skill told Claude Code not to create unit tests for internal services, but the run still generated one. Natural-language policy alone was too soft; the missing piece was an inspectable guard.
+## Boundary to prove
+For every requested change, capture:
+```json
+{
+  "receipt_type": "skill.policy.v1",
+  "skill": "unit-test-boundary",
+  "request_id": "local-run-2026-05-28T12:00Z",
+  "policy_scope": "unit-test targets",
+  "targets": [
+    {
+      "target": "src/public-api/client.test.ts",
+      "decision": "allowed",
+      "reason": "public API surface"
+    },
+    {
+      "target": "src/internal/billing/reconciler.test.ts",
+      "decision": "refused",
+      "reason": "internal service tests are out of scope for this Skill"
+    }
+  ],
+  "write_started": false,
+  "post_write_guard": "not_run",
+  "stopped_at": "policy_refused"
+}
+```
+Keep values coarse. Do not include code, secrets, customer names, stack traces, raw tool output, or full transcripts.
+## Minimal Skill guard
+Add a short preflight before the Skill writes files:
+```markdown
+## Policy preflight
+Before writing tests:
+1. List the intended test targets.
+2. Mark each target as `allowed` or `refused`.
+3. Refuse before writing if any target imports or exercises internal services.
+4. Emit a `skill.policy.v1` receipt with target names or coarse globs, decision, reason, and `write_started=false` when refused.
+5. Only after every target is allowed, write files.
+6. After writing, run the post-write guard and emit whether it passed.
+```
+Then add a post-write check that is simple enough for an agent to run reliably:
+```bash
+# Example: fail if generated unit tests import internal services.
+grep -R "from ['\"]\.\./\.\./internal\|from ['\"]@/internal\|require(['\"]@/internal" \
+  -- '*test.*' '*spec.*'
+```
+Adjust the grep for your repo. The important part is the receipt shape:
+- `policy_target_listed`
+- `policy_decision_allowed` / `policy_decision_refused`
+- `refusal_reason`
+- `write_started`
+- `post_write_guard_passed` / `post_write_guard_failed`
+- `stopped_at`
+## Why this belongs next to context receipts
+A Skill can be loaded and still fail to obey the boundary. That is the same class of problem as a healthy MCP server with tools invisible in the client, or a context file generated but not actually selected by the agent.
+The useful question is: **where did the boundary proof stop?**
+- Skill loaded, but no target list: policy was never made operational.
+- Target list exists, but no decisions: policy was considered but not enforced.
+- Refused target exists, but `write_started=true`: refusal came too late.
+- Post-write guard failed: generated code crossed the forbidden boundary.
+- Guard passed: the run has a small, reviewable receipt instead of only a confident claim.
+## Try the copyable Skill recipe
+See [`examples/agent-skills/skill-policy-receipts/`](../examples/agent-skills/skill-policy-receipts/) for a small `SKILL.md` recipe you can copy into Claude Code/OpenClaw-style Skill workflows.

package/docs/skill-use-rate-receipts.md ADDED Viewed

@@ -0,0 +1,104 @@
+# Skill use-rate receipts
+Agent Skill installers are getting good at the first boundary: download a Skill and attach it to Claude Code, Cursor, Codex, OpenCode, or another harness. The next boundary is harder: **installed is not used**.
+A skill use-rate receipt is a privacy-safe record that separates four states:
+1. **Discovered** — the installer/catalog found the Skill.
+2. **Installed/attached** — files or symlinks were written for a target agent/scope.
+3. **Invoked** — a session actually selected or loaded the Skill.
+4. **Useful enough to keep** — the Skill affected a reviewed action, check, or decision in a defined window.
+This matters when a team installs many Skills, plugins, commands, or subagents and later cannot tell which ones are just prompt clutter. The receipt should prove lifecycle state and usage counters without logging raw Skill bodies, prompts, source code, transcripts, or tool output.
+## Minimal receipt shape
+```json
+{
+  "schema": "pluribus.skill_use_rate_receipt.v1",
+  "run_id": "skills-audit-2026-06-05T13:00Z",
+  "generated_at": "2026-06-05T13:00:00Z",
+  "installer": {
+    "name": "skills",
+    "version": "1.5.9",
+    "command_digest": "sha256:..."
+  },
+  "window": {
+    "started_at": "2026-05-22T00:00:00Z",
+    "ended_at": "2026-06-05T13:00:00Z"
+  },
+  "skills": [
+    {
+      "skill_id": "frontend-design",
+      "source_ref": "github:vercel-labs/agent-skills/skills/frontend-design@main",
+      "target_agent": "claude-code",
+      "scope": "project",
+      "install_method": "symlink",
+      "discovered": true,
+      "installed": true,
+      "attached": true,
+      "invoked_count": 7,
+      "acted_on_count": 3,
+      "last_invoked_at": "2026-06-05T10:12:08Z",
+      "unused_since_install": false,
+      "context_cost_bucket": "small",
+      "evidence": [
+        {
+          "kind": "session_log_digest",
+          "ref": "sha256:0b7d..."
+        }
+      ]
+    }
+  ]
+}
+```
+## Evaluation questions
+Use this receipt to ask:
+- Which Skills are installed but never discovered by the harness?
+- Which Skills are discoverable but never invoked?
+- Which Skills are invoked but never acted on?
+- Which Skills are globally installed but only useful in one project?
+- Which Skills should be detached, narrowed, or promoted to a hard policy/check?
+## Privacy boundary
+Do record:
+- source refs and commit/tag when available;
+- target agent and scope;
+- install method (`copy`, `symlink`, generated file, ephemeral use);
+- boolean lifecycle states;
+- invocation and acted-on counters;
+- timestamps, hashes, and non-sensitive evidence refs.
+Do **not** record:
+- full Skill Markdown bodies;
+- raw user prompts or transcripts;
+- source code or tool output;
+- private file paths beyond a reviewed alias;
+- secrets, tokens, customer data, or unredacted environment values.
+## Copyable checker
+The [skill use-rate receipt example](../examples/skill-use-rate-receipts/) includes a small checker that validates required lifecycle fields and prints installed-but-unused Skills as review warnings rather than pretending installation equals adoption.
+```bash
+node examples/skill-use-rate-receipts/check-skill-use-rate.mjs \
+  examples/skill-use-rate-receipts/skill-use-rate-receipt.json
+```
+Expected output:
+```text
+skill use-rate receipt ok: 3 skills checked, 1 unused install warning
+```
+## Where this fits
+This is adjacent to [Skill install/load receipts](skill-install-receipts.md), but it answers a different question. Install/load receipts decide whether it is safe to start a session after an installer runs. Skill use-rate receipts decide whether a Skill actually earned its place after real sessions.
+The market signal behind this is current Skill/plugin consolidation pressure: teams can install many prompt resources, but the useful metric is not package count. It is `invoked / installed` and, when possible, `acted_on / invoked` over a reviewable window.

package/docs/subagent-role-receipts.md ADDED Viewed

@@ -0,0 +1,95 @@
+# Subagent role receipts
+Custom subagents are useful only if the caller can tell which role instructions actually governed the delegated work.
+Use this recipe when a project defines Codex/Claude Code/Cursor/OpenClaw-style subagents and wants a privacy-safe receipt for the role boundary: which role was requested, which instruction source was loaded, which tools/capabilities were allowed or deferred, and where the subagent stopped before crossing an unsafe boundary.
+This is not a claim that every agent runner uses the same file format. Treat `agents.toml` as a portable **example** for role definitions, and treat the receipt as the stable artifact: evidence about the role boundary without logging raw prompts, source code, transcripts, tool output, secrets, or customer data.
+## When this helps
+Use a subagent role receipt when:
+- a manager agent delegates work to a specialist reviewer, tester, security checker, migration planner, or docs writer;
+- the role has a narrower policy than the main agent;
+- the subagent has restricted tools, MCP servers, or write permissions;
+- the role should refuse mutation and only report findings;
+- a human reviewer needs to know which role instructions were loaded before trusting the result.
+## Example role definition
+The example in [`examples/subagent-role-receipts/agents.toml`](../examples/subagent-role-receipts/agents.toml) defines two project-local roles:
+- `blast-radius-reviewer` — reviews AI-generated PRs by operational boundaries before merge;
+- `temporal-authority-checker` — checks whether docs/specs are current or superseded before an agent writes code.
+The file is intentionally small so it can be adapted to the runner you use.
+## Receipt shape
+Attach this to a PR body, task handoff, review-bot comment, or run summary.
+```json
+{
+  "type": "subagent.role_boundary.v1",
+  "delegation": {
+    "requested_role": "blast-radius-reviewer",
+    "effective_role": "blast-radius-reviewer",
+    "role_source": "agents.toml",
+    "role_source_hash": "sha256:example-only",
+    "caller": "manager-agent"
+  },
+  "instructions": {
+    "loaded": true,
+    "source_kind": "project-local-role-definition",
+    "raw_instruction_logged": false,
+    "policy_summary": [
+      "review by blast radius, not diff size",
+      "do not approve merge when boundary evidence is ambiguous"
+    ]
+  },
+  "capabilities": {
+    "writes_allowed": false,
+    "tools_allowed": ["read", "grep", "test-summary"],
+    "tools_deferred_or_unavailable": ["shell-write", "deploy", "migration-apply"],
+    "mcp_servers_allowed": []
+  },
+  "boundary_decisions": [
+    {
+      "boundary": "schema_or_data_contract",
+      "status": "ambiguous",
+      "decision": "blocks_merge",
+      "reason": "migration rollback evidence missing"
+    }
+  ],
+  "handoff": {
+    "result_kind": "review_receipt",
+    "stopped_at": "ambiguous boundary before merge approval",
+    "next_safe_action": "ask backend owner to confirm rollback and reader compatibility"
+  },
+  "privacy": {
+    "raw_prompt_logged": false,
+    "raw_source_logged": false,
+    "raw_tool_output_logged": false,
+    "transcript_logged": false,
+    "secrets_logged": false,
+    "customer_data_logged": false
+  }
+}
+```
+## Minimal checklist
+Before trusting a delegated subagent result, ask for:
+- requested role and effective role match;
+- role definition source and coarse hash/version;
+- whether role instructions loaded through the intended path;
+- allowed/refused tool and write capabilities;
+- boundary decisions made by the role;
+- where the role stopped and the next safe action;
+- explicit privacy flags showing raw prompts/source/tool output were not logged.
+## What not to log
+Do not include raw prompts, full instructions, transcripts, source code, file paths that expose private structure, tool output, secrets, credentials, customer data, stack traces, or proprietary diffs. Prefer coarse names, hashes, counts, decision states, and review-owner labels.

package/docs/temporal-context-receipts.md ADDED Viewed

@@ -0,0 +1,123 @@
+# Temporal context receipts
+Use this when a long-lived AI coding project has old specs, ADRs, plans, or TODOs that still match grep but are no longer the current authority.
+The goal is not to delete history or log raw project content. The goal is a tiny, privacy-safe receipt that proves the agent separated **current authority** from **historical citation** before it edits code.
+This was prompted by a live `r/ClaudeCode` thread about the temporal problem in long-running projects: Claude Code can find every old plan, but grep is blind to time. If old docs do not carry status, date, and supersession metadata, the agent can treat a stale architecture note as current truth.
+## Boundary to prove
+For every coding run that reads design/context docs, capture a coarse receipt like this:
+```json
+{
+  "receipt_type": "context.temporal_authority.v1",
+  "request_id": "local-run-2026-05-28T16:00Z",
+  "current_authority": {
+    "file": "CURRENT_STATE.md",
+    "status": "current",
+    "as_of": "2026-05-28",
+    "scope": "checkout-flow"
+  },
+  "sources_considered": [
+    {
+      "file": "specs/2025-checkout-rewrite.md",
+      "status": "superseded",
+      "superseded_by": "CURRENT_STATE.md#checkout-flow",
+      "decision": "historical_citation_only"
+    },
+    {
+      "file": "specs/2026-checkout-risk-notes.md",
+      "status": "current",
+      "scope": "checkout-flow",
+      "decision": "allowed_as_supporting_context"
+    }
+  ],
+  "ambiguous_sources": [],
+  "write_started": true,
+  "stopped_at": "temporal_authority_resolved"
+}
+```
+Keep values coarse. Do not include source code, raw plans, prompts, transcripts, secrets, customer names, stack traces, private paths, or raw tool output.
+## Minimal doc convention
+Give every long-lived context file a small frontmatter header:
+```markdown
+---
+status: current # current | superseded | archived
+scope: checkout-flow
+date: 2026-05-28
+superseded_by: null
+---
+```
+For old specs:
+```markdown
+---
+status: superseded
+scope: checkout-flow
+date: 2025-11-10
+superseded_by: ../CURRENT_STATE.md#checkout-flow
+---
+```
+Then make `CURRENT_STATE.md` the short authority file an agent must read first:
+```markdown
+# Current state
+## checkout-flow
+- status: current
+- as_of: 2026-05-28
+- current authority: this section
+- related historical specs:
+  - specs/2025-checkout-rewrite.md (superseded)
+  - specs/2026-checkout-risk-notes.md (current supporting context)
+Agents may cite superseded specs for rationale, but must not implement from them unless the current authority explicitly reactivates that behavior.
+```
+## Agent preflight
+Before editing code in a long-lived project, ask the agent to do this:
+```markdown
+## Temporal authority preflight
+Before writing code:
+1. Read `CURRENT_STATE.md` or the repo's current-state equivalent.
+2. List design/spec/TODO/context files found for the requested scope.
+3. Mark each source as `current`, `superseded`, `archived`, or `ambiguous`.
+4. If any relevant source is `ambiguous` or lacks `superseded_by` while contradicting current authority, stop before writing.
+5. Emit a `context.temporal_authority.v1` receipt with coarse file names/globs, status, decision, `write_started`, and `stopped_at`.
+6. Only use superseded docs as historical citations, not as implementation authority.
+```
+Useful receipt markers:
+- `context_current_authority`
+- `historical_spec_citation`
+- `status_superseded`
+- `superseded_by_resolved`
+- `ambiguous_temporal_source`
+- `stale_source_ignored`
+- `write_refused_until_authority_resolved`
+- `preflight_temporal_decision`
+## Where this catches failures
+- Old spec matches grep, but has `status: superseded`: agent can cite it but should not implement from it.
+- Old spec conflicts with `CURRENT_STATE.md` and has no `superseded_by`: agent should stop and ask for authority resolution.
+- Multiple current files claim the same scope: agent should stop before writing.
+- Current authority exists, but the run never read it: the receipt should show `stopped_at=current_authority_missing` or `write_started=false`.
+## Try the copyable example
+See [`examples/temporal-context-receipts/`](../examples/temporal-context-receipts/) for a minimal `CURRENT_STATE.md`, superseded spec, current supporting note, and receipt example.

package/examples/agent-firewall-denial-audit/README.md ADDED Viewed

@@ -0,0 +1,14 @@
+# Agent firewall denial/audit example
+This example turns an agent-firewall block into two privacy-safe artifacts:
+- `denial-envelope.json` — what the model is allowed to see.
+- `operator-audit-record.json` — what the operator/dashboard/CI can audit.
+Run the checker:
+```bash
+node check-denial-audit.mjs .
+```
+The checker enforces the core invariant: blocks should be structured enough for the model to stop or ask for approval, but not detailed enough to reveal raw commands, secrets, private paths, or bypassable policy internals.