pluribus-context 0.3.34 → 0.3.36
- package/CHANGELOG.md +23 -0
- package/README.md +2 -1
- package/bin/pluribus.js +12 -0
- package/docs/agent-firewall-denial-audit.md +95 -0
- package/docs/ai-pr-review-receipts.md +173 -0
- package/docs/canonical-output-receipts.md +107 -0
- package/docs/compaction-resume-receipts.md +43 -0
- package/docs/controlled-learning-queue.md +48 -0
- package/docs/dynamic-workflow-run-receipts.md +158 -0
- package/docs/install-plan-receipts.md +79 -0
- package/docs/loaded-resource-boundary.md +97 -0
- package/docs/mcp-tool-visibility-receipts.md +67 -0
- package/docs/memory-write-policy-receipts.md +41 -0
- package/docs/parallel-session-review-ledger.md +103 -0
- package/docs/phase-boundary-contracts.md +87 -0
- package/docs/review-primitive-gate.md +109 -0
- package/docs/skill-install-receipts.md +102 -0
- package/docs/skill-policy-receipts.md +87 -0
- package/docs/skill-use-rate-receipts.md +104 -0
- package/docs/subagent-role-receipts.md +95 -0
- package/docs/temporal-context-receipts.md +123 -0
- package/examples/agent-firewall-denial-audit/README.md +14 -0
- package/examples/agent-firewall-denial-audit/check-denial-audit.mjs +116 -0
- package/examples/agent-firewall-denial-audit/denial-envelope.json +9 -0
- package/examples/agent-firewall-denial-audit/operator-audit-record.json +20 -0
- package/examples/agent-skills/skill-policy-receipts/README.md +22 -0
- package/examples/agent-skills/skill-policy-receipts/SKILL.md +77 -0
- package/examples/ai-pr-review-receipts/.github/pull_request_template.md +31 -0
- package/examples/ai-pr-review-receipts/.github/workflows/ai-pr-review-receipt.yml +25 -0
- package/examples/ai-pr-review-receipts/README.md +55 -0
- package/examples/ai-pr-review-receipts/incomplete-review-primitive-receipt.json +43 -0
- package/examples/ai-pr-review-receipts/review-primitive-receipt.json +60 -0
- package/examples/canonical-output-receipts/canonical-output-receipt.json +55 -0
- package/examples/claude-code-review-hook/README.md +74 -0
- package/examples/claude-code-review-hook/check-review-receipt-hook.mjs +80 -0
- package/examples/claude-code-review-hook/sample-task-completed-event.json +6 -0
- package/examples/compaction-resume-receipts/README.md +12 -0
- package/examples/compaction-resume-receipts/check-resume-receipt.mjs +116 -0
- package/examples/compaction-resume-receipts/safe-resume-receipt.json +52 -0
- package/examples/compaction-resume-receipts/unsafe-resume-receipt.json +41 -0
- package/examples/controlled-learning-queue/README.md +26 -0
- package/examples/controlled-learning-queue/check-learning-queue.mjs +44 -0
- package/examples/controlled-learning-queue/leads/acme-job-card.md +12 -0
- package/examples/controlled-learning-queue/learning_queue.md +27 -0
- package/examples/controlled-learning-queue/memory/durable.md +10 -0
- package/examples/controlled-learning-queue/memory/working-notes.md +5 -0
- package/examples/controlled-learning-queue/role/job-contract.md +18 -0
- package/examples/controlled-learning-queue/skills/qualify-lead.md +17 -0
- package/examples/dynamic-workflow-run-receipts/README.md +18 -0
- package/examples/dynamic-workflow-run-receipts/workflow-run-receipt.json +112 -0
- package/examples/install-plan-receipts/README.md +34 -0
- package/examples/install-plan-receipts/agent-install-plan-receipt.json +56 -0
- package/examples/loaded-resource-boundary/README.md +22 -0
- package/examples/loaded-resource-boundary/check-loaded-resource-boundary.mjs +65 -0
- package/examples/loaded-resource-boundary/loaded-resource-boundary.json +69 -0
- package/examples/memory-write-policy/README.md +28 -0
- package/examples/memory-write-policy/approved-memory-update.json +48 -0
- package/examples/memory-write-policy/check-memory-update.mjs +120 -0
- package/examples/memory-write-policy/quarantined-memory-update.json +43 -0
- package/examples/parallel-session-review-ledger/README.md +13 -0
- package/examples/parallel-session-review-ledger/check-parallel-session-review-ledger.mjs +69 -0
- package/examples/parallel-session-review-ledger/parallel-session-review-ledger.json +72 -0
- package/examples/phase-boundary-contract/README.md +23 -0
- package/examples/phase-boundary-contract/check-phase-boundary.mjs +73 -0
- package/examples/phase-boundary-contract/phase-boundary-contract.json +68 -0
- package/examples/review-primitive-gate/README.md +19 -0
- package/examples/review-primitive-gate/check-review-receipt.mjs +100 -0
- package/examples/review-primitive-gate/fail-review-receipt.json +42 -0
- package/examples/review-primitive-gate/pass-review-receipt.json +54 -0
- package/examples/skill-install-receipts/README.md +31 -0
- package/examples/skill-install-receipts/check-skill-install-receipt.mjs +75 -0
- package/examples/skill-install-receipts/skill-install-receipt.json +79 -0
- package/examples/skill-use-rate-receipts/README.md +16 -0
- package/examples/skill-use-rate-receipts/check-skill-use-rate.mjs +89 -0
- package/examples/skill-use-rate-receipts/skill-use-rate-receipt.json +79 -0
- package/examples/subagent-role-receipts/README.md +15 -0
- package/examples/subagent-role-receipts/agents.toml +36 -0
- package/examples/temporal-context-receipts/CURRENT_STATE.md +13 -0
- package/examples/temporal-context-receipts/specs/2025-checkout-rewrite.md +10 -0
- package/examples/temporal-context-receipts/specs/2026-checkout-risk-notes.md +10 -0
- package/examples/temporal-context-receipts/temporal-authority-receipt.json +27 -0
- package/package.json +1 -1
- package/src/commands/demo.js +155 -0
- package/src/index.js +1 -0
- package/src/utils/version.js +1 -1
@@ -0,0 +1,158 @@

# Dynamic workflow run receipts

Claude Code-style dynamic workflows move orchestration into a script that can spawn many subagents, keep intermediate results outside the parent conversation, and show progress by phase, agent count, token total, and elapsed time.

That is useful when a codebase audit, migration, research task, or verification pass needs more parallelism than one conversation can coordinate. It also creates a new failure mode: one child agent can loop, burn tokens, or drift while the parent workflow only shows a high-level progress line.

Use a dynamic workflow run receipt when a workflow, ultracode run, local LLM gateway, or multi-agent script delegates work across several agents/models and a human needs a privacy-safe summary of what actually happened.

The first thing to look for is a **per-agent fuse**: budget, heartbeat, partial progress, stop reason, and kill-switch state for every spawned agent. After that, inspect whether the expensive path bought better verification or just more context drift.

This is not an orchestration framework. The receipt is the stable artifact: compact evidence for each phase and spawned agent without logging raw prompts, source code, transcripts, tool output, secrets, customer data, or proprietary file paths.

## When this helps

Use this receipt when:

- a workflow spawns several agents to audit, migrate, research, or verify a codebase;
- agents may run different roles, models, or local/remote providers;
- the run has a token/cost budget that needs to be explained after the fact;
- a child agent could loop, stall, or keep spending while the parent workflow stays mostly blind;
- the parent session sees only the final report, not every intermediate result;
- a reviewer needs to know what context was loaded, skipped, or suppressed for each agent;
- the run stops, pauses, resumes, or rejects a result and the stop point matters.

## Receipt shape

Attach this to a workflow report, PR body, task handoff, run summary, or CI artifact.

```json
{
  "type": "dynamic.workflow.run_receipt.v1",
  "workflow": {
    "workflow_id": "wf_checkout_auth_audit_2026_05_30",
    "runner": "claude-code-dynamic-workflow",
    "script_source": "generated-then-reviewed-command",
    "script_hash": "sha256:example-only",
    "task_kind": "codebase_auth_audit",
    "plan_approved_before_run": true,
    "resumable": true,
    "max_wall_clock_bucket": "under_15m",
    "kill_switch_available": true,
    "started_at": "2026-05-30T15:20:00Z",
    "completed_at": "2026-05-30T15:31:42Z"
  },
  "permissions": {
    "tool_allowlist_inherited": true,
    "writes_allowed": false,
    "network_allowed": false,
    "external_commands_allowed": ["grep", "test --dry-run"],
    "permission_profile": "review-only"
  },
  "phases": [
    {
      "phase_id": "route-inventory",
      "purpose": "find candidate auth-sensitive routes",
      "agent_count": 3,
      "token_spend_bucket": "under_50k",
      "elapsed_ms_bucket": "under_2m",
      "result": "completed"
    },
    {
      "phase_id": "adversarial-review",
      "purpose": "cross-check candidate misses",
      "agent_count": 2,
      "token_spend_bucket": "under_25k",
      "elapsed_ms_bucket": "under_2m",
      "result": "completed_with_gaps"
    }
  ],
  "agents": [
    {
      "agent_id": "agent-route-auditor-1",
      "phase_id": "route-inventory",
      "role": "route-auth-auditor",
      "model": "claude-sonnet",
      "provider": "anthropic",
      "context_loaded": ["repo-policy", "auth-boundary-rules", "route-index-summary"],
      "context_skipped_or_suppressed": [
        {
          "source": "customer-fixture-dump",
          "reason": "contains raw customer data; summary hash only"
        }
      ],
      "tools_granted": ["read", "grep"],
      "tools_used": ["grep"],
      "feature_areas_checked": ["checkout routes", "admin routes"],
      "token_budget_bucket": "under_25k",
      "token_spend_bucket": "under_10k",
      "max_iterations": 8,
      "iterations_used": 3,
      "heartbeat_seen_at": "2026-05-30T15:25:00Z",
      "partial_progress_reported": true,
      "fuse_triggered": false,
      "stop_reason": "completed_assigned_partition",
      "confidence": "medium",
      "known_gaps": ["did not execute integration tests"],
      "raw_prompt_logged": false,
      "raw_tool_output_logged": false,
      "raw_paths_logged": false
    },
    {
      "agent_id": "agent-reviewer-1",
      "phase_id": "adversarial-review",
      "role": "adversarial-auth-reviewer",
      "model": "local-codex-compatible",
      "provider": "local-llm-gateway",
      "context_loaded": ["candidate-findings-summary", "public-api-contract-summary"],
      "context_skipped_or_suppressed": [],
      "tools_granted": ["read"],
      "tools_used": ["read"],
      "feature_areas_checked": ["route findings cross-check"],
      "token_budget_bucket": "under_10k",
      "token_spend_bucket": "under_10k",
      "max_iterations": 5,
      "iterations_used": 5,
      "heartbeat_seen_at": "2026-05-30T15:30:00Z",
      "partial_progress_reported": true,
      "fuse_triggered": true,
      "stop_reason": "iteration_budget_reached_before_claim_verified",
      "confidence": "low",
      "known_gaps": ["one route requires owner confirmation before merge"],
      "raw_prompt_logged": false,
      "raw_tool_output_logged": false,
      "raw_paths_logged": false
    }
  ],
  "handoff": {
    "final_result_kind": "workflow_review_receipt",
    "claims_rejected_or_deferred": 1,
    "next_safe_action": "ask route owner to confirm checkout callback auth before writing fix",
    "where_it_stopped": "ambiguous auth boundary before mutation"
  },
  "privacy": {
    "raw_prompts_logged": false,
    "raw_source_logged": false,
    "raw_tool_output_logged": false,
    "transcripts_logged": false,
    "secrets_logged": false,
    "customer_data_logged": false
  }
}
```
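A reviewer usually reads the per-agent fuse first. The helper below is a minimal sketch and is not part of the package. `auditAgentFuses` is a hypothetical name, and it assumes the field names used in the sample receipt above. The flagging rules (fuse tripped, iteration budget used up, no heartbeat, no partial progress, no stop reason, low confidence) are illustrative choices, not a documented policy.

```javascript
// Hypothetical reviewer helper (not part of pluribus-context): summarize the
// per-agent fuse state of a dynamic.workflow.run_receipt.v1 object.
// Field names follow the sample receipt; the flagging rules are assumptions.
function auditAgentFuses(receipt) {
  const findings = [];
  for (const agent of receipt.agents ?? []) {
    const reasons = [];
    if (agent.fuse_triggered) reasons.push("fuse_triggered");
    // An agent that used its whole iteration budget may have been cut off
    // rather than finished, so it deserves a closer look.
    if (
      Number.isInteger(agent.max_iterations) &&
      Number.isInteger(agent.iterations_used) &&
      agent.iterations_used >= agent.max_iterations
    ) {
      reasons.push("iteration_budget_exhausted");
    }
    if (!agent.heartbeat_seen_at) reasons.push("no_heartbeat");
    if (!agent.partial_progress_reported) reasons.push("no_partial_progress");
    if (!agent.stop_reason) reasons.push("missing_stop_reason");
    if (agent.confidence === "low") reasons.push("low_confidence");
    if (reasons.length > 0) {
      findings.push({ agent_id: agent.agent_id, stop_reason: agent.stop_reason, reasons });
    }
  }
  return findings;
}

// Trimmed copy of the two agents in the sample receipt.
const sampleReceipt = {
  agents: [
    {
      agent_id: "agent-route-auditor-1",
      max_iterations: 8,
      iterations_used: 3,
      heartbeat_seen_at: "2026-05-30T15:25:00Z",
      partial_progress_reported: true,
      fuse_triggered: false,
      stop_reason: "completed_assigned_partition",
      confidence: "medium",
    },
    {
      agent_id: "agent-reviewer-1",
      max_iterations: 5,
      iterations_used: 5,
      heartbeat_seen_at: "2026-05-30T15:30:00Z",
      partial_progress_reported: true,
      fuse_triggered: true,
      stop_reason: "iteration_budget_reached_before_claim_verified",
      confidence: "low",
    },
  ],
};

console.log(JSON.stringify(auditAgentFuses(sampleReceipt), null, 2));
```

On the two sample agents, only `agent-reviewer-1` is flagged (`fuse_triggered`, `iteration_budget_exhausted`, `low_confidence`). That lines up with the one deferred claim in `handoff`.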
## Minimal checklist

Before trusting the result of a dynamic workflow, ask for:

- workflow/run id, runner, script source, script hash, and whether the plan was approved before execution;
- workflow-level wall-clock budget, whether a kill switch exists, and whether the run can be paused/resumed safely;
- permission profile, inherited tool allowlist, write/network/command capability, and whether the run was review-only or mutating;
- phases, agent counts, token spend buckets, elapsed-time buckets, and phase result states;
- per-agent role, model/provider actually used, context loaded, context skipped/suppressed, tools granted/used, token budget/spend, iteration budget, heartbeat, partial progress, fuse state, stop reason, confidence, and known gaps;
- explicit privacy flags proving raw prompts, source, transcripts, tool output, paths, secrets, and customer data were not logged;
- a handoff that says what was accepted, rejected/deferred, where the workflow stopped, and the next safe action.

## What not to log

Do not include raw prompts, full workflow scripts when they reveal private structure, full transcripts, source code, exact proprietary paths, tool output, secrets, credentials, customer data, stack traces, or raw LLM gateway logs. Prefer coarse names, hashes, buckets, counts, role labels, decision states, stop reasons, and owner labels.
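The privacy flags only help if something rejects a receipt that sets one of them. The sketch below assumes the flag names from the sample receipt. `checkReceiptPrivacy` is a hypothetical helper. It requires the six workflow-level `privacy` flags to be present, and it rejects any key ending in `_logged`, at any depth, whose value is not exactly `false`.

```javascript
// Hypothetical pre-publish guard for a dynamic workflow run receipt.
// Every key ending in "_logged", at any depth, must be exactly false.
function findLoggedFlags(value, path = "$") {
  const problems = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => problems.push(...findLoggedFlags(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const childPath = `${path}.${key}`;
      if (key.endsWith("_logged") && child !== false) problems.push(childPath);
      problems.push(...findLoggedFlags(child, childPath));
    }
  }
  return problems;
}

// A missing flag proves nothing, so the workflow-level block must carry all six.
const REQUIRED_PRIVACY_FLAGS = [
  "raw_prompts_logged",
  "raw_source_logged",
  "raw_tool_output_logged",
  "transcripts_logged",
  "secrets_logged",
  "customer_data_logged",
];

function checkReceiptPrivacy(receipt) {
  const privacy = receipt.privacy && typeof receipt.privacy === "object" ? receipt.privacy : {};
  const missing = REQUIRED_PRIVACY_FLAGS.filter((flag) => !(flag in privacy));
  const notFalse = findLoggedFlags(receipt);
  return { ok: missing.length === 0 && notFalse.length === 0, missing, notFalse };
}

const clean = {
  privacy: Object.fromEntries(REQUIRED_PRIVACY_FLAGS.map((flag) => [flag, false])),
  agents: [{ agent_id: "agent-route-auditor-1", raw_prompt_logged: false, raw_paths_logged: false }],
};
console.log(checkReceiptPrivacy(clean).ok); // true
```

Treating a missing flag as a failure, rather than as "not logged", is a deliberate choice. The receipt should state privacy explicitly, not imply it by omission.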
@@ -0,0 +1,79 @@

# Install-plan receipts

Use this when an MCP server, Skill bundle, plugin, starter kit, or setup script says it can configure many AI coding tools for you.

The risk is not only whether a hook later runs safely. The earlier boundary is the installer itself: it may detect agents, write MCP config, add instruction files, install Skills, register hooks, or create backups before the user understands what changed.

The goal is a tiny, privacy-safe pre-mutation receipt that proves what the setup step intends to touch **before the first write starts**. Do not log prompts, source code, secrets, raw environment dumps, transcripts, raw command output, customer data, or private absolute paths.

## Boundary to prove

For every setup/install run, capture a plan like this before applying changes:

```json
{
  "receipt_type": "agent.install.plan.v1",
  "run_id": "local-install-2026-05-29T16:00Z",
  "installer": "code-memory-mcp",
  "mode_requested": "plan",
  "mode_effective": "plan",
  "agents_detected": ["claude-code", "cursor", "codex", "openclaw"],
  "agents_selected": ["claude-code", "openclaw"],
  "planned_writes": [
    {
      "kind": "mcp_config",
      "target": "claude-code project config",
      "operation": "add_server",
      "backup_planned": true
    },
    {
      "kind": "instruction_file",
      "target": "AGENTS.md",
      "operation": "append_usage_notes",
      "backup_planned": true
    },
    {
      "kind": "hook",
      "target": "pre-tool hook config",
      "operation": "register_command",
      "backup_planned": true
    }
  ],
  "external_commands_planned": [
    { "phase": "apply", "command_class": "package_manager_install" }
  ],
  "network_after_install": "mcp_server_localhost_only",
  "writes_started": false,
  "next_safe_command": "installer apply --from-plan install-plan.json"
}
```

Keep `target` values coarse enough for review. Prefer `claude-code project config` over a full local path, and `package_manager_install` over raw shell output.

## Acceptance checks

A safe installer should make these claims inspectable:

1. **Plan mode exists** — `install --plan`, `install --dry-run`, or equivalent emits the receipt without writing files.
2. **Effective mode is explicit** — if the user requested `apply` but policy downgraded to `plan`, the receipt says so.
3. **Agent detection is separated from selection** — finding Cursor/Codex/Claude/OpenClaw does not imply every detected tool will be changed.
4. **Every planned write has a kind and backup decision** — config, instruction file, Skill, hook, shell profile, lockfile, cache, or generated artifact.
5. **Writes are still false at receipt time** — `writes_started=false` is the key trust boundary.
6. **Apply can be repeated from the plan** — the user can review one artifact, then run a concrete next command.
7. **No private payloads leak** — no raw source, prompts, env dumps, secrets, token values, transcripts, stack traces, or raw tool output.
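Most of these checks can be run mechanically against the receipt. The sketch below assumes the `agent.install.plan.v1` field names from the example above. `checkInstallPlan` is a hypothetical helper. It covers checks 2 to 6, plus a path heuristic for the advice to keep `target` values coarse. Check 1 (plan mode exists) and most of check 7 depend on how the installer behaves, so they cannot be verified from the receipt alone.

```javascript
// Hypothetical checker for an agent.install.plan.v1 receipt. It encodes
// acceptance checks 2-6 plus a heuristic for coarse "target" values.
function checkInstallPlan(plan) {
  const failures = [];
  if (plan.receipt_type !== "agent.install.plan.v1") failures.push("unexpected receipt_type");
  // Check 2: requested and effective mode are both stated.
  if (typeof plan.mode_requested !== "string" || typeof plan.mode_effective !== "string") {
    failures.push("requested and effective mode must both be explicit");
  }
  // Check 3: selection is a subset of detection.
  const detected = new Set(plan.agents_detected ?? []);
  for (const agent of plan.agents_selected ?? []) {
    if (!detected.has(agent)) failures.push(`selected agent was not detected: ${agent}`);
  }
  // Check 4: every planned write has a kind and a backup decision.
  for (const [i, write] of (plan.planned_writes ?? []).entries()) {
    if (typeof write.kind !== "string") failures.push(`planned_writes[${i}] has no kind`);
    if (typeof write.backup_planned !== "boolean") failures.push(`planned_writes[${i}] has no backup decision`);
    // Heuristic: absolute, home-relative, or drive-letter paths are too specific.
    if (/^(\/|~|[A-Za-z]:[\\/])/.test(write.target ?? "")) {
      failures.push(`planned_writes[${i}] target looks like a private absolute path`);
    }
  }
  // Check 5: nothing has been written yet.
  if (plan.writes_started !== false) failures.push("writes_started must be false at receipt time");
  // Check 6: there is a concrete next command.
  if (typeof plan.next_safe_command !== "string" || plan.next_safe_command.trim() === "") {
    failures.push("missing next_safe_command");
  }
  return failures;
}

const samplePlan = {
  receipt_type: "agent.install.plan.v1",
  mode_requested: "plan",
  mode_effective: "plan",
  agents_detected: ["claude-code", "cursor", "codex", "openclaw"],
  agents_selected: ["claude-code", "openclaw"],
  planned_writes: [
    { kind: "mcp_config", target: "claude-code project config", operation: "add_server", backup_planned: true },
    { kind: "instruction_file", target: "AGENTS.md", operation: "append_usage_notes", backup_planned: true },
    { kind: "hook", target: "pre-tool hook config", operation: "register_command", backup_planned: true },
  ],
  writes_started: false,
  next_safe_command: "installer apply --from-plan install-plan.json",
};

console.log(checkInstallPlan(samplePlan)); // []
```

The path test is only a heuristic. It catches targets that start with `/`, `~`, or a Windows drive letter, but not every way a private path can leak.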
## Why this matters for hooks and MCP

Hooks, Skills, and MCP configs are often discussed as runtime supply-chain surfaces. That is true, but it is downstream. A one-command installer can create the hook or MCP entry first.

A hook receipt answers: “what executed?”

An install-plan receipt answers the earlier question: **“what is about to be installed, written, and trusted?”**

If an installer cannot answer that before mutation, treat it like running CI from an untrusted fork: useful, but not automatically safe.

## Try the copyable example

See [`examples/install-plan-receipts/`](../examples/install-plan-receipts/) for a small review checklist and sample receipt you can copy into setup scripts, README install sections, or agent-managed onboarding workflows.

After the installer has run, use [Skill install/load receipts](skill-install-receipts.md) when the next question is whether each target agent can discover/load the installed Skill and whether the install made the first session unsafe by adding too much always-loaded context.
@@ -0,0 +1,97 @@

# Loaded-resource boundary receipts

Use this when a Skill, plugin resource, MCP-provided instruction, or custom-agent file appears to be configured correctly but does not actually reach the agent runtime.

This is the failure mode behind reports like:

- "the Skill works in chat but not ACP/Zed/CLI";
- "`/skills` or the skill list is unavailable in this client";
- "the agent followed generic instructions because the real resource was never injected";
- "a prompt workaround says resources are preloaded, but there is no proof they were readable by the runtime".

Pluribus should not become a Skill manager. The useful boundary is a small receipt that proves what crossed from configuration into the run.

## Receipt shape

A loaded-resource receipt separates the stages that are often collapsed into "the skill exists":

| Stage | Question |
| --- | --- |
| `expected` | Which resources did the user/config expect for this agent and task? |
| `discovered` | Did the host find the resource on disk, in a plugin, registry, or MCP response? |
| `attached` | Was the resource attached to the selected agent/profile/workspace? |
| `injected` | Did the runtime put the resource into the model/tool context for this session? |
| `readable` | Could the agent actually read the resource bytes or resolved prompt? |
| `skipped` | If not, what precise stage and reason explain the gap? |

Recommended privacy-safe fields:

```json
{
  "receipt_type": "pluribus.loaded_resource_boundary.v1",
  "scenario": "custom-agent skill parity across chat and ACP",
  "expected_resources": [
    {
      "id": "skill:pr-review",
      "kind": "skill",
      "scope": "project",
      "source_ref": ".kiro/skills/pr-review/SKILL.md",
      "source_hash": "sha256:...",
      "required": true
    }
  ],
  "sessions": [
    {
      "runtime": "chat",
      "client": "kiro-desktop",
      "agent": "reviewer",
      "discovered_resources": ["skill:pr-review"],
      "attached_resources": ["skill:pr-review"],
      "injected_resources": ["skill:pr-review"],
      "readable_resources": ["skill:pr-review"],
      "skipped_resources": []
    },
    {
      "runtime": "acp",
      "client": "zed",
      "agent": "reviewer",
      "discovered_resources": ["skill:pr-review"],
      "attached_resources": ["skill:pr-review"],
      "injected_resources": [],
      "readable_resources": [],
      "skipped_resources": [
        {
          "id": "skill:pr-review",
          "stage": "injected",
          "reason": "runtime_does_not_inject_resources"
        }
      ]
    }
  ]
}
```

Do not include raw skill text, private prompts, credentials, or full project memory. Hashes, refs, stage names, and skip reasons are enough for a maintainer to reproduce the boundary.

## Acceptance test

For the same custom agent and the same attached Skill/resource, compare chat vs ACP/CLI/IDE sessions:

1. The resource should be `discovered` in each runtime that claims to support it.
2. If it is attached in chat but not in ACP/Zed/CLI, record `not_attached_to_agent`.
3. If it is attached but absent from the model context, record `runtime_does_not_inject_resources`.
4. If it was injected but the bytes cannot be resolved, record `resource_read_failed`.
5. If trigger logic prevented loading, record `trigger_not_matched` and include the matched task label or hash, not the full prompt.

A useful bug report is not "Skills are broken". It is:

> For agent `reviewer`, `skill:pr-review` is discovered and attached in both chat and ACP. Chat injects and reads it; ACP/Zed does not inject it and reports `runtime_does_not_inject_resources`.
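The acceptance rules above amount to a walk over the four stage lists. The sketch below assumes the `pluribus.loaded_resource_boundary.v1` field names from the receipt above. `findBoundaryGaps` is a hypothetical helper that reports, for each session, the first stage at which each required resource is missing, and maps that stage to the reason codes in the acceptance test. `not_discovered` is an assumed code that the text does not define. `trigger_not_matched` cannot be derived from stage lists, so the sketch leaves it out.

```javascript
// Hypothetical helper for a pluribus.loaded_resource_boundary.v1 receipt.
// Each entry: [stage name, session field, reason code]. "not_discovered" is an
// assumed code; the others come from the acceptance test.
const STAGES = [
  ["discovered", "discovered_resources", "not_discovered"],
  ["attached", "attached_resources", "not_attached_to_agent"],
  ["injected", "injected_resources", "runtime_does_not_inject_resources"],
  ["readable", "readable_resources", "resource_read_failed"],
];

function findBoundaryGaps(receipt) {
  const required = (receipt.expected_resources ?? []).filter((r) => r.required).map((r) => r.id);
  const gaps = [];
  for (const session of receipt.sessions ?? []) {
    for (const id of required) {
      for (const [stage, field, reason] of STAGES) {
        if (!(session[field] ?? []).includes(id)) {
          // Compare with what the receipt itself recorded for this resource.
          const recorded = (session.skipped_resources ?? []).find((s) => s.id === id);
          gaps.push({
            runtime: session.runtime,
            client: session.client,
            id,
            stage,
            reason,
            recorded_reason: recorded?.reason ?? null,
          });
          break; // stages are treated as sequential: report only the first gap
        }
      }
    }
  }
  return gaps;
}

const sampleBoundary = {
  expected_resources: [{ id: "skill:pr-review", kind: "skill", required: true }],
  sessions: [
    {
      runtime: "chat",
      client: "kiro-desktop",
      discovered_resources: ["skill:pr-review"],
      attached_resources: ["skill:pr-review"],
      injected_resources: ["skill:pr-review"],
      readable_resources: ["skill:pr-review"],
      skipped_resources: [],
    },
    {
      runtime: "acp",
      client: "zed",
      discovered_resources: ["skill:pr-review"],
      attached_resources: ["skill:pr-review"],
      injected_resources: [],
      readable_resources: [],
      skipped_resources: [
        { id: "skill:pr-review", stage: "injected", reason: "runtime_does_not_inject_resources" },
      ],
    },
  ],
};

console.log(findBoundaryGaps(sampleBoundary));
```

For the sample, the only gap is the ACP/Zed session at the `injected` stage. If `reason` and `recorded_reason` disagree, the receipt's own skip entry probably names the wrong stage.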
## Try the example

```bash
node examples/loaded-resource-boundary/check-loaded-resource-boundary.mjs \
  examples/loaded-resource-boundary/loaded-resource-boundary.json
```

The sample intentionally includes a chat-vs-ACP mismatch and treats that mismatch as the useful finding.
@@ -0,0 +1,67 @@

# MCP tool visibility receipts

MCP memory, Git, GitLab, code-search, and knowledge-graph servers can be healthy while the agent still cannot see their tools.

A useful debug artifact should prove each boundary separately:

1. **Server launched** — the configured command starts without leaking env/secrets.
2. **Handshake completed** — client and server agreed on a protocol version and capabilities.
3. **Proxy catalog returned** — a direct `tools/list` call returns the expected tool count and names.
4. **Client catalog visible** — the actual agent UI/runtime exposes the same tools under the expected names.
5. **Invocation allowed or refused** — the first tool call either runs, or returns an explicit permission/config/schema reason.

`server healthy` is not enough. `tools/list` is not enough. The receipt needs to say where the chain stopped.

## 60-second probe for any stdio MCP server

Replace the command after the pipe with the server command you already configured in Claude Code, Cursor, Codex, OpenClaw, or another MCP client.
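```bash
(
  # 1. initialize: negotiate the protocol version and capabilities
  printf '%s\n' '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"receipt-probe","version":"0.1.0"}},"id":1}'
  # 2. the MCP lifecycle expects this notification before normal requests
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  # 3. ask the server for its tool catalog
  printf '%s\n' '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":2}'
  # keep stdin open briefly so the server can reply before it sees EOF
  sleep 2
) | your-mcp-server-command
```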
Record only metadata, not raw prompt/source/tool output:

```json
{
  "kind": "mcp.tool_visibility.receipt",
  "server": "gitlab",
  "server_command_hash": "sha256:...",
  "protocol_version_requested": "2024-11-05",
  "handshake": "ok",
  "proxy_tools_count": 172,
  "proxy_tool_names_sample": ["glab_issue_list", "glab_mr_view"],
  "client": "Claude Code",
  "client_catalog_visible": false,
  "client_tools_count": 0,
  "stopped_at": "client_catalog_visible",
  "privacy": "names/counts only; no args, outputs, tokens, paths, or source snippets"
}
```
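The probe prints raw JSON-RPC responses, which can include tool descriptions and input schemas you may not want in a bug report. The sketch below reduces that output to receipt metadata. It assumes you saved the probe's stdout (for example by appending `> probe-stdout.jsonl` after the server command) and that the server writes its logs to stderr. `summarizeProbeOutput` and the `protocol_version_negotiated` field are hypothetical. The `stopped_at` values it emits (`handshake`, `proxy_catalog`) are illustrative labels.

```javascript
// Hypothetical helper: reduce the probe's stdout (one JSON-RPC message per line)
// to receipt metadata. Only ids, versions, counts, and a few tool names survive;
// descriptions, input schemas, and any other output are dropped.
function summarizeProbeOutput(stdout, { server, sampleSize = 2 } = {}) {
  const responses = new Map();
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip non-JSON lines
    try {
      const message = JSON.parse(trimmed);
      if (message.id !== undefined) responses.set(message.id, message);
    } catch {
      // Malformed line: skip it rather than copy raw output into the receipt.
    }
  }
  const init = responses.get(1); // reply to "initialize"
  const list = responses.get(2); // reply to "tools/list"
  const tools = Array.isArray(list?.result?.tools) ? list.result.tools : null;
  let stoppedAt = null; // null: the server side looked fine; check the client next
  if (!init?.result) stoppedAt = "handshake";
  else if (!tools) stoppedAt = "proxy_catalog";
  return {
    kind: "mcp.tool_visibility.receipt",
    server,
    handshake: init?.result ? "ok" : init?.error ? "error" : "no_response",
    protocol_version_negotiated: init?.result?.protocolVersion ?? null,
    proxy_tools_count: tools ? tools.length : null,
    proxy_tool_names_sample: tools ? tools.slice(0, sampleSize).map((tool) => tool.name) : [],
    stopped_at: stoppedAt,
  };
}

// Illustrative server output; "example_third_tool" is an invented name.
const sampleStdout = [
  '{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{}},"serverInfo":{"name":"example-server","version":"0.0.1"}}}',
  '{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"glab_issue_list","inputSchema":{"type":"object"}},{"name":"glab_mr_view","inputSchema":{"type":"object"}},{"name":"example_third_tool","inputSchema":{"type":"object"}}]}}',
].join("\n");

console.log(summarizeProbeOutput(sampleStdout, { server: "gitlab" }));
```

You still have to fill in the client-side fields (`client`, `client_catalog_visible`, `client_tools_count`) from what the agent UI or runtime actually shows. The server's stdout cannot prove them.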
## Acceptance check

For a release or bug report, ask for one small matrix:

| Boundary | Evidence | Pass condition |
| --- | --- | --- |
| Launch | server command hash + exit/live status | command starts and stays alive long enough for handshake |
| Handshake | protocol version + capabilities summary | initialized without version/schema mismatch |
| Proxy catalog | `tools/list` count + stable tool-name sample | expected tools returned directly |
| Client catalog | client-visible count + naming prefix | same class of tools visible to the agent |
| First invocation | allowed/refused reason | failure explains permission/config/schema, not silent absence |

This shape is intentionally compatible with GitHub/GitLab issue reports and OpenTelemetry-style events. It helps maintainers separate server bugs from client catalog, protocol-version, schema, timeout, and permission bugs without asking users to paste private output.
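Filling in `stopped_at` from the matrix is a walk that stops at the first failure. The sketch below uses hypothetical names. `firstFailedBoundary` takes one observed result per row, in the matrix's order, and returns the first boundary that failed or was never checked. The boundary keys are assumptions chosen to mirror the row labels.

```javascript
// Hypothetical helper: walk the matrix rows in order and report the first
// boundary that did not pass. Keys mirror the row labels; values are whatever
// the reporter observed.
const BOUNDARIES = ["launch", "handshake", "proxy_catalog", "client_catalog", "first_invocation"];

function firstFailedBoundary(results) {
  for (const boundary of BOUNDARIES) {
    const outcome = results[boundary];
    // An unrun step is reported separately from a failed one.
    if (outcome === undefined) return { stopped_at: boundary, status: "not_checked" };
    if (!outcome.pass) {
      return { stopped_at: boundary, status: "failed", reason: outcome.reason ?? "unspecified" };
    }
  }
  return { stopped_at: null, status: "all_passed" };
}

const report = {
  launch: { pass: true },
  handshake: { pass: true },
  proxy_catalog: { pass: true },
  client_catalog: { pass: false, reason: "client_tools_count=0" },
};

console.log(firstFailedBoundary(report));
```

Keeping `not_checked` separate from `failed` prevents a step nobody ran from being reported as a bug in that layer.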
## Why this belongs near Pluribus

Pluribus should not become an MCP gateway or memory database. The narrow value is evidence for context boundaries:

- generated instruction files prove what static rules were written;
- memory/search receipts prove what retrieved context was delivered;
- tool visibility receipts prove whether a configured MCP capability actually crossed into the agent's usable catalog.

If a tool is not visible to the agent, the project has no reliable context handoff no matter how healthy the server looks.
@@ -0,0 +1,41 @@

# Memory write policy receipts

Cross-agent memory tools usually optimize recall: make Claude Code, Codex, Cursor, OpenClaw, ChatGPT, or MCP clients find the same facts later.

The adoption risk is different: **who is allowed to write durable memory, under what scope, and with what rollback or review path?**

Pluribus should not become another memory server. This receipt is a small governance layer for shared memory systems: every durable memory update is treated like a proposed diff before it becomes trusted context for future agents.

## Receipt boundary

A memory write receipt should prove:

- **source** — where the proposed memory came from, with a hash/ref instead of raw transcript or raw memory body;
- **scope** — whether the write is repo, project, org, or user scoped;
- **proposed diff** — adds/updates/supersedes/expires by stable refs and hashes;
- **write policy** — proposed, approved, rejected, or quarantined; who/what approved it;
- **lifecycle** — expiry or review date so stale facts do not become immortal;
- **injection visibility** — future sessions can see which memory was injected;
- **privacy flags** — no raw prompts, raw tool output, raw memory text, or secrets in the receipt.
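The seven properties above can be enforced as a gate before anything is written to durable memory. This is only a sketch. The field names (`source.ref`, `proposed_diff[].op`, `policy.status`, `lifecycle.review_by`, `injection_visible`, `privacy.*_included`, and so on) are assumptions chosen for illustration. They are not taken from the JSON files in `examples/memory-write-policy/`, and `gateMemoryWrite` is a hypothetical name.

```javascript
// Hypothetical gate for a memory write receipt. All field names here are
// assumptions for this sketch, not the schema of the package's example files.
const SCOPES = new Set(["repo", "project", "org", "user"]);
const DIFF_OPS = new Set(["add", "update", "supersede", "expire"]);
const PRIVACY_FLAGS = ["raw_prompt_included", "raw_tool_output_included", "raw_memory_text_included", "secrets_included"];

function gateMemoryWrite(receipt) {
  const reasons = [];
  // source: a ref and hash, never the raw transcript or memory body
  if (!receipt.source?.ref || !receipt.source?.hash) reasons.push("source needs ref and hash");
  // scope
  if (!SCOPES.has(receipt.scope)) reasons.push(`unknown scope: ${receipt.scope}`);
  // proposed diff: stable refs and hashes per operation
  const diff = receipt.proposed_diff ?? [];
  if (diff.length === 0) reasons.push("empty proposed diff");
  for (const entry of diff) {
    if (!DIFF_OPS.has(entry.op) || !entry.ref || !entry.hash) {
      reasons.push(`diff entry needs op, ref, and hash: ${entry.ref ?? "?"}`);
    }
  }
  // write policy: only approved writes with a recorded approver become durable
  if (receipt.policy?.status !== "approved") reasons.push(`policy status is ${receipt.policy?.status ?? "missing"}`);
  if (!receipt.policy?.approved_by) reasons.push("no approver recorded");
  // lifecycle: some expiry or review date
  if (!receipt.lifecycle?.expires_at && !receipt.lifecycle?.review_by) reasons.push("no expiry or review date");
  // injection visibility
  if (receipt.injection_visible !== true) reasons.push("future sessions cannot see the injection");
  // privacy: each flag must be explicitly false
  for (const flag of PRIVACY_FLAGS) {
    if (receipt.privacy?.[flag] !== false) reasons.push(`privacy flag not false: ${flag}`);
  }
  return { durable: reasons.length === 0, reasons };
}

const approved = {
  source: { ref: "session:example-1", hash: "sha256:example" },
  scope: "repo",
  proposed_diff: [{ op: "add", ref: "mem:example-fact", hash: "sha256:example" }],
  policy: { status: "approved", approved_by: "repo-maintainer" },
  lifecycle: { review_by: "2026-09-01" },
  injection_visible: true,
  privacy: Object.fromEntries(PRIVACY_FLAGS.map((flag) => [flag, false])),
};

console.log(gateMemoryWrite(approved)); // { durable: true, reasons: [] }
```

This sketch accepts all four scopes. It does not decide whether a `user`-scoped write may ever become shared memory; that remains a policy decision.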
## 60-second gate

The copyable example is in [`examples/memory-write-policy/`](../examples/memory-write-policy/):

```bash
node examples/memory-write-policy/check-memory-update.mjs \
  examples/memory-write-policy/approved-memory-update.json

node examples/memory-write-policy/check-memory-update.mjs \
  examples/memory-write-policy/quarantined-memory-update.json
```

The first passes because the write is approved, scoped, hashed, visible to future sessions, and has a review lifecycle. The second fails because it tries to turn a quarantined, broad user-scoped, private/sensitive update into durable shared memory and includes raw text.

## Positioning

Memory systems remember. Hooks and workflow engines execute. This receipt answers a narrower review question:

> Is this memory update allowed to become durable context for other agents?

That makes shared memory safer without requiring the memory provider to expose private content or the agent transcript.
@@ -0,0 +1,103 @@

# Parallel session review ledger

Use this when you run multiple Claude Code, Cursor, Codex, OpenClaw, or terminal-agent sessions in parallel and the bottleneck is no longer starting work — it is deciding whether each result can be trusted, rejected, or safely resumed.

The point is not orchestration. A review ledger is a small privacy-safe receipt that lets a human or a follow-up agent answer:

- what was this session assigned to do?
- which files, commands, and external systems was it allowed to touch?
- what does the agent claim changed?
- what evidence exists outside the agent summary?
- which checks are still missing?
- is the next reviewer allowed to continue, or should they stop and inspect?

Do **not** paste raw prompts, source code, secrets, customer data, transcripts, or full tool output into the ledger. Store stable references, hashes, check names, commit ids, redacted paths, and short evidence labels instead.

## Minimal receipt

```json
{
  "schema": "pluribus.parallel_session_review_ledger.v1",
  "generated_at": "2026-06-04T19:00:00Z",
  "run": {
    "orchestrator": "human",
    "repo": "redacted-service",
    "coordination_mode": "parallel_sessions"
  },
  "sessions": [
    {
      "id": "session-a",
      "agent": "claude-code",
      "assignment": "update validation for billing webhook retries",
      "branch": "agent/billing-webhook-retry-validation",
      "allowed_scope": {
        "files": ["src/billing/**", "test/billing/**"],
        "commands": ["npm test -- --test-name-pattern=billing"],
        "network": "none"
      },
      "agent_claim": "added retry validation and regression coverage",
      "evidence": [
        {
          "type": "commit",
          "ref": "abc1234"
        },
        {
          "type": "test",
          "name": "billing retry validation",
          "status": "passed"
        }
      ],
      "missing_checks": [],
      "privacy_flags": [],
      "state": "complete",
      "safe_next_action": "review_diff"
    }
  ]
}
```

## Review states

| State | Meaning | Required next action |
| --- | --- | --- |
| `complete` | Assignment is done and required evidence exists. | Review the diff or merge path normally. |
| `partial` | Some useful work exists but a required check or boundary is missing. | Continue only after the missing check/scope is resolved. |
| `blocked` | The agent stopped before a useful handoff. | Reassign or inspect the blocker before continuing. |
| `unsafe_to_resume` | Scope, privacy, command, or evidence boundaries were violated. | Stop; inspect manually before any follow-up agent uses the result. |

## Safe next actions

Use constrained verbs so another session does not turn a vague summary into authority:

- `review_diff`
- `run_missing_check`
- `continue_same_scope`
- `ask_human`
- `stop_manual_review`
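
A minimal sketch of how a reviewer script might enforce these verbs, assuming the snake_case field names from the minimal receipt above. `checkNextAction` is a hypothetical helper, not a Pluribus API; it applies only two rules stated in this document: the next action must be one of the five verbs, and `unsafe_to_resume` must pair with `stop_manual_review`.

```javascript
// Hypothetical helper (not a Pluribus API): enforces constrained next-action verbs.
const SAFE_NEXT_ACTIONS = new Set([
  "review_diff",
  "run_missing_check",
  "continue_same_scope",
  "ask_human",
  "stop_manual_review",
]);

const REVIEW_STATES = new Set(["complete", "partial", "blocked", "unsafe_to_resume"]);

// Returns human-readable problems for one session; an empty array means the
// state and next action are acceptable under these rules.
function checkNextAction(session) {
  const problems = [];
  if (!REVIEW_STATES.has(session.state)) {
    problems.push(`unknown state: ${session.state}`);
  }
  if (!SAFE_NEXT_ACTIONS.has(session.safe_next_action)) {
    // Free-form summaries such as "looks good, merge it" are rejected outright.
    problems.push(`unconstrained next action: ${session.safe_next_action}`);
  }
  if (session.state === "unsafe_to_resume" && session.safe_next_action !== "stop_manual_review") {
    problems.push("unsafe_to_resume must use stop_manual_review");
  }
  return problems;
}

console.log(checkNextAction({ state: "complete", safe_next_action: "review_diff" }));
console.log(checkNextAction({ state: "unsafe_to_resume", safe_next_action: "continue_same_scope" }));
```

The first call returns an empty array; the second reports that an unsafe session tried to continue instead of stopping. Keeping the verbs in a closed set is the point: a follow-up agent can only act on a value the checker recognizes.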

## Copyable checker

The example in [`examples/parallel-session-review-ledger/`](../examples/parallel-session-review-ledger/) validates the useful minimum:

- every session has an assignment, branch, allowed scope, state, and safe next action;
- `complete` sessions have evidence and no missing checks;
- `partial` sessions name missing checks;
- `unsafe_to_resume` sessions use `stop_manual_review`;
- sessions with privacy flags cannot be marked complete;
- no session writes outside its declared file scope.
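
The shipped checker is the file in the example directory. The following is only an independent sketch of the same six rules, useful if you want to port them into another CI step. Two assumptions apply: `changed_files` is a hypothetical per-session field (the minimal receipt above does not define one), and the glob matcher understands only `*` and `**`.

```javascript
// Independent sketch of the six ledger rules; NOT the shipped
// check-parallel-session-review-ledger.mjs.

// Minimal glob support (assumption): `**` matches across "/", `*` within one segment.
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      source += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      source += "[^/]*";
    } else if ("\\^$.|?+()[]{}".includes(ch)) {
      source += "\\" + ch; // escape regex metacharacters
    } else {
      source += ch;
    }
  }
  return new RegExp(`^${source}$`);
}

const REQUIRED_FIELDS = ["assignment", "branch", "allowed_scope", "state", "safe_next_action"];

function checkLedger(ledger) {
  const errors = [];
  for (const session of ledger.sessions ?? []) {
    const id = session.id ?? "(unnamed session)";
    const evidence = session.evidence ?? [];
    const missingChecks = session.missing_checks ?? [];
    const privacyFlags = session.privacy_flags ?? [];

    // Rule 1: every session declares its boundary and handoff.
    for (const field of REQUIRED_FIELDS) {
      if (session[field] === undefined || session[field] === "") {
        errors.push(`${id}: missing ${field}`);
      }
    }
    // Rule 2: complete means evidence exists and nothing is outstanding.
    if (session.state === "complete" && (evidence.length === 0 || missingChecks.length > 0)) {
      errors.push(`${id}: complete requires evidence and no missing checks`);
    }
    // Rule 3: partial must say what is missing.
    if (session.state === "partial" && missingChecks.length === 0) {
      errors.push(`${id}: partial must name missing checks`);
    }
    // Rule 4: unsafe sessions may only stop.
    if (session.state === "unsafe_to_resume" && session.safe_next_action !== "stop_manual_review") {
      errors.push(`${id}: unsafe_to_resume must use stop_manual_review`);
    }
    // Rule 5: privacy flags block completion.
    if (session.state === "complete" && privacyFlags.length > 0) {
      errors.push(`${id}: privacy flags present, cannot be complete`);
    }
    // Rule 6: every changed file (hypothetical field) must match a declared glob.
    const scope = (session.allowed_scope?.files ?? []).map(globToRegExp);
    for (const file of session.changed_files ?? []) {
      if (!scope.some((re) => re.test(file))) {
        errors.push(`${id}: ${file} is outside declared file scope`);
      }
    }
  }
  return errors;
}

const ledger = {
  sessions: [{
    id: "session-a",
    assignment: "update validation for billing webhook retries",
    branch: "agent/billing-webhook-retry-validation",
    allowed_scope: { files: ["src/billing/**", "test/billing/**"], network: "none" },
    evidence: [{ type: "commit", ref: "abc1234" }],
    missing_checks: [],
    privacy_flags: [],
    state: "complete",
    safe_next_action: "review_diff",
    changed_files: ["src/billing/retry.js", "src/auth/session.js"], // hypothetical field
  }],
};
console.log(checkLedger(ledger));
```

With the invented `changed_files` above, the only error reported is that `src/auth/session.js` falls outside the declared `src/billing/**` and `test/billing/**` scope.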

Run it from the repo root:

```bash
node examples/parallel-session-review-ledger/check-parallel-session-review-ledger.mjs examples/parallel-session-review-ledger/parallel-session-review-ledger.json
```

Expected output:

```text
parallel session review ledger ok: 3 sessions checked
```

## Positioning

Parallel agents do not fail only because models are weak. They also fail because the human reviewer loses track of each run's boundary. A ledger turns "the agent says it is done" into a small resume/reject object: assignment, scope, evidence, missing checks, and safe next action.

@@ -0,0 +1,87 @@

# Phase-boundary contracts for multi-model coding workflows

Use this when a coding workflow routes work through phases such as **Explore → Propose → Spec → Design → Tasks → Apply → Verify**, especially when different tools or models handle different phases.

The problem is not only “which model is best for this step”. The failure mode is the handoff: a plan agent burns context, a build agent receives a lossy summary, a verifier cannot tell which decisions are current, and stale assumptions leak from one phase into the next.

A phase-boundary contract makes every transition explicit:

- what input context was allowed into the phase;
- what artifact the phase had to produce;
- what evidence is required before the next phase may start;
- what context must not be carried forward;
- which stop conditions require human review or a fresh phase run.

This keeps Pluribus out of the orchestration layer. The workflow runner can be OpenCode, Claude Code, Cursor, OpenClaw, Codex, a local script, or a human checklist; Pluribus supplies only the evidence shape.

## Contract shape

```json
{
  "schema": "pluribus.phase-boundary-contract.v1",
  "workflowId": "checkout-refactor-2026-06-03",
  "currentPhase": "apply",
  "nextPhase": "verify",
  "allowedInput": [
    {
      "kind": "approved_plan",
      "ref": "plans/checkout-refactor.md",
      "contentHash": "sha256:...",
      "required": true
    }
  ],
  "outputArtifact": {
    "kind": "patch",
    "ref": "git:working-tree",
    "contentHash": "sha256:..."
  },
  "evidenceGate": {
    "requiredBeforeNextPhase": ["changed_files", "tests_run", "open_risks", "stop_conditions"],
    "status": "pass"
  },
  "droppedContext": [
    {
      "kind": "exploration_transcript",
      "reason": "not authoritative after approved plan"
    }
  ],
  "stopConditions": []
}
```

## Minimum fields

| Field | Why it exists |
| --- | --- |
| `workflowId` | Correlates phase records without storing a transcript. |
| `currentPhase` / `nextPhase` | Makes the handoff boundary explicit. |
| `allowedInput[]` | Prevents the next model from accidentally inheriting stale scratch context. |
| `outputArtifact` | Names the thing this phase produced: plan, spec, task list, patch, review, or verification report. |
| `evidenceGate.requiredBeforeNextPhase[]` | Forces the phase to prove the minimum facts the next phase depends on. |
| `droppedContext[]` | Records what intentionally did **not** cross the boundary. |
| `stopConditions[]` | Lets the workflow stop instead of laundering uncertainty into the next model. |
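
A minimal sketch of a check over these fields, written independently of `check-phase-boundary.mjs`, whose exact rules this document does not spell out. `checkPhaseBoundary` and the lowercase `PHASES` list are hypothetical: the list follows the example sequence in the introduction, and two rules are illustrative assumptions rather than documented Pluribus behavior. First, `nextPhase` must come later in that sequence. Second, the handoff may advance only when the gate reports `pass` and no stop conditions are open.

```javascript
// Hypothetical phase order taken from the example sequence; real workflows may differ.
const PHASES = ["explore", "propose", "spec", "design", "tasks", "apply", "verify"];

function checkPhaseBoundary(contract) {
  const errors = [];
  for (const field of ["workflowId", "currentPhase", "nextPhase", "outputArtifact", "evidenceGate"]) {
    if (contract[field] === undefined) errors.push(`missing ${field}`);
  }
  for (const field of ["allowedInput", "droppedContext", "stopConditions"]) {
    if (!Array.isArray(contract[field])) errors.push(`${field} must be an array`);
  }
  const required = contract.evidenceGate?.requiredBeforeNextPhase;
  if (!Array.isArray(required) || required.length === 0) {
    errors.push("evidenceGate.requiredBeforeNextPhase must name at least one fact");
  }
  // Assumption: a handoff only moves forward through the phase list.
  const from = PHASES.indexOf(contract.currentPhase);
  const to = PHASES.indexOf(contract.nextPhase);
  if (from === -1 || to === -1 || to <= from) {
    errors.push(`unexpected transition: ${contract.currentPhase} -> ${contract.nextPhase}`);
  }
  // Assumption: advance only on a well-formed contract, a passing gate, and no open stop conditions.
  const canAdvance =
    errors.length === 0 &&
    contract.evidenceGate.status === "pass" &&
    contract.stopConditions.length === 0;
  return { errors, canAdvance };
}

// The contract from "Contract shape", with its elided hashes left as-is.
const contract = {
  schema: "pluribus.phase-boundary-contract.v1",
  workflowId: "checkout-refactor-2026-06-03",
  currentPhase: "apply",
  nextPhase: "verify",
  allowedInput: [{ kind: "approved_plan", ref: "plans/checkout-refactor.md", contentHash: "sha256:...", required: true }],
  outputArtifact: { kind: "patch", ref: "git:working-tree", contentHash: "sha256:..." },
  evidenceGate: { requiredBeforeNextPhase: ["changed_files", "tests_run", "open_risks", "stop_conditions"], status: "pass" },
  droppedContext: [{ kind: "exploration_transcript", reason: "not authoritative after approved plan" }],
  stopConditions: [],
};
console.log(checkPhaseBoundary(contract));
```

Separating `errors` (the contract is malformed) from `canAdvance` (the contract is well-formed but says stop) keeps "this record is broken" distinct from "this record honestly reports that the next phase should not start".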

## Apply → Verify is the strictest boundary

For coding workflows, the most useful hard gate is often between Apply and Verify. The verifier should receive a compact evidence packet, not a vague “I implemented it” summary:

- decision implemented;
- source/plan hash used;
- changed files or file-set hash;
- tests/commands run with pass/fail state;
- open risks and skipped checks;
- whether secrets, schema migrations, data writes, or external calls were touched;
- explicit stop condition if verification cannot be trusted.
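
A minimal sketch of that evidence packet as data plus a gate function. The field names (`decision`, `planHash`, `changedFiles`, `fileSetHash`, `commands`, `openRisks`, `skippedChecks`, `touched`, `stopConditions`) are hypothetical and chosen to mirror the list. So is the policy that a failing command or any touched sensitive surface routes the patch to a human instead of the verifier. The sample values are invented for illustration.

```javascript
// Hypothetical Apply -> Verify gate: returns whether the verifier may start, and why not.
function applyToVerifyGate(packet) {
  const reasons = [];
  if (!packet.decision) reasons.push("no implemented decision recorded");
  if (!packet.planHash) reasons.push("no source/plan hash");
  if (!(packet.changedFiles?.length > 0) && !packet.fileSetHash) {
    reasons.push("no changed files or file-set hash");
  }
  const commands = packet.commands ?? [];
  if (commands.length === 0) reasons.push("no tests/commands recorded");
  for (const cmd of commands) {
    // Assumption: any non-passing command blocks the handoff.
    if (cmd.status !== "pass") reasons.push(`command did not pass: ${cmd.name}`);
  }
  // Open risks and skipped checks must be stated explicitly, even when empty.
  if (!Array.isArray(packet.openRisks)) reasons.push("openRisks not recorded");
  if (!Array.isArray(packet.skippedChecks)) reasons.push("skippedChecks not recorded");
  // Assumption: touching secrets, migrations, data writes, or external calls needs a human first.
  const touched = Object.entries(packet.touched ?? {})
    .filter(([, wasTouched]) => wasTouched)
    .map(([surface]) => surface);
  if (touched.length > 0) reasons.push(`sensitive surfaces touched: ${touched.join(", ")}`);
  reasons.push(...(packet.stopConditions ?? []).map((s) => `stop condition: ${s}`));
  return { proceedToVerify: reasons.length === 0, reasons };
}

const packet = {
  decision: "implement approved plan step 3",
  planHash: "sha256:...",
  changedFiles: ["src/checkout/price.js", "test/checkout/price.test.js"],
  commands: [{ name: "npm test", status: "pass" }],
  openRisks: [],
  skippedChecks: ["load test"],
  touched: { secrets: false, schemaMigrations: false, dataWrites: false, externalCalls: false },
  stopConditions: [],
};
console.log(applyToVerifyGate(packet));
```

Note the asymmetry: a skipped check is allowed as long as it is recorded, so the verifier knows what it is not being told, while an unrecorded `skippedChecks` field blocks the handoff.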

## Privacy boundary

Do not put raw source, prompts, transcripts, secrets, full command output, absolute local paths, or customer data in the contract. Use stable refs, hashes, counts, risk classes, and short non-secret labels.
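
A rough lint can catch the most common leaks before a contract is stored. This is a heuristic sketch, not a secret scanner: it walks every string value and flags absolute local paths, multi-line text (usually pasted output or source), and strings longer than an assumed 200-character label limit. `privacyFindings` and the threshold are hypothetical, and the sample input is invented.

```javascript
// Heuristic privacy lint for contract JSON; a sketch, not an exhaustive scanner.
const MAX_LABEL_LENGTH = 200; // assumption: longer strings are probably pasted content
// POSIX absolute (/...), home-relative (~/...), or Windows drive (C:\ or C:/) paths.
const ABSOLUTE_PATH = /^(\/|~\/|[A-Za-z]:[\\/])/;

// Recursively collects findings as "<json path>: <problem>" strings.
function privacyFindings(value, path = "$") {
  if (typeof value === "string") {
    const findings = [];
    if (ABSOLUTE_PATH.test(value)) findings.push(`${path}: absolute local path`);
    if (value.includes("\n")) findings.push(`${path}: multi-line text`);
    if (value.length > MAX_LABEL_LENGTH) {
      findings.push(`${path}: string longer than ${MAX_LABEL_LENGTH} chars`);
    }
    return findings;
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => privacyFindings(item, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, item]) => privacyFindings(item, `${path}.${key}`));
  }
  return []; // numbers, booleans, and null carry no text
}

console.log(privacyFindings({
  allowedInput: [{ kind: "approved_plan", ref: "/Users/me/work/plans/checkout-refactor.md" }],
  droppedContext: [{ kind: "test_output", reason: "FAIL checkout.test.js\n  expected 10, got 9" }],
}));
```

Stable refs such as `plans/checkout-refactor.md`, `git:working-tree`, and `sha256:...` pass untouched; the absolute path and the pasted test output are both flagged.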

## Try it

```bash
cd examples/phase-boundary-contract
node check-phase-boundary.mjs phase-boundary-contract.json
```

The checker is intentionally small. It is a copyable acceptance test for workflow builders: if a phase handoff cannot pass this gate, the next model should not pretend it has reliable state.