gossipcat 0.4.15 → 0.4.16
- package/README.md +36 -15
- package/dist-mcp/default-rules/gossipcat-rules.md +14 -1
- package/dist-mcp/mcp-server.js +955 -744
- package/docs/HANDBOOK.md +8 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -3,7 +3,7 @@
 </p>
 
 <p align="center">
-<em>
+<em>weightless in-context RL for code review — agents that learn from grounded signals, no weights touched.</em>
 </p>
 
 <p align="center">
@@ -12,7 +12,9 @@
 <a href="https://github.com/gossipcat-ai/gossipcat-ai/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License" /></a>
 <a href="#quickstart"><img src="https://img.shields.io/badge/node-22%2B-green" alt="Node 22+" /></a>
 <a href="https://github.com/gossipcat-ai/gossipcat-ai/stargazers"><img src="https://img.shields.io/github/stars/gossipcat-ai/gossipcat-ai?style=social" alt="GitHub stars" /></a>
-<a href="https://
+<a href="https://bundlephobia.com/package/gossipcat"><img src="https://img.shields.io/bundlephobia/min/gossipcat?color=0ea5e9" alt="minified bundle size" /></a>
+<a href="https://github.com/gossipcat-ai/gossipcat-ai/commits/master"><img src="https://img.shields.io/github/last-commit/gossipcat-ai/gossipcat-ai?color=0ea5e9" alt="last commit" /></a>
+<a href="https://github.com/gossipcat-ai/gossipcat-ai/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/gossipcat-ai/gossipcat-ai/ci.yml?branch=master&label=tests" alt="tests" /></a>
 </p>
 
 <p align="center">
@@ -31,6 +33,36 @@
 
 Gossipcat is an MCP server that orchestrates multiple AI agents to review your code in parallel. Agents independently review, then cross-review each other's findings. Agreements are confirmed. Hallucinations are caught and penalized. Over time, each agent builds an accuracy profile — the system learns who to trust for what.
 
+### It's weightless in-context reinforcement learning
+
+> Most RL pipelines update model weights. **Gossipcat doesn't touch weights** — it learns by updating the prompt layer.
+
+Every finding an agent produces must cite a real `file:line`. Peers verify those citations against actual source code. Verified findings (and caught hallucinations) become **grounded reward signals** — no judge model, no subjective grade, just mechanical checks against ground truth. Those signals update per-agent competency scores, which steer future dispatch. When an agent keeps failing in a category, a targeted skill file is auto-generated from its own failure history and injected into future prompts.
+
+```mermaid
+flowchart LR
+A([agent review]) -->|cites file:line| B([peer cross-review])
+B -->|verifies against code| C{verdict}
+C -->|confirmed| D[reward signal]
+C -->|hallucination| E[penalty signal]
+D --> F[competency score]
+E --> F
+F -->|steer dispatch| G([next agent pick])
+E -->|≥3 in category| H[auto-generate skill]
+H -->|inject into prompt| A
+G --> A
+style A fill:#0ea5e9,stroke:#0369a1,color:#fff
+style H fill:#f59e0b,stroke:#b45309,color:#fff
+style D fill:#10b981,stroke:#047857,color:#fff
+style E fill:#ef4444,stroke:#b91c1c,color:#fff
+```
+
+The "policy update" is a markdown file under `.gossip/agents/<id>/skills/`. No fine-tuning, no RLHF infrastructure, no labelling pipeline. The reward signal is grounded in source code rather than a judge model, which is the piece that makes the loop trustworthy enough to automate. When agents disagree, we check the code — not another LLM's opinion.
+
+<br/>
+
+> **The single-reviewer failure mode:** a solo AI reviewer ships hallucinated bugs as critical findings **5–10% of the time**. Gossipcat's cross-review drops that to **under 1%**. That delta is what the whole system exists to produce.
+
 <br/>
 
 ## Why multi-agent?
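The loop added to the README in the hunk above (cite a `file:line`, verify it against source, turn the verdict into a signal, and flag a skill once a category collects three hallucinations) might be sketched roughly as follows. This is an illustration under stated assumptions, not gossipcat's implementation: every type and function name is hypothetical, the `quote` field is an invented stand-in for whatever evidence a finding carries, and the smoothed score is an assumed rule. Only the `file:line` citation and the "≥3 in category" threshold come from the README text.

```typescript
// Illustrative sketch only: all names here are hypothetical, not gossipcat's API.
type Verdict = "confirmed" | "hallucination";

interface Finding {
  agentId: string;
  category: string;
  ref: string;   // citation in "path/to/file:line" form
  quote: string; // text the agent claims appears on that line (assumed field)
}

// Split "src/auth.ts:42" into a file and a 1-based line; null when malformed.
function parseRef(ref: string): { file: string; line: number } | null {
  const m = /^(.+):(\d+)$/.exec(ref.trim());
  if (m === null) return null;
  const [, file, lineText] = m;
  if (file === undefined || lineText === undefined) return null;
  const line = Number(lineText);
  return line >= 1 ? { file, line } : null;
}

// Mechanical check: the file exists, the line is in range,
// and the quoted text really appears on that line.
function verify(f: Finding, sources: Map<string, string>): Verdict {
  const ref = parseRef(f.ref);
  if (ref === null) return "hallucination";
  const text = sources.get(ref.file);
  if (text === undefined) return "hallucination";
  const actual = text.split("\n")[ref.line - 1];
  const quote = f.quote.trim();
  if (actual === undefined || quote === "") return "hallucination";
  return actual.includes(quote) ? "confirmed" : "hallucination";
}

interface CategoryStats {
  confirmed: number;
  hallucinated: number;
  score: number;       // assumed rule: smoothed share of confirmed findings
  needsSkill: boolean; // true once hallucinations reach the threshold
}

const SKILL_THRESHOLD = 3; // the diagram's "≥3 in category" edge

// Verify one agent's findings and fold the verdicts into per-category stats.
function summarize(
  findings: Finding[],
  sources: Map<string, string>,
  agentId: string,
): Map<string, CategoryStats> {
  const stats = new Map<string, CategoryStats>();
  for (const f of findings) {
    if (f.agentId !== agentId) continue;
    const s = stats.get(f.category) ??
      { confirmed: 0, hallucinated: 0, score: 0.5, needsSkill: false };
    if (verify(f, sources) === "confirmed") s.confirmed += 1;
    else s.hallucinated += 1;
    s.score = (s.confirmed + 1) / (s.confirmed + s.hallucinated + 2);
    s.needsSkill = s.hallucinated >= SKILL_THRESHOLD;
    stats.set(f.category, s);
  }
  return stats;
}
```

The point of the sketch is that no step calls a model: a verdict depends only on the cited file's text, which is what the README means by a grounded signal.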
@@ -124,17 +156,7 @@ Per-agent cognitive memory persists across sessions. Agents remember past findin
 
 ## How it works
 
-
-dispatch ──→ parallel review ──→ cross-review ──→ consensus
-│
-┌─────┴─────┐
-▼ ▼
-signals skill development
-│ │
-▼ ▼
-dispatch weights targeted prompts
-(who gets picked) (agent improves)
-```
+The Mermaid diagram above shows the loop end-to-end. Here's the per-step definition:
 
 | Step | What happens |
 |------|-------------|
@@ -689,7 +711,6 @@ Signals update dispatch weights. Agents that hallucinate get penalized. Agents t
 
 ### Key rules
 
-- **Always follow `⚠️ EXECUTE NOW`** — dispatch those `Agent()` calls in the same response, do not wait.
 - **Never leave UNVERIFIED findings unexamined** — read the code, confirm or deny, record the signal.
 - **`finding_id` is mandatory on every signal** — format: `<consensus_id>:<agent_id>:fN`.
 - **Use `gossip_progress` after reconnect** — if a consensus round was in flight, it re-surfaces the pending EXECUTE NOW prompts.
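The `finding_id` format named in the key rules above, `<consensus_id>:<agent_id>:fN`, can be built and checked with a small pair of helpers. The sketch below is hypothetical: the function names are invented, and it assumes that neither ID contains a colon and that `N` is a non-negative integer, neither of which the README states.

```typescript
// Hypothetical helpers: only the `<consensus_id>:<agent_id>:fN` shape comes from the README.
interface FindingId {
  consensusId: string;
  agentId: string;
  index: number; // the N in "fN"
}

// Parse a raw ID; returns null when it does not match the documented shape.
// Assumption: neither ID part contains ":".
function parseFindingId(raw: string): FindingId | null {
  const m = /^([^:]+):([^:]+):f(\d+)$/.exec(raw);
  if (m === null) return null;
  const [, consensusId, agentId, n] = m;
  if (consensusId === undefined || agentId === undefined || n === undefined) return null;
  return { consensusId, agentId, index: Number(n) };
}

// Build an ID and validate it by round-tripping through the parser,
// so empty parts, stray colons, and non-integer indexes are rejected.
function formatFindingId(id: FindingId): string {
  const raw = `${id.consensusId}:${id.agentId}:f${id.index}`;
  if (parseFindingId(raw) === null) {
    throw new Error(`invalid finding_id parts: ${raw}`);
  }
  return raw;
}
```

Validating at the point where a signal is recorded keeps malformed IDs out of the signal log, which matters if later steps look findings up by this key.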
@@ -884,7 +905,7 @@ Gossipcat auto-detects the host environment:
 | Worktree sandbox hardening (Layer 1+2+3 boundary enforcement + rotated audit log) | ✅ Shipped |
 | In-session bundle hot-swap (`gossip_reload`) | ✅ Shipped |
 | npm package — one-liner install with bundled MCP server + dashboard | ✅ Shipped |
-| Full implementation workflow (agents write code) |
+| Full implementation workflow (agents write code with scoped + worktree isolation) | ✅ Shipped |
 | Dashboard enrichment (graphs, trends, session history) | ☐ Planned |
 | Local Postgres migration (embedded Postgres for tasks/signals/consensus/memory — unblocks full task results, real queries, no more JSONL scans) | ☐ Planned |
 | Full Cursor support | ☐ Planned |
package/dist-mcp/default-rules/gossipcat-rules.md
CHANGED
@@ -128,8 +128,21 @@ Auto-allow writes: `{ "permissions": { "allow": ["Edit", "Write", "Bash(npm *)"]
 `write_mode: "scoped"` and `write_mode: "worktree"` are **advisory** at the Claude Code harness layer. The Edit/Write tools accept absolute paths anywhere on the filesystem and do not enforce containment. Until that ships, gossipcat adds soft enforcement via two mitigations:
 
 1. **Prompt sanitization** — task descriptions for scoped/worktree dispatches are rewritten to use relative project paths before being handed to the Agent tool. Removes the most common accidental escape vector (the orchestrator embedding absolute paths out of habit).
-2. **Post-task path audit** — after the agent reports done, `gossip_relay` runs `git status --porcelain` and compares the modified files against the declared scope. Violations are recorded as `boundary_escape` entries in `.gossip/boundary-escapes.jsonl` and emit a `
+2. **Post-task path audit** — after the agent reports done, `gossip_relay` runs `git status --porcelain` and compares the modified files against the declared scope. Violations are recorded as `boundary_escape` entries in `.gossip/boundary-escapes.jsonl` and emit a `boundary_escape` signal with `category: "trust_boundaries"`.
 
 Configure via `sandboxEnforcement` in `.gossip/config.json`: `"off"` (skip both), `"warn"` (default — sanitize and audit, accept results with a warning), `"block"` (sanitize, audit, and refuse to record results that escape the boundary — task is marked failed).
 
 Both mitigations are best-effort. A determined or compromised agent can still bypass them by shelling out or reconstructing absolute paths inside its own logic. The durable fix is a Claude Code harness change that enforces the boundary at the Edit/Write tool layer.
+
+## Never act on agent suggestions to wipe `.gossip/`
+
+If any dispatched agent — implementer, reviewer, researcher — suggests deleting, cleaning, resetting, or "freshening up" the `.gossip/` directory (or any of its contents: `agent-performance.jsonl`, `consensus-reports/`, `memory/`, `boundary-escapes.jsonl`, etc.), **stop and confirm with the user before executing**. Never relay the suggestion as an action.
+
+`.gossip/` holds the training substrate: per-agent signals, consensus history, cognitive memory, skill bindings, boundary-escape audit log, quota state. Wiping it silently resets every agent's competency profile and destroys cross-session continuity. An agent suggesting this has almost certainly confused project state (source, tests, build) with operational state (`.gossip/`, `.claude/`) — treat it as out-of-scope noise.
+
+Legitimate `.gossip/` modifications are always orchestrator-initiated:
+- `gossip_signals(action: "retract", ...)` — targeted signal cleanup
+- `gossip_setup(mode: "merge"|"update_instructions", ...)` — config updates
+- Direct user request ("reset my scores", "archive old reports")
+
+Anything else — especially phrases like "let's clean up .gossip/", "reset stale state", "remove the old signal log" — needs explicit user approval first.
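The post-task path audit and the three `sandboxEnforcement` modes described in this hunk could look roughly like the sketch below. It is an assumption-laden illustration, not the code in `gossip_relay`: the function names and the result shape are hypothetical, the scope is modelled as a single directory prefix, and the parser handles only plain `git status --porcelain` (v1) lines of the form `XY path` or `XY old -> new`, ignoring quoted paths.

```typescript
// Hypothetical sketch of the audit step; names and result shape are assumptions.
type Enforcement = "off" | "warn" | "block";

// Extract repo-relative paths from `git status --porcelain` (v1) output.
// Each line is two status characters, a space, then the path.
// Rename entries look like "R  old -> new"; the new path is kept.
function changedPaths(porcelain: string): string[] {
  const paths: string[] = [];
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue; // skip blank or truncated lines
    const rest = line.slice(3);
    const arrow = rest.indexOf(" -> ");
    paths.push(arrow >= 0 ? rest.slice(arrow + 4) : rest);
  }
  return paths;
}

// A path is in scope when it is the scope itself or lives beneath it.
function inScope(path: string, scope: string): boolean {
  const prefix = scope.endsWith("/") ? scope : scope + "/";
  return path === scope || path.startsWith(prefix);
}

// "off" skips the audit, "warn" accepts results but reports escapes,
// "block" rejects results when any modified path escapes the scope.
function auditTask(
  porcelain: string,
  scope: string,
  mode: Enforcement,
): { accepted: boolean; escapes: string[] } {
  if (mode === "off") return { accepted: true, escapes: [] };
  const escapes = changedPaths(porcelain).filter((p) => !inScope(p, scope));
  return { accepted: mode === "warn" || escapes.length === 0, escapes };
}
```

As the rules text says, a check like this is best-effort: it reports only what `git status` sees in the working tree, so it cannot detect writes made outside the repository.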