gossipcat 0.4.18 → 0.4.19
- package/dist-mcp/default-skills/emit-structured-claims.md +195 -0
- package/dist-mcp/default-skills/verify-the-premise.md +41 -0
- package/dist-mcp/mcp-server.js +17664 -16810
- package/docs/HANDBOOK.md +10 -0
- package/docs/RULES.md +208 -0
- package/package.json +2 -1
- package/scripts/postinstall.js +7 -4
@@ -0,0 +1,195 @@
---
name: emit-structured-claims
description: Emit a structured premise-claims JSON block alongside prose findings so the orchestrator can grep-verify code-shape claims and close the four Stage-1 bypass classes.
keywords: []
category: trust_boundaries
mode: permanent
status: active
---

## When this skill activates
Always — this is a permanent-mode skill for investigation agents
(`-researcher`, `-reviewer`). Your finding must include a `premise-claims`
JSON block whenever it contains ANY of:

1. **A file path + line number** — *"at `apps/cli/src/handlers/dispatch.ts:42`"*.
2. **A count of callers / sites / handlers / files** — *"5 sites call X"*,
   *"3 handlers emit Y"*, *"2 files reference Z"*.
3. **A claim of absence** — *"X is not called"*, *"no handler emits Y"*,
   *"Z is missing from scope"*.
4. **Any bypass-class phrase** — *"several"*, *"a few"*, *"most"*,
   *"probably"*, *"I think"*, *"it looks like"*, *"all but"*, *"only N of M"*,
   *"none of"*. These are the vague / hedged / inverted classes Stage 1
   regex cannot see; structured claims are the ONLY way they get verified.
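
As a rough self-check, the four triggers above can be approximated with a few patterns. This is a minimal sketch: the regexes and the `triggers` helper are illustrative assumptions, not part of the skill or the orchestrator, and only the bypass phrases are taken from the list above.

```python
import re

# Hypothetical patterns for the four trigger categories; the regexes are
# guesses for illustration, not the project's own.
PATH_LINE = re.compile(r"[\w./-]+\.\w+:\d+")
COUNT_NOUN = re.compile(r"\b\d+\s+(?:callers?|sites?|handlers?|files?)\b", re.IGNORECASE)
ABSENCE = re.compile(r"\b(?:is not called|no \w+ (?:emits|calls)|is missing from)\b", re.IGNORECASE)
ONLY_N_OF_M = re.compile(r"\bonly \d+ of \d+\b", re.IGNORECASE)
# Phrases copied from the skill's bypass-class list.
BYPASS_PHRASES = (
    "several", "a few", "most", "probably", "i think",
    "it looks like", "all but", "none of",
)


def triggers(prose: str) -> list[str]:
    """Return the trigger categories found in a finding's prose."""
    found = []
    if PATH_LINE.search(prose):
        found.append("file_line")
    if COUNT_NOUN.search(prose):
        found.append("count")
    if ABSENCE.search(prose):
        found.append("absence")
    lowered = prose.lower()
    if ONLY_N_OF_M.search(prose) or any(
        re.search(rf"\b{re.escape(phrase)}\b", lowered) for phrase in BYPASS_PHRASES
    ):
        found.append("bypass_phrase")
    return found
```

A non-empty result means the finding should carry a `premise-claims` block; an empty result only means these particular patterns did not fire.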

## Iron law
Prose alone is not verifiable. If your finding rests on code shape — counts,
line numbers, presence, absence — emit it as a structured claim too. The
orchestrator will `rg`-check every claim before dispatch. Claims that
falsify mean your premise is wrong; ship the claim block and let the
pipeline catch the mistake before an implementer inherits it.

## How to emit
Append a fenced code block with the `premise-claims` info string AFTER your
prose finding and BEFORE any suggested code. The block is JSON with a
`schema_version`, `verifier`, and `claims` array. Spec:
`docs/specs/2026-04-22-premise-verification-stage-2.md`.

### Worked example

Given finding prose like:

> *"Five dispatch sites call `assembleUtilityPrompt` in
> `apps/cli/src/mcp-server-sdk.ts`; none of them pass the sentinel flag.
> `maybeAnnotateUnverifiedClaims` lives at `apps/cli/src/sandbox.ts:207`."*

Emit alongside:

```premise-claims
{
  "schema_version": "1",
  "verifier": "orchestrator",
  "claims": [
    { "type": "callsite_count", "symbol": "assembleUtilityPrompt",
      "scope": "apps/cli/src/mcp-server-sdk.ts", "expected": 5,
      "modality": "asserted" },
    { "type": "absence_of_symbol", "symbol": "SCOPE_NOTE",
      "scope": "apps/cli/src/handlers/dispatch.ts",
      "context": "preamble emission path", "modality": "asserted" },
    { "type": "file_line", "path": "apps/cli/src/sandbox.ts",
      "line": 207, "expected_symbol": "maybeAnnotateUnverifiedClaims",
      "modality": "asserted" }
  ]
}
```
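
On the consuming side, a block like the one above might be pulled out of a finding roughly as follows. This is a sketch under assumptions: `extract_claims` and the fence regex are hypothetical, and the only facts taken from the skill are the `premise-claims` info string and the three top-level keys.

```python
import json
import re

# Matches a fenced block whose info string is `premise-claims`.
CLAIMS_FENCE = re.compile(
    r"^```premise-claims[ \t]*\n(.*?)\n```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

# Top-level keys named in the skill text.
REQUIRED_KEYS = ("schema_version", "verifier", "claims")


def extract_claims(finding: str) -> list[dict]:
    """Parse the first premise-claims block in a finding and return its claims.

    Returns an empty list when the finding has no such block.
    """
    match = CLAIMS_FENCE.search(finding)
    if match is None:
        return []
    block = json.loads(match.group(1))
    missing = [key for key in REQUIRED_KEYS if key not in block]
    if missing:
        raise ValueError(f"premise-claims block is missing: {missing}")
    return block["claims"]
```

Returning an empty list for prose-only findings keeps the "no block" case distinct from the "malformed block" case, which raises.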

## Claim types (v1 — see spec for full table)

- `callsite_count` — `symbol`, `scope`, `expected` (int). Verifier sums
  `rg --count-matches` across scope; compares against `expected`.
- `file_line` — `path`, `line` (int), `expected_symbol` (string). Verifier
  reads the file ±2 lines and string-matches `expected_symbol`.
- `absence_of_symbol` — `symbol`, `scope`, `context` (≤120 chars).
  Verifier requires `rg --count-matches` summed total of 0.
- `presence_of_symbol` — `symbol`, `scope`. Verifier requires total ≥ 1.
- `count_relation` — `symbol`, `scope`, `relation` (`>`, `<`, `=`, `≥`, `≤`),
  `value` (int). Use this for SUBSET forms like *"all but one of 5"* or
  *"only 1 of 5 is safe"* — NOT `negated:true`, which only encodes simple
  inequality against a total (bypass class I).
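
The checks described for the first four claim types can be sketched over an in-memory file map. This is not the orchestrator's verifier: `count_matches` uses plain substring counting as a stand-in for summing `rg --count-matches`, and all function names here are assumptions made for illustration.

```python
def count_matches(files: dict[str, str], symbol: str, scope: str) -> int:
    """Sum literal occurrences of `symbol` over every file under `scope`.

    `files` maps path -> text; `scope` may be a single file or a directory.
    """
    prefix = scope.rstrip("/") + "/"
    return sum(
        text.count(symbol)
        for path, text in files.items()
        if path == scope or path.startswith(prefix)
    )


def check_file_line(files: dict[str, str], path: str, line: int,
                    expected_symbol: str, window: int = 2) -> bool:
    """True if `expected_symbol` appears within `window` lines of `line` (1-indexed)."""
    lines = files.get(path, "").splitlines()
    lo = max(line - 1 - window, 0)
    hi = line + window  # slice end is exclusive, so this covers line + window
    return any(expected_symbol in text for text in lines[lo:hi])


def verify(claim: dict, files: dict[str, str]) -> bool:
    """Evaluate one claim; True means the claim held."""
    kind = claim["type"]
    if kind == "file_line":
        return check_file_line(files, claim["path"], claim["line"],
                               claim["expected_symbol"])
    total = count_matches(files, claim["symbol"], claim["scope"])
    if kind == "callsite_count":
        return total == claim["expected"]
    if kind == "absence_of_symbol":
        return total == 0
    if kind == "presence_of_symbol":
        return total >= 1
    raise ValueError(f"unsupported claim type: {kind}")
```

Note that the ±2 window makes `file_line` tolerant of small drift, but not of a guessed line far from the real one.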

## Modality (required on every claim)

Classify your confidence at emit time — the verifier uses modality to scale
the falsification penalty.

- `asserted` — *"5 sites call X"* — you checked; verifier runs full strictness.
- `hedged` — *"~5 sites call X; not re-checked"* — you saw a number but
  didn't re-verify. Verifier still runs, falsification penalty is halved.
- `vague` — *"several sites call X"* — no numeric commitment. Pair with
  `range_hint: { min, max }` only when you have grounded reason to believe
  a range. If `range_hint` is absent, the verifier records the observed
  count but returns `unverifiable_by_grep` — it will NOT fabricate a
  falsification. Omitting `range_hint` is the honest choice when you do
  not know the count; inventing bounds to satisfy the schema is worse
  than staying prose-vague.

Omitting the `modality` field entirely is a schema-lint warning; the
verifier treats a missing field as `asserted` (strictest path) and logs
the violation. Always include `modality` explicitly.
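
The modality rules can be read as a small scoring function. The sketch below is an assumption-laden illustration: the 3× base figure is the one this skill quotes for a falsified `asserted` claim, the verdict names `verified` and `falsified` are made up here, and the penalty for a `vague` claim whose `range_hint` is violated is left as `None` because the skill text does not state it.

```python
from typing import Optional

BASE_PENALTY = 3.0  # assumed from the skill's "3x" figure for a falsified asserted claim


def score(claim: dict, observed: int) -> tuple[str, Optional[float]]:
    """Return (verdict, penalty) for a count-style claim given the observed count."""
    # A missing modality takes the strictest path, as the skill describes.
    modality = claim.get("modality", "asserted")
    if modality == "vague":
        hint = claim.get("range_hint")
        if hint is None:
            # No numeric commitment: record the count, never falsify.
            return ("unverifiable_by_grep", 0.0)
        if hint["min"] <= observed <= hint["max"]:
            return ("verified", 0.0)
        return ("falsified", None)  # penalty for this case is not stated in the skill
    if observed == claim["expected"]:
        return ("verified", 0.0)
    factor = 0.5 if modality == "hedged" else 1.0  # hedged halves the penalty
    return ("falsified", BASE_PENALTY * factor)
```

Under these assumptions a wrong `hedged` count costs 1.5 while the same wrong count emitted as `asserted`, or with no modality at all, costs 3.0.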

### Uncertain about a line number? Use `presence_of_symbol` — scoped to the same file.

Anchor mismatches (wrong line, or worse, wrong file) are the dominant
observed failure mode. If you are not directly looking at the cited line
as you write the claim, **do not use `file_line`** — a guessed line
becomes a fabricated citation, which is the most expensive kind of
hallucination.

The safe fallback is `presence_of_symbol` scoped to **the specific file
you believe the symbol lives in** — not a directory. A directory scope
passes whenever the symbol appears anywhere in the subtree (including
tests, stale comments, unrelated modules), which turns a line-number
error into a silent wrong-file error. Same-file scope preserves location
information even though you've dropped line precision.

- ✅ *"`maybeAnnotate` is in `sandbox.ts`"* (line unknown) →
  `{ type: "presence_of_symbol", symbol: "maybeAnnotate", scope: "apps/cli/src/sandbox.ts", modality: "asserted" }`
- ✅ *"`maybeAnnotate` is at `sandbox.ts:207`"* (looking at line 207) →
  `{ type: "file_line", path: "apps/cli/src/sandbox.ts", line: 207, expected_symbol: "maybeAnnotate", modality: "asserted" }`
- ❌ *"`maybeAnnotate` lives around line 290 of `sandbox.ts`"* (guessed line)
  → fabricated citation. Drop the line, keep the file: use `presence_of_symbol` with `scope: "apps/cli/src/sandbox.ts"`.
- ❌ *"`maybeAnnotate` is somewhere in `apps/cli/src/`"* (directory scope)
  → masks wrong-file errors. If you don't know the file either, the honest choice is prose with `modality: "vague"` and no claim — not a directory-scoped presence check.
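
The "drop the line, keep the file" fallback amounts to a mechanical rewrite of the claim object. A minimal sketch, assuming a hypothetical helper named `to_presence_claim` that is not part of the skill:

```python
def to_presence_claim(file_line_claim: dict) -> dict:
    """Rewrite a file_line claim as a same-file presence_of_symbol claim."""
    if file_line_claim["type"] != "file_line":
        raise ValueError("expected a file_line claim")
    return {
        "type": "presence_of_symbol",
        "symbol": file_line_claim["expected_symbol"],
        # Scope stays on the exact file, never its parent directory.
        "scope": file_line_claim["path"],
        "modality": file_line_claim.get("modality", "asserted"),
    }
```

The line number is discarded on purpose; the file path survives as the scope, so a wrong-file error still falsifies.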

## When to use `count_relation` vs `negated:true`

- `negated: true` on `callsite_count` — encodes "observed count ≠ expected".
  Use only for simple inequality claims.
- `count_relation` — use for subset forms. *"All but one of the 5 sites"* →
  `{ type: "count_relation", symbol: "...", scope: "...", relation: "=",
  value: 4 }` (4 of 5). *"Only 2 of 5 are safe"* → `relation: "="`,
  `value: 2`. *"None of 5"* → `relation: "="`, `value: 0`.
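
The difference between the two encodings shows up once an observed count is available. The following sketch assumes hypothetical checker functions; only the five relation symbols and the meaning of `negated: true` come from the text above.

```python
import operator

# The five relation symbols listed for `count_relation`.
RELATIONS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "≥": operator.ge,
    "≤": operator.le,
}


def check_count_relation(claim: dict, observed: int) -> bool:
    """True if `observed <relation> value` holds."""
    return RELATIONS[claim["relation"]](observed, claim["value"])


def check_callsite_count(claim: dict, observed: int) -> bool:
    """Exact-count check; `negated: true` flips it to 'observed != expected'."""
    matches = observed == claim["expected"]
    return (not matches) if claim.get("negated") else matches
```

For *"all but one of the 5 sites"*, a negated count of 5 would pass for any observed value other than 5, whereas `relation: "="` with `value: 4` passes only at exactly 4.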

## Compound sentences → multiple claim objects

One prose sentence can pack count + file:line + absence. Decompose into N
separate claim objects in the `claims` array. The verifier reports
per-claim outcomes; dispatch annotation cites only the subset that failed
or could not be verified.

### Example

Prose: *"`persistRelayTasks` is called 3× in `dispatch.ts` and 1× in
`collect.ts`; `doBoot` at `mcp-server-sdk.ts:465` calls `restoreNativeTaskMap`."*

The realistic compound failure is NOT a malformed "X and Y" symbol — it's
**silent partial coverage**: emitting one claim, treating the rest as
covered by prose.

❌ BAD (only the easiest claim emitted; two load-bearing assertions slip
through as unverified prose):
```json
{ "type": "callsite_count", "symbol": "persistRelayTasks",
  "scope": "apps/cli/src/handlers/dispatch.ts", "expected": 3, "modality": "asserted" }
```

✅ GOOD (one claim per prose assertion; none of the three load-bearing
pieces can slip through without verification):
```json
{ "type": "callsite_count", "symbol": "persistRelayTasks",
  "scope": "apps/cli/src/handlers/dispatch.ts", "expected": 3, "modality": "asserted" },
{ "type": "callsite_count", "symbol": "persistRelayTasks",
  "scope": "apps/cli/src/handlers/collect.ts", "expected": 1, "modality": "asserted" },
{ "type": "file_line", "path": "apps/cli/src/mcp-server-sdk.ts", "line": 465,
  "expected_symbol": "restoreNativeTaskMap", "modality": "asserted" }
```

If a prose sentence has N verifiable assertions and your `claims[]`
array has fewer than N entries, the missing assertions become unverified
prose — no penalty when wrong, no signal when right.

## Anti-patterns

- **Do NOT fabricate counts** to satisfy the schema. If you did not grep,
  use `modality: "vague"` without `range_hint`. A `vague`-unverifiable
  verdict is 0× penalty; a fabricated `asserted` that falsifies is 3×.
- **Do NOT emit claims for untestable scopes.** The verifier runs local
  `rg` inside the project root; claims about remote repos, runtime
  behavior, or semantic intent cannot be verified and will return
  `unverifiable_by_grep`. Keep those as prose.
- **Do NOT omit modality.** Missing field triggers a schema-lint warning
  and is scored as `asserted` (strictest) — you lose the hedge discount.
- **Do NOT combine `absence_of_symbol` with `negated: true`.** That is
  just `presence_of_symbol`; schema-lint rejects on emit.
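
Two of the anti-patterns above are mechanical enough to check before emitting. The sketch below is a hypothetical pre-emit lint, not the project's schema-lint; it encodes only the two rules whose outcome the skill states (warning for a missing `modality`, rejection for a negated absence claim).

```python
def lint(claim: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one claim object."""
    errors: list[str] = []
    warnings: list[str] = []
    if "modality" not in claim:
        # Allowed, but scored on the strictest path.
        warnings.append("missing modality: will be scored as asserted")
    if claim.get("type") == "absence_of_symbol" and claim.get("negated") is True:
        # A negated absence is just a presence claim.
        errors.append("absence_of_symbol with negated: true; use presence_of_symbol")
    return errors, warnings
```

A claim with any entry in `errors` should be rewritten before it is emitted; warnings are survivable but forfeit the hedge discount.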

## Why this skill exists

Stage 1 regex catches literal-numeral + TARGETS-noun patterns over prose
("5 sites call X"). It silently passes four bypass classes: **vague**
(*"several sites"* — no numeric anchor), **hedged** (*"probably 5"* — the
regex matches the numeral and never sees the uncertainty marker),
**inverted** (*"all but 1 of 5"* — the count matches, the semantically
load-bearing negation doesn't), and **compound** (one sentence, three
claims — regex fires once). Structured claims give each of these an
explicit schema representation. See
`docs/specs/2026-04-22-premise-verification-stage-2.md` for the full
rationale and the adoption plan toward Stage 1 sunset.
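
The four bypass classes are easy to reproduce with a toy pattern. The regex below is an assumed approximation of "literal numeral + target noun" written for this illustration; the actual Stage 1 pattern and its noun list are not shown in this skill.

```python
import re

# Assumed shape of a Stage 1 style pattern: a numeral followed by a target noun.
STAGE1 = re.compile(r"\b(\d+)\s+(sites?|callers?|handlers?|files?)\b", re.IGNORECASE)


def stage1_hits(prose: str) -> list[tuple[int, str]]:
    """Return (count, noun) pairs the toy Stage 1 pattern extracts from prose."""
    return [(int(number), noun) for number, noun in STAGE1.findall(prose)]


# Baseline: the literal form is caught.
baseline = stage1_hits("5 sites call X")
# Vague: no numeral, so nothing is extracted.
vague = stage1_hits("several sites call X")
# Hedged: the numeral is extracted, the word "probably" is lost.
hedged = stage1_hits("probably 5 sites call X")
# Inverted: the total is extracted, "all but 1" is lost.
inverted = stage1_hits("all but 1 of 5 sites are safe")
# Compound: three assertions in the sentence, one extraction.
compound = stage1_hits("5 sites call X at a.ts:42 and none pass the flag")
```

In the hedged, inverted, and compound cases the pattern returns the same single pair as the baseline, which is exactly why the distinguishing information has to travel in a structured claim.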
@@ -0,0 +1,41 @@
---
name: verify-the-premise
description: Grep-verify quantitative claims in the dispatch task BEFORE writing code.
keywords: []
category: verification
mode: permanent
status: active
---

## When this skill activates
Always — this is a permanent-mode skill for implementer agents.

## Iron law
Before writing the first line of code, grep every quantitative or structural
claim in the dispatch task. If grep disagrees, emit `hallucination_caught`
with your measured count and stop — do not proceed on a false premise.

## Checklist
1. Extract claims of shape "N sites/callers/handlers", "lacks X",
   "missing from Y", "at file:line Z" from the task description.
2. For each claim, run the minimal grep that would disprove it:
   - "5 sites call foo()" → `grep -c "foo("` in the cited file.
   - "lacks the bar helper" → grep `"function bar"` or `"const bar = "` in scope.
   - "at file:N" → read file at offset N and quote what's there.
3. Record the grep output inline in your first `<agent_finding>`.
4. If mismatch: emit a `<agent_finding type="finding" severity="high">` tagged
   `premise_mismatch` with the measured count vs claimed count. Stop there.
5. If match: proceed to implementation with confidence.
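
Steps 2 to 4 of the checklist can be mirrored in a few lines. This is a sketch with hypothetical helper names; it models `grep -c`, which counts matching lines rather than individual occurrences, so two calls on one line count once.

```python
from typing import Optional


def count_matching_lines(text: str, needle: str) -> int:
    """Count lines containing `needle`, mirroring what `grep -c` reports."""
    return sum(1 for line in text.splitlines() if needle in line)


def quote_line(text: str, number: int) -> Optional[str]:
    """Return line `number` (1-indexed), or None when the file is shorter."""
    lines = text.splitlines()
    return lines[number - 1] if 1 <= number <= len(lines) else None


def check_count_claim(text: str, needle: str, claimed: int) -> dict:
    """Compare a claimed count with the measured one and keep both for the report."""
    measured = count_matching_lines(text, needle)
    return {"claimed": claimed, "measured": measured, "match": measured == claimed}
```

When `match` is false, the `claimed` and `measured` values are what the mismatch finding in step 4 needs to cite. If the task's count refers to call sites rather than lines, a per-occurrence count may be the fairer comparison.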

## Grep budget
- Max 5 greps per task (premise verification should be cheap).
- Each grep capped at 2 seconds (the skill guide includes a timeout hint).
- If the claim is unverifiable cheaply (e.g. "all handlers in the monorepo"),
  downgrade confidence and proceed with explicit uncertainty note.
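
One way to hold yourself to the budget is a small wrapper that counts invocations and applies the timeout. Everything named below (`GrepBudget`, `BudgetExceeded`, `run_grep`) is a hypothetical illustration; only the limits of 5 greps and 2 seconds come from the list above.

```python
import subprocess
from typing import Callable, Optional

MAX_GREPS = 5        # per-task limit from the skill
TIMEOUT_SECONDS = 2  # per-grep cap from the skill


class BudgetExceeded(RuntimeError):
    """Raised when a task asks for more greps than the budget allows."""


def run_grep(pattern: str, path: str) -> Optional[int]:
    """Count matching lines with `grep -c -F`; None on timeout or grep error."""
    try:
        result = subprocess.run(
            ["grep", "-c", "-F", "--", pattern, path],
            capture_output=True, text=True, timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode > 1:  # grep exits 0 on match, 1 on no match, >1 on error
        return None
    return int(result.stdout.strip() or 0)


class GrepBudget:
    """Wraps a grep runner and refuses calls once the budget is spent."""

    def __init__(self, runner: Callable[[str, str], Optional[int]] = run_grep,
                 max_greps: int = MAX_GREPS) -> None:
        self.runner = runner
        self.remaining = max_greps

    def run(self, pattern: str, path: str) -> Optional[int]:
        if self.remaining <= 0:
            raise BudgetExceeded("grep budget exhausted; note the uncertainty and proceed")
        self.remaining -= 1
        return self.runner(pattern, path)
```

A `None` result or a `BudgetExceeded` error corresponds to the third bullet: the claim was not cheaply verifiable, so confidence is downgraded and the uncertainty is stated.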

## Anti-patterns
- Don't run the full test suite as "verification" — tests measure symptoms,
  not premises. A passing test can coexist with a false premise (see the
  2026-04-22 incident: 2449 tests passed, premise was wrong).
- Don't skip premise verification under time pressure — Cost(verify) ≤ 10s,
  Cost(shipping a wrong fix) = another consensus round + rework.