gossipcat 0.4.18 → 0.4.20

@@ -0,0 +1,195 @@
1
+ ---
2
+ name: emit-structured-claims
3
+ description: Emit a structured premise-claims JSON block alongside prose findings so the orchestrator can grep-verify code-shape claims and close the four Stage-1 bypass classes.
4
+ keywords: []
5
+ category: trust_boundaries
6
+ mode: permanent
7
+ status: active
8
+ ---
9
+
10
+ ## When this skill activates
11
+ Always — this is a permanent-mode skill for investigation agents
12
+ (`-researcher`, `-reviewer`). Your finding must include a `premise-claims`
13
+ JSON block whenever it contains ANY of:
14
+
15
+ 1. **A file path + line number** — *"at `apps/cli/src/handlers/dispatch.ts:42`"*.
16
+ 2. **A count of callers / sites / handlers / files** — *"5 sites call X"*,
17
+ *"3 handlers emit Y"*, *"2 files reference Z"*.
18
+ 3. **A claim of absence** — *"X is not called"*, *"no handler emits Y"*,
19
+ *"Z is missing from scope"*.
20
+ 4. **Any bypass-class phrase** — *"several"*, *"a few"*, *"most"*,
21
+ *"probably"*, *"I think"*, *"it looks like"*, *"all but"*, *"only N of M"*,
22
+ *"none of"*. These are the vague / hedged / inverted classes Stage 1
23
+ regex cannot see; structured claims are the ONLY way they get verified.
24
+
25
+ ## Iron law
26
+ Prose alone is not verifiable. If your finding rests on code shape — counts,
27
+ line numbers, presence, absence — emit it as a structured claim too. The
28
+ orchestrator will `rg`-check every claim before dispatch. Claims that
29
+ falsify mean your premise is wrong; ship the claim block and let the
30
+ pipeline catch the mistake before an implementer inherits it.
31
+
32
+ ## How to emit
33
+ Append a fenced code block with the `premise-claims` info string AFTER your
34
+ prose finding and BEFORE any suggested code. The block is JSON with a
35
+ `schema_version`, `verifier`, and `claims` array. Spec:
36
+ `docs/specs/2026-04-22-premise-verification-stage-2.md`.
37
+
38
+ ### Worked example
39
+
40
+ Given finding prose like:
41
+
42
+ > *"Five dispatch sites call `assembleUtilityPrompt` in
43
+ > `apps/cli/src/mcp-server-sdk.ts`; none of them pass the sentinel flag.
44
+ > `maybeAnnotateUnverifiedClaims` lives at `apps/cli/src/sandbox.ts:207`."*
45
+
46
+ Emit alongside:
47
+
48
+ ```premise-claims
49
+ {
50
+ "schema_version": "1",
51
+ "verifier": "orchestrator",
52
+ "claims": [
53
+ { "type": "callsite_count", "symbol": "assembleUtilityPrompt",
54
+ "scope": "apps/cli/src/mcp-server-sdk.ts", "expected": 5,
55
+ "modality": "asserted" },
56
+ { "type": "absence_of_symbol", "symbol": "SCOPE_NOTE",
57
+ "scope": "apps/cli/src/handlers/dispatch.ts",
58
+ "context": "preamble emission path", "modality": "asserted" },
59
+ { "type": "file_line", "path": "apps/cli/src/sandbox.ts",
60
+ "line": 207, "expected_symbol": "maybeAnnotateUnverifiedClaims",
61
+ "modality": "asserted" }
62
+ ]
63
+ }
64
+ ```
65
+
66
+ ## Claim types (v1 — see spec for full table)
67
+
68
+ - `callsite_count` — `symbol`, `scope`, `expected` (int). Verifier sums
69
+ `rg --count-matches` across scope; compares against `expected`.
70
+ - `file_line` — `path`, `line` (int), `expected_symbol` (string). Verifier
71
+ reads the file ±2 lines and string-matches `expected_symbol`.
72
+ - `absence_of_symbol` — `symbol`, `scope`, `context` (≤120 chars).
73
+ Verifier requires `rg --count-matches` summed total of 0.
74
+ - `presence_of_symbol` — `symbol`, `scope`. Verifier requires total ≥ 1.
75
+ - `count_relation` — `symbol`, `scope`, `relation` (`>`, `<`, `=`, `≥`, `≤`),
76
+ `value` (int). Use this for SUBSET forms like *"all but one of 5"* or
77
+ *"only 1 of 5 is safe"* — NOT `negated:true`, which only encodes simple
78
+ inequality against a total (bypass class I).
79
+
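The counting rule shared by `callsite_count`, `absence_of_symbol`, and `presence_of_symbol` can be sketched as follows. This is a minimal illustration, assuming the verifier sums literal matches per file the way `rg --count-matches` reports them. The helpers `countMatches`, `totalInScope`, and `checkCount`, and the in-memory `files` map, are hypothetical stand-ins for the real `rg` invocation, which the spec defines.

```typescript
// Hypothetical sketch: files are passed in as path -> text instead of running rg.
type CountClaim =
  | { type: "callsite_count"; symbol: string; scope: string; expected: number }
  | { type: "absence_of_symbol"; symbol: string; scope: string }
  | { type: "presence_of_symbol"; symbol: string; scope: string };

// Count non-overlapping literal occurrences of `symbol` in `text`.
function countMatches(text: string, symbol: string): number {
  if (symbol.length === 0) return 0;
  return text.split(symbol).length - 1;
}

// Sum matches over every file that is the scope itself or lies under it.
function totalInScope(
  files: Record<string, string>,
  symbol: string,
  scope: string
): number {
  const prefix = scope.endsWith("/") ? scope : scope + "/";
  let total = 0;
  for (const [path, text] of Object.entries(files)) {
    if (path === scope || path.startsWith(prefix)) {
      total += countMatches(text, symbol);
    }
  }
  return total;
}

// Apply the per-type rule: exact count, zero, or at least one.
function checkCount(files: Record<string, string>, claim: CountClaim): boolean {
  const total = totalInScope(files, claim.symbol, claim.scope);
  if (claim.type === "callsite_count") return total === claim.expected;
  if (claim.type === "absence_of_symbol") return total === 0;
  return total >= 1; // presence_of_symbol
}
```

Note that a literal count also picks up definitions, imports, and comments that mention the symbol, so the `expected` value should be the number a grep would report. When running `rg` by hand, a symbol containing `(` needs `-F` (fixed strings) to avoid being parsed as a regex group.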
80
+ ## Modality (required on every claim)
81
+
82
+ Classify your confidence at emit time — the verifier uses modality to scale
83
+ the falsification penalty.
84
+
85
+ - `asserted` — *"5 sites call X"* — you checked; verifier runs full strictness.
86
+ - `hedged` — *"~5 sites call X; not re-checked"* — you saw a number but
87
+ didn't re-verify. Verifier still runs, falsification penalty is halved.
88
+ - `vague` — *"several sites call X"* — no numeric commitment. Pair with
89
+ `range_hint: { min, max }` only when you have grounded reason to believe
90
+ a range. If `range_hint` is absent, the verifier records the observed
91
+ count but returns `unverifiable_by_grep` — it will NOT fabricate a
92
+ falsification. Omitting `range_hint` is the honest choice when you do
93
+ not know the count; inventing bounds to satisfy the schema is worse
94
+ than staying prose-vague.
95
+
96
+ Omitting the `modality` field entirely is a schema-lint warning; the
97
+ verifier treats a missing field as `asserted` (strictest path) and logs
98
+ the violation. Always include `modality` explicitly.
99
+
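The way modality scales the penalty can be sketched as a small lookup. This is an illustration under stated assumptions, not the verifier's actual scoring: it takes the 3× figure for a falsified `asserted` claim and the 0× figure for an unverifiable `vague` claim from the Anti-patterns section, reads "halved" for `hedged` as 1.5×, and assumes that only falsified verdicts carry a penalty. The function name `penaltyMultiplier` and the `Verdict` labels other than `unverifiable_by_grep` are hypothetical.

```typescript
type Modality = "asserted" | "hedged" | "vague";
type Verdict = "verified" | "falsified" | "unverifiable_by_grep";

// Assumptions: 3x for asserted, half of that for hedged, 0x for any verdict
// that is not a falsification. The multiplier for a falsified vague claim
// that carried a range_hint is not stated in this skill, so it is undefined.
function penaltyMultiplier(
  modality: Modality | undefined,
  verdict: Verdict
): number | undefined {
  if (verdict !== "falsified") return 0;
  // A missing modality field is scored as asserted (strictest path).
  const effective: Modality = modality ?? "asserted";
  if (effective === "asserted") return 3;
  if (effective === "hedged") return 1.5;
  return undefined; // vague with range_hint: defer to the spec
}
```

Under these assumptions, omitting `modality` on a claim you were unsure about costs the same as asserting it outright, which is why the field should always be explicit.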
100
+ ### Uncertain about a line number? Use `presence_of_symbol` — scoped to the same file.
101
+
102
+ Anchor mismatches (wrong line, or worse, wrong file) are the dominant
103
+ observed failure mode. If you are not directly looking at the cited line
104
+ as you write the claim, **do not use `file_line`** — a guessed line
105
+ becomes a fabricated citation, which is the most expensive kind of
106
+ hallucination.
107
+
108
+ The safe fallback is `presence_of_symbol` scoped to **the specific file
109
+ you believe the symbol lives in** — not a directory. A directory scope
110
+ passes whenever the symbol appears anywhere in the subtree (including
111
+ tests, stale comments, unrelated modules), which turns a line-number
112
+ error into a silent wrong-file error. Same-file scope preserves location
113
+ information even though you've dropped line precision.
114
+
115
+ - ✅ *"`maybeAnnotate` is in `sandbox.ts`"* (line unknown) →
116
+ `{ "type": "presence_of_symbol", "symbol": "maybeAnnotate", "scope": "apps/cli/src/sandbox.ts", "modality": "asserted" }`
117
+ - ✅ *"`maybeAnnotate` is at `sandbox.ts:207`"* (looking at line 207) →
118
+ `{ "type": "file_line", "path": "apps/cli/src/sandbox.ts", "line": 207, "expected_symbol": "maybeAnnotate", "modality": "asserted" }`
119
+ - ❌ *"`maybeAnnotate` lives around line 290 of `sandbox.ts`"* (guessed line)
120
+ → fabricated citation. Drop the line, keep the file: use `presence_of_symbol` with `scope: "apps/cli/src/sandbox.ts"`.
121
+ - ❌ *"`maybeAnnotate` is somewhere in `apps/cli/src/`"* (directory scope)
122
+ → masks wrong-file errors. If you don't know the file either, the honest choice is prose with `modality: "vague"` and no claim — not a directory-scoped presence check.
123
+
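The `file_line` check described under Claim types (read the file ±2 lines, string-match `expected_symbol`) can be sketched as below. This is a minimal illustration operating on file text already in memory; the function name `checkFileLine` and its `window` parameter are hypothetical, and the real verifier may differ in how it reads the file.

```typescript
// Hypothetical helper: true when `expectedSymbol` appears as a substring on the
// cited 1-indexed line or within `window` lines on either side of it.
function checkFileLine(
  text: string,
  line: number,
  expectedSymbol: string,
  window = 2
): boolean {
  const lines = text.split("\n");
  const start = Math.max(0, line - 1 - window); // inclusive, 0-indexed
  const end = Math.min(lines.length, line + window); // exclusive, 0-indexed
  return lines.slice(start, end).some((l) => l.includes(expectedSymbol));
}
```

The window explains why a citation that is off by one or two lines still verifies, while a guess such as "around line 290" for a symbol at line 207 falsifies.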
124
+ ## When to use `count_relation` vs `negated:true`
125
+
126
+ - `negated: true` on `callsite_count` — encodes "observed count ≠ expected".
127
+ Use only for simple inequality claims.
128
+ - `count_relation` — use for subset forms. *"All but one of the 5 sites"* →
129
+ `{ "type": "count_relation", "symbol": "...", "scope": "...", "relation": "=",
130
+ "value": 4, "modality": "asserted" }` (4 of 5). *"Only 2 of 5 are safe"* → `relation: "="`,
131
+ `value: 2`. *"None of 5"* → `relation: "="`, `value: 0`.
132
+
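Evaluating a `count_relation` claim reduces to comparing the observed count against `value` with the stated operator. A minimal sketch, assuming the five relation strings listed under Claim types; the names `comparators` and `relationHolds` are hypothetical.

```typescript
type Relation = ">" | "<" | "=" | "≥" | "≤";

// One comparator per relation string accepted by the schema.
const comparators: Record<Relation, (observed: number, value: number) => boolean> = {
  ">": (a, b) => a > b,
  "<": (a, b) => a < b,
  "=": (a, b) => a === b,
  "≥": (a, b) => a >= b,
  "≤": (a, b) => a <= b,
};

// True when the observed count satisfies the claimed relation.
function relationHolds(observed: number, relation: Relation, value: number): boolean {
  return comparators[relation](observed, value);
}
```

For *"all but one of the 5 sites"*, the claim is `relation: "="`, `value: 4`, so `relationHolds(4, "=", 4)` holds and an observed count of 5 falsifies it.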
133
+ ## Compound sentences → multiple claim objects
134
+
135
+ One prose sentence can pack count + file:line + absence. Decompose into N
136
+ separate claim objects in the `claims` array. The verifier reports
137
+ per-claim outcomes; dispatch annotation cites only the subset that failed
138
+ or could not be verified.
139
+
140
+ ### Example
141
+
142
+ Prose: *"`persistRelayTasks` is called 3× in `dispatch.ts` and 1× in
143
+ `collect.ts`; `doBoot` at `mcp-server-sdk.ts:465` calls `restoreNativeTaskMap`."*
144
+
145
+ The realistic compound failure is NOT a malformed "X and Y" symbol — it's
146
+ **silent partial coverage**: emitting one claim, treating the rest as
147
+ covered by prose.
148
+
149
+ ❌ BAD (only the easiest claim emitted; two load-bearing assertions slip
150
+ through as unverified prose):
151
+ ```json
152
+ { "type": "callsite_count", "symbol": "persistRelayTasks",
153
+ "scope": "apps/cli/src/handlers/dispatch.ts", "expected": 3, "modality": "asserted" }
154
+ ```
155
+
156
+ ✅ GOOD (one claim per prose assertion; none of the three load-bearing
157
+ pieces can slip through without verification):
158
+ ```json
159
+ { "type": "callsite_count", "symbol": "persistRelayTasks",
160
+ "scope": "apps/cli/src/handlers/dispatch.ts", "expected": 3, "modality": "asserted" },
161
+ { "type": "callsite_count", "symbol": "persistRelayTasks",
162
+ "scope": "apps/cli/src/handlers/collect.ts", "expected": 1, "modality": "asserted" },
163
+ { "type": "file_line", "path": "apps/cli/src/mcp-server-sdk.ts", "line": 465,
164
+ "expected_symbol": "restoreNativeTaskMap", "modality": "asserted" }
165
+ ```
166
+
167
+ If a prose sentence has N verifiable assertions and your `claims[]`
168
+ array has fewer than N entries, the missing assertions become unverified
169
+ prose — no penalty when wrong, no signal when right.
170
+
171
+ ## Anti-patterns
172
+
173
+ - **Do NOT fabricate counts** to satisfy the schema. If you did not grep,
174
+ use `modality: "vague"` without `range_hint`. A `vague`-unverifiable
175
+ verdict is 0× penalty; a fabricated `asserted` that falsifies is 3×.
176
+ - **Do NOT emit claims for untestable scopes.** The verifier runs local
177
+ `rg` inside the project root; claims about remote repos, runtime
178
+ behavior, or semantic intent cannot be verified and will return
179
+ `unverifiable_by_grep`. Keep those as prose.
180
+ - **Do NOT omit modality.** Missing field triggers a schema-lint warning
181
+ and is scored as `asserted` (strictest) — you lose the hedge discount.
182
+ - **Do NOT combine `absence_of_symbol` with `negated: true`.** That is
183
+ just `presence_of_symbol`; schema-lint rejects on emit.
184
+
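Two of the anti-patterns above are mechanical enough to lint before emitting. The sketch below covers only those two rules (missing `modality` as a warning, `absence_of_symbol` with `negated: true` as a rejection) and is an assumption about how such a check could look, not the project's schema-lint. The names `RawClaim`, `LintResult`, and `lintClaim` are hypothetical.

```typescript
interface RawClaim {
  type: string;
  modality?: string;
  negated?: boolean;
}

interface LintResult {
  errors: string[];   // claim is rejected on emit
  warnings: string[]; // claim is accepted but flagged
}

// Hypothetical pre-emit lint for the two rules stated in this skill.
function lintClaim(claim: RawClaim): LintResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (claim.modality === undefined) {
    warnings.push("missing modality: scored as asserted");
  }
  if (claim.type === "absence_of_symbol" && claim.negated === true) {
    errors.push("absence_of_symbol with negated:true: use presence_of_symbol");
  }
  return { errors, warnings };
}
```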
185
+ ## Why this skill exists
186
+
187
+ Stage 1 regex catches literal-numeral + TARGETS-noun patterns over prose
188
+ ("5 sites call X"). It silently passes four bypass classes: **vague**
189
+ (*"several sites"* — no numeric anchor), **hedged** (*"probably 5"* — no
190
+ uncertainty marker to the regex), **inverted** (*"all but 1 of 5"* — the
191
+ count matches, the semantically load-bearing negation doesn't), and
192
+ **compound** (one sentence, three claims — regex fires once). Structured
193
+ claims give each of these an explicit schema representation. See
194
+ `docs/specs/2026-04-22-premise-verification-stage-2.md` for the full
195
+ rationale and the adoption plan toward Stage 1 sunset.
@@ -0,0 +1,41 @@
1
+ ---
2
+ name: verify-the-premise
3
+ description: Grep-verify quantitative claims in the dispatch task BEFORE writing code.
4
+ keywords: []
5
+ category: verification
6
+ mode: permanent
7
+ status: active
8
+ ---
9
+
10
+ ## When this skill activates
11
+ Always — this is a permanent-mode skill for implementer agents.
12
+
13
+ ## Iron law
14
+ Before writing the first line of code, grep every quantitative or structural
15
+ claim in the dispatch task. If grep disagrees, emit `hallucination_caught`
16
+ with your measured count and stop — do not proceed on a false premise.
17
+
18
+ ## Checklist
19
+ 1. Extract claims of shape "N sites/callers/handlers", "lacks X",
20
+ "missing from Y", "at file:line Z" from the task description.
21
+ 2. For each claim, run the minimal grep that would disprove it:
22
+ - "5 sites call foo()" → grep -c "foo(" in the cited file.
23
+ - "lacks the bar helper" → grep "function bar" or "const bar = " in scope.
24
+ - "at file:N" → read file at offset N and quote what's there.
25
+ 3. Record the grep output inline in your first <agent_finding>.
26
+ 4. If mismatch: emit a <agent_finding type="finding" severity="high"> tagged
27
+ `premise_mismatch` with the measured count vs claimed count. Stop there.
28
+ 5. If match: proceed to implementation with confidence.
29
+
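Steps 2 to 4 of the checklist amount to measuring a count and comparing it with the claimed one. A minimal sketch on in-memory file text, assuming a literal substring search; `grepCountLines`, `checkCountPremise`, and `PremiseCheck` are hypothetical names, and the real check would shell out to `grep` or `rg`.

```typescript
// Count lines containing `needle`. This mirrors `grep -c -F`, which reports
// matching lines rather than total matches.
function grepCountLines(text: string, needle: string): number {
  return text.split("\n").filter((line) => line.includes(needle)).length;
}

interface PremiseCheck {
  claimed: number;
  measured: number;
  mismatch: boolean;
}

// Compare the claimed count from the dispatch task with the measured count.
function checkCountPremise(text: string, needle: string, claimed: number): PremiseCheck {
  const measured = grepCountLines(text, needle);
  return { claimed, measured, mismatch: measured !== claimed };
}
```

Because `grep -c` counts lines, two calls on the same line register once. If the task claims a number of call sites rather than lines, a per-match count such as `rg --count-matches` or `grep -o ... | wc -l` is the closer measurement, and the finding should say which one was used.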
30
+ ## Grep budget
31
+ - Max 5 greps per task (premise verification should be cheap).
32
+ - Each grep capped at 2 seconds (the skill guide includes a timeout hint).
33
+ - If the claim is unverifiable cheaply (e.g. "all handlers in the monorepo"),
34
+ downgrade confidence and proceed with explicit uncertainty note.
35
+
36
+ ## Anti-patterns
37
+ - Don't run the full test suite as "verification" — tests measure symptoms,
38
+ not premises. A passing test can coexist with a false premise (see the
39
+ 2026-04-22 incident: 2449 tests passed, premise was wrong).
40
+ - Don't skip premise verification under time pressure — Cost(verify) ≤ 10s,
41
+ Cost(shipping a wrong fix) = another consensus round + rework.