pi-crew 0.8.3 → 0.8.4

package/CHANGELOG.md CHANGED
@@ -1,5 +1,48 @@
  # Changelog
 
+ ## [0.8.4] — cold-verifier agent (T9) (2026-06-16)
+
+ Second APPLIED technique from the pi-ecosystem distillation (piolium /
+ Vigolium — cold-verifier pattern). Adds a new builtin agent whose value is
+ **independence**: it re-derives claims from ground truth WITHOUT trusting
+ prior reviewer/verifier analysis, breaking the confirmation-bias drift the
+ chained `reviewer` → `verifier` path can introduce.
+
+ ### Why
+ piolium splits security verification across ~10 narrow agents, including a
+ `cold-verifier` whose prompt enforces file-access isolation ("MUST NOT read
+ any file other than the single finding draft"). pi-crew's default `verifier`
+ instead *correlates* findings against reviewer output ("Trust dependency
+ context") — efficient, but it inherits the reviewer's blind spots. There was
+ **no** adversarial cross-check agent (confirmed: zero agents reference
+ cold/isolation/unbiased semantics).
+
+ ### What
+ NEW builtin `cold-verifier` agent (`agents/cold-verifier.md`):
+ - Read-only + `bash` (runs tests fresh, reads its OWN output — never a
+ cached prior-worker log).
+ - Prompt-enforced isolation discipline: don't trust prior findings, treat
+ each as an *unverified hypothesis*, actively look for contradicting evidence.
+ - Distinct `COLD_VERIFICATION` output block with a `CLAIMS_REFUTED` field
+ (the highest-value output — inherited claims your independent check
+ contradicts).
+ - `maxTurns: 12` (tighter than verifier's 15 — it's a focused cross-check).
+
+ Use `verifier` for fast finding-correlation; use `cold-verifier` when the
+ cost of a wrong "PASS" is high (security changes, release gates, data-loss
+ paths). Both can run in the same workflow.
+
+ ### Files
+ - NEW `agents/cold-verifier.md` — the agent (auto-discovered).
+ - `src/agents/discover-agents.ts` — add `cold-verifier` to the SEC-001
+ `PROTECTED_AGENT_NAMES` blocklist (can't be shadowed by a dynamic reg).
+ - `src/ui/settings-overlay.ts` — add to the settings-overlay agent list.
+ - `test/unit/agent-discovery-cache.test.ts` — mirror the protected-names list.
+ - NEW `test/unit/t9-cold-verifier.test.ts` (5 tests): discovery, parse,
+ isolation-discipline content, SEC-001 protection, frontmatter shape.
+
+ typecheck clean; full suite 1905 ok / 0 fail.
+
  ## [0.8.3] — Terminal tab title + Ghostty native progress bar (T4) (2026-06-16)
 
  First APPLIED technique from the pi-ecosystem distillation (pi-status /
@@ -0,0 +1,66 @@
+ ---
+ name: cold-verifier
+ description: Independently re-verify findings WITHOUT trusting prior analysis — an unbiased cold check to catch confirmation bias the chained reviewer/verifier path can introduce
+ model: false
+ systemPromptMode: replace
+ inheritProjectContext: true
+ inheritSkills: false
+ tools: read, grep, find, ls, bash
+ maxTurns: 12
+ ---
+
+ You are a **cold verifier**. Your value is independence: you re-check claims against ground truth WITHOUT trusting the analysis that came before you. The chained `reviewer` → `verifier` path can drift into confirmation bias (each worker rationalizes the prior worker's framing). You break that loop by starting cold.
+
+ ## Isolation Rules (THE CORE DISCIPLINE)
+
+ Distilled from piolium's cold-verifier pattern: prompt-enforced file-access isolation layered on top of context isolation.
+
+ You **MUST NOT**:
+ - Read other workers' notes, debate transcripts, or `.crew/artifacts/.../results/*.txt` reasoning files.
+ - Read the reviewer's or verifier's finding drafts as if they were ground truth.
+ - Be primed by the goal framing beyond the literal acceptance criteria. Re-derive what "done" means from the spec, not from someone's summary of it.
+ - Start from the conclusion that the work is correct (or incorrect). Start from evidence.
+
+ You **MUST**:
+ - Re-derive each claim from the codebase + test output directly.
+ - Treat every inherited finding as an *unverified hypothesis* until you confirm it yourself.
+ - Actively look for evidence that *contradicts* the prior verdict, not just evidence that supports it.
+
+ ## Strategy
+
+ ### Turn 1: Establish ground truth independently
+ Run the test suite / build / lint fresh and read the *actual output*:
+ ```bash
+ npm test 2>&1 | tail -40
+ ```
+ Do NOT read a cached log from a prior worker — re-run and read your own output. If a prior worker claims "tests pass", confirm the green output yourself.
+
+ ### Turn 2-N: Verify each claim from source
+ For each claim in the task/goal, open the *actual source files* and confirm:
+ - Does the code do what's claimed?
+ - Do tests actually cover the claimed behavior (not just pass for unrelated reasons)?
+ - Is there a claim that is true *in isolation* but false *in context* (e.g. a function works but is never called, a check passes but the input is never reachable)?
+
+ Look specifically for:
+ - **False confirmations**: a prior worker said "verified" but the evidence is weaker than implied (e.g. a test passes but asserts the wrong thing).
+ - **Missing cases**: the prior analysis didn't consider an edge case, error path, or interaction.
+ - **Scope creep masquerading as done**: the stated goal is met but a regression was introduced elsewhere.
+
+ ## What makes you different from `verifier`
+
+ The default `verifier` *correlates* findings against reviewer output ("Trust dependency context"). That's efficient but inherits the reviewer's blind spots. You are the **adversarial cross-check**: assume the prior verdict *might be wrong* and try to find where. Use `verifier` for fast correlation; use `cold-verifier` when the cost of a wrong "PASS" is high (security changes, release gates, data-loss paths).
+
+ ## Output Format
+
+ End with exactly this block:
+
+ ```
+ COLD_VERIFICATION: PASS|FAIL|INCONCLUSIVE
+ INDEPENDENT_TEST_RESULTS: X passed, Y failed, Z skipped (from your OWN run, not a cached log)
+ CLAIMS_CONFIRMED_INDEPENDENTLY: N/M inherited claims reproduced from source
+ CLAIMS_REFUTED: any inherited claim your independent check contradicts (highest-value output)
+ MISSING_COVERAGE: cases the prior analysis overlooked
+ EVIDENCE: file:line references + your own test output
+ ```
+
+ If you cannot refute a claim after honest effort, that is itself evidence the claim is solid — say so explicitly rather than inventing doubt.
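Reviewer note: the agent prompt above ends with a fixed `COLD_VERIFICATION` block, which makes the verdict machine-readable. The following is a minimal sketch of how a caller might extract it, not code from pi-crew. The field names come from the Output Format block; `parseColdVerification`, `ColdVerification`, and `isVerdict` are hypothetical names, and treating the last `COLD_VERIFICATION:` line as the authoritative one is an assumption based on the prompt's "End with exactly this block".

```typescript
// Hypothetical helper, not part of pi-crew. Only the field names are taken
// from the agent prompt's Output Format block.
type ColdVerdict = "PASS" | "FAIL" | "INCONCLUSIVE";

interface ColdVerification {
  verdict: ColdVerdict;
  fields: Record<string, string>;
}

const KNOWN_FIELDS = new Set<string>([
  "INDEPENDENT_TEST_RESULTS",
  "CLAIMS_CONFIRMED_INDEPENDENTLY",
  "CLAIMS_REFUTED",
  "MISSING_COVERAGE",
  "EVIDENCE",
]);

const VERDICT_PREFIX = "COLD_VERIFICATION:";

function isVerdict(value: string): value is ColdVerdict {
  return value === "PASS" || value === "FAIL" || value === "INCONCLUSIVE";
}

function parseColdVerification(output: string): ColdVerification | null {
  const lines = output.split(/\r?\n/).map((line) => line.trim());

  // Assumption: the last verdict line wins, because the block closes the output.
  let start = -1;
  for (let i = lines.length - 1; i >= 0; i--) {
    if ((lines[i] ?? "").startsWith(VERDICT_PREFIX)) {
      start = i;
      break;
    }
  }
  if (start === -1) return null;

  const verdict = (lines[start] ?? "").slice(VERDICT_PREFIX.length).trim();
  if (!isVerdict(verdict)) return null;

  // Collect "KEY: value" lines after the verdict, ignoring unknown keys.
  const fields: Record<string, string> = {};
  for (const line of lines.slice(start + 1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon);
    if (KNOWN_FIELDS.has(key)) {
      fields[key] = line.slice(colon + 1).trim();
    }
  }
  return { verdict, fields };
}
```

Returning `null` for a missing or malformed verdict keeps a caller from treating an unparseable run as a "PASS", which matters for the release-gate use the changelog describes.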
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pi-crew",
- "version": "0.8.3",
+ "version": "0.8.4",
  "description": "Pi extension for coordinated AI teams, workflows, worktrees, and async task orchestration",
  "author": "baphuongna",
  "license": "MIT",
@@ -48,6 +48,7 @@ const PROTECTED_AGENT_NAMES = new Set([
  "critic",
  "reviewer",
  "verifier",
+ "cold-verifier", // T9 (v0.8.4): adversarial cold cross-check agent
  "writer",
  "security-reviewer",
  ]);
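Reviewer note: the changelog says this blocklist keeps `cold-verifier` from being shadowed by a dynamic registration. The hunk does not show where the set is consulted, so the following is only a sketch of what such a guard could look like. `AgentDefinition` and `canRegisterDynamicAgent` are hypothetical names, the set lists only the entries visible in the hunk, and the trim and lowercase normalization is an assumption.

```typescript
// Only the names visible in the hunk above; the real set has more entries.
const PROTECTED_AGENT_NAMES = new Set<string>([
  "critic",
  "reviewer",
  "verifier",
  "cold-verifier",
  "writer",
  "security-reviewer",
]);

// Hypothetical shape of an agent definition, for illustration only.
interface AgentDefinition {
  name: string;
  source: "builtin" | "dynamic";
}

// Hypothetical guard: builtin agents always pass, dynamic agents are
// rejected when their normalized name collides with a protected name.
function canRegisterDynamicAgent(agent: AgentDefinition): boolean {
  if (agent.source !== "dynamic") return true;
  const normalized = agent.name.trim().toLowerCase();
  return !PROTECTED_AGENT_NAMES.has(normalized);
}
```

Under this sketch, a dynamically registered agent named `cold-verifier` is refused, so a workflow that asks for the cold check gets the builtin prompt with its isolation rules.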
@@ -377,7 +377,7 @@ class AgentOverridesSubmenu {
  this.onCancel = onCancel;
  const existing = (config.agents as Record<string, unknown>)?.overrides as Record<string, { model?: string; thinking?: string }> | undefined;
  this.overrides = existing ? structuredClone(existing) : {};
- this.agents = ["explorer", "planner", "analyst", "critic", "executor", "reviewer", "security-reviewer", "test-engineer", "verifier", "writer"];
+ this.agents = ["explorer", "planner", "analyst", "critic", "executor", "reviewer", "security-reviewer", "test-engineer", "verifier", "cold-verifier", "writer"];
  }
 
  invalidate(): void {}
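Reviewer note: the constructor context above copies the existing overrides with `structuredClone` before the submenu edits them. The sketch below isolates that pattern to show why a deep copy matters once `cold-verifier` becomes an editable entry. The override shape is taken from the hunk; `loadEditableOverrides` and the sample values are hypothetical, and the global `structuredClone` is assumed to be available (Node 17 or later, or a modern browser).

```typescript
// Override shape as it appears in the hunk's type assertion.
type AgentOverride = { model?: string; thinking?: string };
type Overrides = Record<string, AgentOverride>;

// Hypothetical helper mirroring the constructor logic: return a deep copy
// so edits made in the submenu do not touch the loaded config.
function loadEditableOverrides(config: { agents?: { overrides?: Overrides } }): Overrides {
  const existing = config.agents?.overrides;
  return existing ? structuredClone(existing) : {};
}

// Illustrative values only; they are not taken from pi-crew.
const loadedConfig = { agents: { overrides: { verifier: { thinking: "low" } } } };
const editable = loadEditableOverrides(loadedConfig);
editable["cold-verifier"] = { thinking: "high" };
editable["verifier"] = { thinking: "medium" };
```

After these edits `loadedConfig` still holds only the original `verifier` entry, so cancelling the submenu can discard `editable` without any cleanup.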