@cleocode/skills 2026.4.161 → 2026.5.1

Files changed (29)
  1. package/package.json +1 -1
  2. package/skills/ct-council/SKILL.md +377 -0
  3. package/skills/ct-council/optimization/HARDENING-PLAYBOOK.md +107 -0
  4. package/skills/ct-council/optimization/README.md +74 -0
  5. package/skills/ct-council/optimization/scenarios.yaml +121 -0
  6. package/skills/ct-council/optimization/scripts/campaign.py +543 -0
  7. package/skills/ct-council/optimization/scripts/test_campaign.py +143 -0
  8. package/skills/ct-council/references/chairman.md +119 -0
  9. package/skills/ct-council/references/contrarian.md +70 -0
  10. package/skills/ct-council/references/evidence-pack.md +145 -0
  11. package/skills/ct-council/references/examples.md +235 -0
  12. package/skills/ct-council/references/executor.md +83 -0
  13. package/skills/ct-council/references/expansionist.md +68 -0
  14. package/skills/ct-council/references/first-principles.md +73 -0
  15. package/skills/ct-council/references/outsider.md +73 -0
  16. package/skills/ct-council/references/peer-review.md +125 -0
  17. package/skills/ct-council/scripts/analyze_runs.py +293 -0
  18. package/skills/ct-council/scripts/fixtures/executor_multi.md +198 -0
  19. package/skills/ct-council/scripts/fixtures/missing_advisor.md +117 -0
  20. package/skills/ct-council/scripts/fixtures/missing_convergence.md +190 -0
  21. package/skills/ct-council/scripts/fixtures/thin_evidence.md +193 -0
  22. package/skills/ct-council/scripts/fixtures/valid.md +226 -0
  23. package/skills/ct-council/scripts/fixtures/valid_with_llmtxt.md +226 -0
  24. package/skills/ct-council/scripts/llmtxt_ref.py +223 -0
  25. package/skills/ct-council/scripts/run_council.py +578 -0
  26. package/skills/ct-council/scripts/telemetry.py +624 -0
  27. package/skills/ct-council/scripts/test_telemetry.py +509 -0
  28. package/skills/ct-council/scripts/test_validate.py +452 -0
  29. package/skills/ct-council/scripts/validate.py +396 -0
@@ -0,0 +1,125 @@
1
+ # Shuffled Peer Review — Gate-Based Protocol
2
+
3
+ The peer review is where frames collide productively. An advisor reviewing another advisor's output does NOT play neutral judge — they evaluate from their own locked frame, which is exactly what makes the shuffle informative. The Contrarian reviewing First Principles means: "Zero-based analysis sounds clean, but here's the failure mode you introduced by stripping context."
4
+
5
+ This protocol replaces numeric scoring with **gate-based evaluation**. Each gate is pass/fail with required evidence. The reviewer must produce a quote or concrete citation to justify each gate decision. Theater ("4/5 — good") is structurally impossible.
6
+
7
+ ## The rotation (fixed, do not deviate)
8
+
9
+ ```
10
+ Contrarian → reviews → First Principles
11
+ First Principles → reviews → Expansionist
12
+ Expansionist → reviews → Outsider
13
+ Outsider → reviews → Executor
14
+ Executor → reviews → Contrarian
15
+ ```
16
+
17
+ **Properties:**
18
+ - No self-review.
19
+ - Every advisor reviews exactly once and is reviewed exactly once.
20
+ - Single 5-cycle, not pairs — information flows around the full ring.
21
+
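The three properties listed above can be checked mechanically. The following is a minimal sketch, not part of the package: the `ROTATION` mapping is copied from the table above, while the helper name `is_single_cycle` is hypothetical and introduced only for illustration.

```python
from __future__ import annotations

# Rotation copied from the protocol text: reviewer -> reviewee.
ROTATION = {
    "Contrarian": "First Principles",
    "First Principles": "Expansionist",
    "Expansionist": "Outsider",
    "Outsider": "Executor",
    "Executor": "Contrarian",
}


def is_single_cycle(rotation: dict[str, str]) -> bool:
    """Hypothetical helper: True if the mapping is one ring over all advisors."""
    if not rotation:
        return False
    advisors = set(rotation)
    # Every advisor is reviewed exactly once (reviewees are a permutation of reviewers).
    if set(rotation.values()) != advisors:
        return False
    # No self-review.
    if any(reviewer == reviewee for reviewer, reviewee in rotation.items()):
        return False
    # Walk the ring from an arbitrary start. A single cycle returns to the
    # start only after visiting every advisor; disjoint pairs return early.
    start = next(iter(rotation))
    visited = 1
    current = rotation[start]
    while current != start:
        visited += 1
        current = rotation[current]
    return visited == len(advisors)
```

A mapping made of disjoint pairs (A reviews B, B reviews A) passes the first two checks but fails the ring walk, which is the "not pairs" property.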
22
+ **Why this specific rotation:**
23
+ - *Contrarian → First Principles*: stress-tests whether atomic truths survive adversarial conditions.
24
+ - *First Principles → Expansionist*: grounds ambitious upside against what's actually true.
25
+ - *Expansionist → Outsider*: checks whether the cold-read missed an opportunity hiding in plain sight.
26
+ - *Outsider → Executor*: the stranger asks "why *that* action?" — if it only makes sense with backstory, the Executor picked wrong.
27
+ - *Executor → Contrarian*: forces risk analysis to cash out. Pure doom with no actionable mitigation is cheap.
28
+
29
+ ## The four gates
30
+
31
+ Each gate is **strictly PASS or FAIL** — no middle states. The reviewer MUST provide the evidence the gate requires. A gate with no cited evidence is itself a validation failure.
32
+
33
+ **No PARTIAL / MIXED / CONDITIONAL / "PARTIAL PASS" / hedged values are allowed.** The validator rejects any gate line not matching `- G<N> <dimension>: PASS — <evidence>` or `- G<N> <dimension>: FAIL — <evidence>`.
34
+
35
+ If your judgment feels genuinely mixed — "the shape is right but the target is wrong", "it passes in spirit but not in letter", "mostly good except for one thing" — **pick FAIL** and express the nuance in the `Gap from <reviewer>'s frame` and `What I would add` fields. Those are exactly the fields the Chairman reads for texture. A FAIL with a rich gap note is more informative than a PARTIAL with thin justification, and it forces the reviewer to actually decide.
36
+
37
+ The test: would you act on this advisor's verdict as-written, unconditionally? If yes → PASS. If no, no matter why → FAIL, then explain the condition in the gap note.
38
+
39
+ ### G1 — Rigor gate
40
+
41
+ **PASS** if every finding in the reviewee's "Findings" list has a named subject, predicate, and (where the frame requires it) trigger condition. The reviewer MUST quote the strongest-rigor finding and, if any finding fails, the weakest.
42
+
43
+ **FAIL** if any finding is hedged ("might", "could", "may" without concrete anchor), vague ("there are scalability concerns"), or missing the frame's required specifics (Contrarian without trigger condition; Executor without expected outcome; First Principles without atoms; Expansionist without asymmetry; Outsider without artifact citation).
44
+
45
+ ### G2 — Evidence-grounding gate
46
+
47
+ **PASS** if every finding cites at least one item from the shared evidence pack, and every cited item actually exists in the pack. The reviewer MUST list all cited items.
48
+
49
+ **FAIL** if any finding is free-floating (no citation), cites an item not in the pack, or cites something that does not support the finding. The reviewer MUST list ungrounded or misgrounded findings.
50
+
51
+ ### G3 — Frame-integrity gate
52
+
53
+ **PASS** if no finding belongs to another advisor's lane. The reviewer MUST read the reviewee's persona file's "Your lane vs. other advisors' lanes" section and confirm.
54
+
55
+ **FAIL** if any finding is something a different advisor would produce. The reviewer MUST name which frame the violating finding belongs to and quote the violating line.
56
+
57
+ ### G4 — Actionability gate
58
+
59
+ **PASS** if the reviewee's verdict cashes out to a decision, a test, a change, or a concrete line of inquiry. The reviewer MUST quote the actionable part.
60
+
61
+ **FAIL** if the verdict is "interesting" but leaves the owner nowhere to go. "Further analysis is warranted" fails. "Reject the plan unless X is added" passes.
62
+
63
+ ## Peer review output template
64
+
65
+ **Destination:** when invoked as a Phase 2 subagent, save your full peer-review output below to `<run-dir>/peer-<reviewer-slug>-on-<reviewee-slug>.md` via the `Write` tool, then return only a one-line confirmation including the gate-pass count and disposition (e.g. `Wrote peer-contrarian-on-first-principles.md — 4/4 PASS, Disposition: Accept`). Do not include the full peer-review text in your reply — the orchestrator reads it back from the file.
66
+
67
+ The gate-line format is **load-bearing** — `scripts/validate.py` parses these lines with a regex anchored to the canonical names below. Use them VERBATIM.
68
+
69
+ | Gate | Canonical line prefix | Common mistakes (rejected) |
70
+ |---|---|---|
71
+ | G1 | `- G1 Rigor: PASS \| FAIL — ...` | `G1 Rigor gate:`, `G1: Rigor:`, `G1 - Rigor:` |
72
+ | G2 | `- G2 Evidence grounding: PASS \| FAIL — ...` | `G2 Evidence-grounding gate:`, `G2: Evidence:` |
73
+ | G3 | `- G3 Frame integrity: PASS \| FAIL — ...` | `G3 Frame-integrity gate:`, `G3: Frame:` |
74
+ | G4 | `- G4 Actionability: PASS \| FAIL — ...` | `G4 Actionability gate:`, `G4: Actionability:` |
75
+
76
+ The section headers above ("G1 — Rigor gate") use "gate" as a label *for the section*; the *gate verdict line* never does. The validator rejects gate verdict lines with the "gate" suffix because they break the canonical regex.
77
+
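To make the canonical line format concrete, here is a rough sketch of how such a line could be parsed. The pattern and the `parse_gate_line` helper are assumptions written for illustration; the actual regex in `scripts/validate.py` is not shown in this section and may differ.

```python
from __future__ import annotations

import re

# Canonical gate names from the table above.
GATE_NAMES = {
    "G1": "Rigor",
    "G2": "Evidence grounding",
    "G3": "Frame integrity",
    "G4": "Actionability",
}

# Hypothetical pattern (NOT the one in validate.py). \u2014 is the dash
# that separates the verdict from its evidence in the canonical line.
GATE_LINE = re.compile(
    r"^- (?P<gate>G[1-4]) (?P<name>[^:]+): (?P<verdict>PASS|FAIL) \u2014 (?P<evidence>\S.*)$"
)


def parse_gate_line(line: str) -> tuple[str, str, str] | None:
    """Return (gate, verdict, evidence), or None if the line is not canonical."""
    match = GATE_LINE.match(line.strip())
    if match is None:
        return None
    # The dimension name must match the canonical name for that gate exactly,
    # which rejects variants such as "Rigor gate".
    if GATE_NAMES[match["gate"]] != match["name"]:
        return None
    return match["gate"], match["verdict"], match["evidence"]
```

Under this sketch, a line with the "gate" suffix, a hedged verdict such as `PARTIAL`, or a verdict with no evidence after the dash all come back as `None`.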
78
+ ```
79
+ ### <reviewer> reviewing <reviewee>
80
+
81
+ **Gate results:**
82
+ - G1 Rigor: PASS | FAIL — <quote of strongest finding; if FAIL, quote weakest and explain>
83
+ - G2 Evidence grounding: PASS | FAIL — <list cited items; if FAIL, list ungrounded/misgrounded findings>
84
+ - G3 Frame integrity: PASS | FAIL — <confirm lane; if FAIL, name the violating frame and quote the violating line>
85
+ - G4 Actionability: PASS | FAIL — <quote the actionable part; if FAIL, explain what's missing>
86
+
87
+ **Strongest finding (from reviewee):**
88
+ <quote or close paraphrase of the one finding the reviewer thinks lands hardest, even from an opposing frame>
89
+
90
+ **Gap from <reviewer>'s frame:**
91
+ <the specific thing the reviewee missed that the reviewer's frame would have caught. Concrete — no "could have gone deeper".>
92
+
93
+ **What I would add:**
94
+ <one sentence from the reviewer's frame that sharpens or corrects the reviewee's analysis. Single value-add.>
95
+
96
+ **Disposition:** Accept | Modify | Reject — <one sentence why>
97
+ ```
98
+
99
+ ## Hard rules
100
+
101
+ - Reviewer MUST stay in their own frame. A Contrarian reviewing First Principles still looks for what breaks.
102
+ - Reviewer MUST NOT produce a second copy of their own analysis. They evaluate the reviewee *through* their lens; they do not redo the work.
103
+ - Agreement with the reviewee is allowed if it adds a cross-frame dimension ("the Contrarian confirms the atomic truth holds under adversarial pressure"). Pure agreement with no added dimension is a Frame-integrity violation — the reviewer did not do their job.
104
+ - Every gate must have its required evidence. A naked "PASS" with no quote is itself a protocol violation caught by the validator.
105
+ - **Disposition** forces a call: Accept, Modify, or Reject. No fence-sitting.
106
+
107
+ ## Convergence check (Phase 2.5, before the Chairman)
108
+
109
+ After all five peer reviews complete, run the **convergence detector** before Phase 3:
110
+
111
+ 1. Extract the "Single sharpest point" from each of the 5 advisors.
112
+ 2. Pairwise-compare them. Are 3 or more semantically the same finding (same subject, same predicate)?
113
+ 3. If yes → **convergence flag**. The advisor(s) with the lowest gate-pass count are suspected of frame drift. Rerun those advisors with explicit frame-reinforcement (re-read persona file, emphasize the "Your lane vs. other advisors' lanes" section) before proceeding to Chairman.
114
+ 4. If no → proceed to Chairman.
115
+
116
+ **Why this exists:** in single-Claude mode, the same model produces all 5 advisor outputs in one response and they tend to rhyme. The convergence detector is the structural antibody.
117
+
118
+ **What "semantically the same" means:** if you can describe two findings with the same sentence and lose no essential content, they are convergent. "Retry storms are dangerous" and "the retry wrapper will cascade under load" are convergent. "Retry storms are dangerous" and "the plan omits idempotency classification" are not.
119
+
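The convergence check can be sketched as a small grouping routine. This is only an illustration under stated assumptions: "semantically the same" is a judgment call in the protocol, so the sketch takes the comparison as a caller-supplied function. The names `find_convergence`, `same_finding`, and the hand-assigned `TOPIC` labels are all hypothetical, and the Outsider and Executor sample points are invented for the example.

```python
from __future__ import annotations

from typing import Callable


def find_convergence(
    points: dict[str, str],
    same_finding: Callable[[str, str], bool],
    threshold: int = 3,
) -> list[str]:
    """Return the first group of >= threshold advisors whose sharpest points match.

    Returns an empty list when no such group exists (no convergence flag).
    """
    advisors = list(points)
    for anchor in advisors:
        group = [
            other
            for other in advisors
            if other == anchor or same_finding(points[anchor], points[other])
        ]
        if len(group) >= threshold:
            return group
    return []


# Sample sharpest points; the first three strings are from the text above,
# the last two are made up for this example.
points = {
    "Contrarian": "Retry storms are dangerous",
    "First Principles": "the plan omits idempotency classification",
    "Expansionist": "the retry wrapper will cascade under load",
    "Outsider": "retries amplify an outage",
    "Executor": "write a failing test first",
}

# Hand-assigned topic labels stand in for the semantic comparison.
TOPIC = {
    "Retry storms are dangerous": "retry-storm",
    "the retry wrapper will cascade under load": "retry-storm",
    "retries amplify an outage": "retry-storm",
    "the plan omits idempotency classification": "idempotency",
    "write a failing test first": "test-first",
}


def same_topic(a: str, b: str) -> bool:
    return TOPIC[a] == TOPIC[b]


flagged = find_convergence(points, same_topic)
```

With the sample data, three advisors land on the retry-storm finding, so the flag would be raised and the lowest-scoring of them would be candidates for a rerun.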
120
+ ## What the Chairman extracts from peer reviews
121
+
122
+ - **Gate-pass count per advisor** (0–4). Advisors with 4/4 pass carry full weight. Advisors with gate failures are weighted down in proportion to their failed gates.
123
+ - **Disposition distribution**: how many Accept / Modify / Reject across the 5 reviews. A review ring that's all Accept signals either genuinely strong work or insufficient friction (check G3 Frame-integrity results).
124
+ - **Cross-frame additions**: the "What I would add" sentences — these often contain the material that makes the final verdict sharper than any single advisor.
125
+ - **Convergence flag** (if raised): triggers a rerun; do not synthesize until resolved.
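One way to read "weighted proportionally" is a linear weight equal to the fraction of gates passed. The sketch below assumes that reading; the protocol text does not specify a formula, and the helper names `advisor_weight` and `all_accept` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

GATES = ("G1", "G2", "G3", "G4")


def advisor_weight(gates: dict[str, str]) -> float:
    """Assumed linear weighting: fraction of the four gates marked PASS."""
    passes = sum(1 for gate in GATES if gates.get(gate) == "PASS")
    return passes / len(GATES)


def all_accept(dispositions: list[str]) -> bool:
    """True when every review in the ring is Accept (a signal to check G3 results)."""
    counts = Counter(dispositions)
    return bool(dispositions) and counts["Accept"] == len(dispositions)


# Example: one advisor with a single passing gate.
executor_gates = {"G1": "FAIL", "G2": "PASS", "G3": "FAIL", "G4": "FAIL"}
executor_weight = advisor_weight(executor_gates)
```

An advisor with four passes gets weight 1.0 under this assumption, and an advisor with one pass gets 0.25.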
@@ -0,0 +1,293 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ analyze_runs.py — read council-runs.jsonl, surface where to harden next.
4
+
5
+ Reports:
6
+ * gate-failure hotspots (which advisor fails which gate most),
7
+ * peer-review reject frequency (per reviewer + per reviewee),
8
+ * convergence-flag rate,
9
+ * Chairman confidence distribution + low-confidence question shapes,
10
+ * token / wall-clock distribution per scope tier (if metrics present),
11
+ * exit-criteria scorecard from the plan.
12
+
13
+ Usage:
14
+ python3 analyze_runs.py # default log
15
+ python3 analyze_runs.py --log path/to/runs.jsonl
16
+ python3 analyze_runs.py --json
17
+ python3 analyze_runs.py --since 2026-04-24 # filter by timestamp prefix
18
+ python3 analyze_runs.py --tail 8 # last N runs only
19
+ """
20
+
21
+ from __future__ import annotations
22
+
23
+ import argparse
24
+ import json
25
+ import statistics
26
+ import sys
27
+ from collections import Counter, defaultdict
28
+ from pathlib import Path
29
+
30
+ DEFAULT_LOG_PATH = Path(".cleo/council-runs.jsonl")
31
+ ADVISORS = ["Contrarian", "First Principles", "Expansionist", "Outsider", "Executor"]
32
+ GATES = ["G1", "G2", "G3", "G4"]
33
+
34
+
35
+ def load_runs(path: Path, since: str | None = None, tail: int | None = None) -> list[dict]:
36
+ if not path.exists():
37
+ return []
38
+ runs: list[dict] = []
39
+ with path.open("r", encoding="utf-8") as f:
40
+ for line in f:
41
+ line = line.strip()
42
+ if not line:
43
+ continue
44
+ try:
45
+ rec = json.loads(line)
46
+ except json.JSONDecodeError:
47
+ continue
48
+ if since and rec.get("timestamp", "") < since:
49
+ continue
50
+ runs.append(rec)
51
+ if tail:
52
+ runs = runs[-tail:]
53
+ return runs
54
+
55
+
56
+ def gate_hotspots(runs: list[dict]) -> list[dict]:
57
+ """Per (advisor, gate) FAIL count + rate."""
58
+ fail = Counter()
59
+ seen = Counter()
60
+ for r in runs:
61
+ for advisor, body in (r.get("advisors") or {}).items():
62
+ for gate in GATES:
63
+ verdict = (body.get("gates") or {}).get(gate)
64
+ if verdict in ("PASS", "FAIL"):
65
+ seen[(advisor, gate)] += 1
66
+ if verdict == "FAIL":
67
+ fail[(advisor, gate)] += 1
68
+ rows = []
69
+ for key, total in seen.items():
70
+ f = fail[key]
71
+ rows.append({
72
+ "advisor": key[0],
73
+ "gate": key[1],
74
+ "fail": f,
75
+ "n": total,
76
+ "fail_rate": round(f / total, 3) if total else 0.0,
77
+ })
78
+ rows.sort(key=lambda x: (-x["fail_rate"], -x["fail"], x["advisor"], x["gate"]))
79
+ return rows
80
+
81
+
82
+ def disposition_distribution(runs: list[dict]) -> dict:
83
+ by_reviewer = defaultdict(Counter)
84
+ by_reviewee = defaultdict(Counter)
85
+ overall = Counter()
86
+ for r in runs:
87
+ for pr in (r.get("peer_reviews") or []):
88
+ disp = pr.get("disposition") or "Unknown"
89
+ overall[disp] += 1
90
+ by_reviewer[pr.get("reviewer") or "Unknown"][disp] += 1
91
+ by_reviewee[pr.get("reviewee") or "Unknown"][disp] += 1
92
+ return {
93
+ "overall": dict(overall),
94
+ "by_reviewer": {k: dict(v) for k, v in by_reviewer.items()},
95
+ "by_reviewee": {k: dict(v) for k, v in by_reviewee.items()},
96
+ }
97
+
98
+
99
+ def convergence_rate(runs: list[dict]) -> dict:
100
+ raised = sum(1 for r in runs if (r.get("convergence") or {}).get("flag") is True)
101
+ cleared = sum(1 for r in runs if (r.get("convergence") or {}).get("flag") is False)
102
+ unknown = sum(1 for r in runs if (r.get("convergence") or {}).get("flag") is None)
103
+ return {
104
+ "raised": raised,
105
+ "cleared": cleared,
106
+ "unknown": unknown,
107
+ "rate": round(raised / len(runs), 3) if runs else 0.0,
108
+ }
109
+
110
+
111
+ def confidence_distribution(runs: list[dict]) -> dict:
112
+ counts = Counter()
113
+ low_conf_questions: list[str] = []
114
+ for r in runs:
115
+ conf = (r.get("chairman") or {}).get("confidence")
116
+ counts[conf or "missing"] += 1
117
+ if conf in ("low", "medium-low"):
118
+ low_conf_questions.append(r.get("question", ""))
119
+ return {
120
+ "counts": dict(counts),
121
+ "low_confidence_questions": low_conf_questions,
122
+ }
123
+
124
+
125
+ def cost_distribution(runs: list[dict]) -> dict:
126
+ tokens = [r.get("metrics", {}).get("tokens") for r in runs if (r.get("metrics") or {}).get("tokens")]
127
+ walls = [r.get("metrics", {}).get("wall_clock_seconds") for r in runs if (r.get("metrics") or {}).get("wall_clock_seconds")]
128
+
129
+ def _summary(xs):
130
+ if not xs:
131
+ return None
132
+ return {
133
+ "n": len(xs),
134
+ "min": min(xs),
135
+ "max": max(xs),
136
+ "mean": round(statistics.mean(xs), 1),
137
+ "stdev": round(statistics.stdev(xs), 1) if len(xs) > 1 else 0.0,
138
+ "spread_pct": round(((max(xs) - min(xs)) / statistics.mean(xs)) * 100, 1) if statistics.mean(xs) else 0.0,
139
+ }
140
+
141
+ return {"tokens": _summary(tokens), "wall_clock_seconds": _summary(walls)}
142
+
143
+
144
+ def exit_criteria(runs: list[dict]) -> dict:
145
+ """Scorecard against the plan's exit criteria."""
146
+ n = len(runs)
147
+
148
+ # 1. All shakedowns validate (here we don't know which run = which scenario,
149
+ # but we report the structural-validity rate as a proxy).
150
+ valid_runs = sum(1 for r in runs if (r.get("validation") or {}).get("valid"))
151
+
152
+ # 2. Every advisor ≥3/4 average gate pass.
153
+ sums = defaultdict(list)
154
+ for r in runs:
155
+ for advisor, body in (r.get("advisors") or {}).items():
156
+ sums[advisor].append(body.get("gate_pass_count", 0))
157
+ advisor_avg = {a: round(statistics.mean(v), 2) for a, v in sums.items() if v}
158
+
159
+ # 3. Convergence flag fires at most once across the campaign.
160
+ convergence_raised = convergence_rate(runs)["raised"]
161
+
162
+ # 4. Chairman confidence ≥ medium-high on ≥6/8 runs.
163
+ high_or_above = sum(
164
+ 1 for r in runs
165
+ if (r.get("chairman") or {}).get("confidence") in ("high", "medium-high")
166
+ )
167
+
168
+ # 5. Token cost stable within 20% per scope tier — proxy on overall spread.
169
+ tokens = [r.get("metrics", {}).get("tokens") for r in runs if (r.get("metrics") or {}).get("tokens")]
170
+ token_spread_ok = None
171
+ if tokens and len(tokens) > 1 and statistics.mean(tokens):
172
+ spread_pct = ((max(tokens) - min(tokens)) / statistics.mean(tokens)) * 100
173
+ token_spread_ok = spread_pct <= 20.0
174
+
175
+ return {
176
+ "n_runs": n,
177
+ "validate_pass_rate": round(valid_runs / n, 3) if n else 0.0,
178
+ "advisor_gate_avg": advisor_avg,
179
+ "advisor_gate_avg_min": min(advisor_avg.values()) if advisor_avg else None,
180
+ "convergence_raised": convergence_raised,
181
+ "high_or_above_confidence_runs": high_or_above,
182
+ "token_spread_within_20pct": token_spread_ok,
183
+ "checklist": {
184
+ "all_validate": valid_runs == n if n else False,
185
+ "every_advisor_avg_ge_3": all(v >= 3.0 for v in advisor_avg.values()) if advisor_avg else False,
186
+ "convergence_at_most_once": convergence_raised <= 1,
187
+ "high_or_above_ge_6_of_8": high_or_above >= 6 if n >= 8 else None,
188
+ "token_spread_ok": token_spread_ok,
189
+ },
190
+ }
191
+
192
+
193
+ def render_report(report: dict) -> str:
194
+ lines = []
195
+ lines.append(f"# Council telemetry — {report['n_runs']} run(s)")
196
+ lines.append("")
197
+
198
+ lines.append("## Exit-criteria scorecard")
199
+ cl = report["exit_criteria"]
200
+ lines.append(f"- Validate pass rate: {cl['validate_pass_rate']*100:.0f}%")
201
+ lines.append(f"- Advisor avg gate-pass (≥3.0 target): {cl['advisor_gate_avg']}")
202
+ lines.append(f"- Convergence flags raised: {cl['convergence_raised']} (target ≤1)")
203
+ lines.append(f"- High/medium-high confidence: {cl['high_or_above_confidence_runs']}/{report['n_runs']} (target ≥6/8)")
204
+ spread = cl["token_spread_within_20pct"]
205
+ lines.append(f"- Token spread within 20%: {'yes' if spread else 'no' if spread is False else 'n/a (insufficient runs with token metrics)'}")
206
+ lines.append("")
207
+
208
+ lines.append("## Gate-failure hotspots (top 5)")
209
+ if not report["gate_hotspots"]:
210
+ lines.append("- No gate-fail data yet.")
211
+ else:
212
+ for row in report["gate_hotspots"][:5]:
213
+ if row["fail"] == 0:
214
+ continue
215
+ lines.append(
216
+ f"- {row['advisor']:<16} {row['gate']} "
217
+ f"fail {row['fail']}/{row['n']} ({row['fail_rate']*100:.0f}%)"
218
+ )
219
+ if all(r["fail"] == 0 for r in report["gate_hotspots"]):
220
+ lines.append("- 0 gate failures across all runs (suspicious — check whether reviewers are too lenient).")
221
+ lines.append("")
222
+
223
+ lines.append("## Peer-review disposition distribution")
224
+ disp = report["dispositions"]
225
+ lines.append(f"- Overall: {disp['overall']}")
226
+ lines.append("")
227
+
228
+ lines.append("## Convergence")
229
+ cv = report["convergence"]
230
+ lines.append(f"- Raised: {cv['raised']} | Cleared: {cv['cleared']} | Unknown: {cv['unknown']} (rate {cv['rate']*100:.0f}%)")
231
+ lines.append("")
232
+
233
+ lines.append("## Chairman confidence")
234
+ conf = report["confidence"]
235
+ lines.append(f"- Distribution: {conf['counts']}")
236
+ if conf["low_confidence_questions"]:
237
+ lines.append("- Low-confidence questions (candidates for documenting as 'not a good council fit'):")
238
+ for q in conf["low_confidence_questions"]:
239
+ lines.append(f" - {q}")
240
+ lines.append("")
241
+
242
+ lines.append("## Cost (token + wall-clock summary)")
243
+ cost = report["cost"]
244
+ if cost["tokens"]:
245
+ t = cost["tokens"]
246
+ lines.append(f"- Tokens: n={t['n']} mean={t['mean']:.0f} stdev={t['stdev']:.0f} spread={t['spread_pct']}%")
247
+ else:
248
+ lines.append("- Tokens: no metrics recorded (pass --tokens to telemetry.py).")
249
+ if cost["wall_clock_seconds"]:
250
+ w = cost["wall_clock_seconds"]
251
+ lines.append(f"- Wall-clock: n={w['n']} mean={w['mean']}s stdev={w['stdev']}s")
252
+ else:
253
+ lines.append("- Wall-clock: no metrics recorded.")
254
+ lines.append("")
255
+
256
+ return "\n".join(lines)
257
+
258
+
259
+ def build_report(runs: list[dict]) -> dict:
260
+ return {
261
+ "n_runs": len(runs),
262
+ "gate_hotspots": gate_hotspots(runs),
263
+ "dispositions": disposition_distribution(runs),
264
+ "convergence": convergence_rate(runs),
265
+ "confidence": confidence_distribution(runs),
266
+ "cost": cost_distribution(runs),
267
+ "exit_criteria": exit_criteria(runs),
268
+ }
269
+
270
+
271
+ def main():
272
+ parser = argparse.ArgumentParser(description="Analyze council-runs.jsonl telemetry.")
273
+ parser.add_argument("--log", default=str(DEFAULT_LOG_PATH), help=f"JSONL log path (default: {DEFAULT_LOG_PATH}).")
274
+ parser.add_argument("--json", action="store_true", help="Emit JSON report.")
275
+ parser.add_argument("--since", default=None, help="Only include runs with ISO timestamps ≥ this prefix.")
276
+ parser.add_argument("--tail", type=int, default=None, help="Only the last N runs.")
277
+ args = parser.parse_args()
278
+
279
+ runs = load_runs(Path(args.log), since=args.since, tail=args.tail)
280
+ report = build_report(runs)
281
+
282
+ if not runs:
283
+ print(f"⚠️ No runs found at {args.log}.", file=sys.stderr)
284
+ sys.exit(0)
285
+
286
+ if args.json:
287
+ print(json.dumps(report, indent=2, default=str))
288
+ else:
289
+ print(render_report(report))
290
+
291
+
292
+ if __name__ == "__main__":
293
+ main()
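For orientation, the script above reads one JSON object per line. The record below is a sketch of the shape implied by the keys the script accesses (`advisors`, `peer_reviews`, `convergence`, `chairman`, `validation`, `metrics`, `timestamp`, `question`). All values are made up for illustration, and the real telemetry writer may emit additional fields. The `spread_pct` helper restates the spread formula used in `cost_distribution`.

```python
from __future__ import annotations

import json
import statistics

# Illustrative record; field names are inferred from analyze_runs.py,
# values are invented for this example.
record = {
    "timestamp": "2026-04-24T10:00:00Z",
    "question": "Should we add a retry-on-timeout wrapper to outbound HTTP calls?",
    "advisors": {
        "Executor": {
            "gates": {"G1": "FAIL", "G2": "PASS", "G3": "FAIL", "G4": "FAIL"},
            "gate_pass_count": 1,
        },
    },
    "peer_reviews": [
        {"reviewer": "Outsider", "reviewee": "Executor", "disposition": "Reject"},
    ],
    "convergence": {"flag": False},
    "chairman": {"confidence": "medium"},
    "validation": {"valid": True},
    "metrics": {"tokens": 44000, "wall_clock_seconds": 310},
}

# One run is one line in the JSONL log.
line = json.dumps(record)
parsed = json.loads(line)


def spread_pct(values: list[float]) -> float:
    """(max - min) / mean, as a percentage rounded to one decimal place."""
    mean = statistics.mean(values)
    return round((max(values) - min(values)) / mean * 100, 1) if mean else 0.0


# --since compares timestamps as strings, so a date prefix acts as a lower bound.
kept_by_since = not (parsed["timestamp"] < "2026-04-24")
```

For three hypothetical runs at 40000, 44000, and 48000 tokens, the spread is 8000 / 44000, or about 18.2%, which would satisfy the 20% exit criterion.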
@@ -0,0 +1,198 @@
1
+ # The Council — Should we add a retry-on-timeout wrapper to outbound HTTP calls?
2
+
3
+ ## Evidence pack
4
+
5
+ 1. `packages/core/src/http.ts:L12-L58` — current httpGet/httpPost.
6
+ 2. `packages/core/src/circuit-breaker.ts` — exists with zero callers.
7
+ 3. commit `a1b2c3d "drop retries from http client"` — retries removed 18 months ago.
8
+
9
+ ## Phase 1 — Advisor analyses
10
+
11
+ ### Advisor: Contrarian
12
+
13
+ **Frame:** Assume the plan is wrong.
14
+
15
+ **Evidence anchored:**
16
+ - commit `a1b2c3d` — retries were pulled for a documented reason.
17
+ - `packages/core/src/http.ts` — zero per-caller rate limits.
18
+
19
+ **Verdict from this lens:** Plan re-introduces known incident class.
20
+
21
+ **Single sharpest point:** Retry wrapper without breaker reproduces old bug.
22
+
23
+ ### Advisor: First Principles
24
+
25
+ **Frame:** Ignore everything.
26
+
27
+ **Evidence anchored:**
28
+ - RFC 7231.
29
+ - `packages/core/src/http.ts:L12-L58`.
30
+
31
+ **Verdict from this lens:** Plan incomplete.
32
+
33
+ **Single sharpest point:** Non-idempotent requests cannot be blindly retried.
34
+
35
+ ### Advisor: Expansionist
36
+
37
+ **Frame:** Forget the constraints.
38
+
39
+ **Evidence anchored:**
40
+ - `packages/core/src/circuit-breaker.ts`.
41
+ - `MEMORY.md`.
42
+
43
+ **Verdict from this lens:** Owner thinking too small.
44
+
45
+ **Single sharpest point:** Wire the circuit breaker.
46
+
47
+ ### Advisor: Outsider
48
+
49
+ **Frame:** You have no context.
50
+
51
+ **Evidence anchored:**
52
+ - `packages/core/src/circuit-breaker.ts` — zero callers.
53
+ - `docs/adr/ADR-021-http-client.md`.
54
+
55
+ **What the artifact claims vs. shows:** Claims await breaker; shows breaker has landed.
56
+
57
+ **Verdict from this lens:** Project prepared but didn't close the loop.
58
+
59
+ **Single sharpest point:** ADR says do this when breaker lands; it has landed.
60
+
61
+ ### Advisor: Executor
62
+
63
+ **Frame:** Don't analyze.
64
+
65
+ **Evidence anchored:**
66
+ - `packages/core/test/http.test.ts`.
67
+ - `packages/core/src/circuit-breaker.ts`.
68
+
69
+ **The action (one):**
70
+ 1. Write a failing test.
71
+ 2. Implement the retry wrapper.
72
+ 3. Add circuit breaker wiring.
73
+
74
+ **Expected outcome (60 minutes from now):**
75
+ Many things happen.
76
+
77
+ **What this unblocks:**
78
+ All subsequent work.
79
+
80
+ **Verdict from this lens:** Lots to do.
81
+
82
+ **Single sharpest point:** Do three things simultaneously.
83
+
84
+ ## Phase 2 — Shuffled peer reviews
85
+
86
+ ### Contrarian reviewing First Principles
87
+
88
+ **Gate results:**
89
+ - G1 Rigor: PASS — specific.
90
+ - G2 Evidence grounding: PASS — cited.
91
+ - G3 Frame integrity: PASS — in lane.
92
+ - G4 Actionability: PASS — decidable.
93
+
94
+ **Strongest finding (from reviewee):** Idempotency.
95
+
96
+ **Gap from Contrarian's frame:** None.
97
+
98
+ **What I would add:** Nothing.
99
+
100
+ **Disposition:** Accept — holds.
101
+
102
+ ### First Principles reviewing Expansionist
103
+
104
+ **Gate results:**
105
+ - G1 Rigor: PASS — specific.
106
+ - G2 Evidence grounding: PASS — cited.
107
+ - G3 Frame integrity: PASS — in lane.
108
+ - G4 Actionability: PASS — decidable.
109
+
110
+ **Strongest finding (from reviewee):** Asset wiring.
111
+
112
+ **Gap from First Principles' frame:** None.
113
+
114
+ **What I would add:** Nothing.
115
+
116
+ **Disposition:** Accept — holds.
117
+
118
+ ### Expansionist reviewing Outsider
119
+
120
+ **Gate results:**
121
+ - G1 Rigor: PASS — specific.
122
+ - G2 Evidence grounding: PASS — cited.
123
+ - G3 Frame integrity: PASS — in lane.
124
+ - G4 Actionability: PASS — decidable.
125
+
126
+ **Strongest finding (from reviewee):** ADR gap.
127
+
128
+ **Gap from Expansionist's frame:** None.
129
+
130
+ **What I would add:** Nothing.
131
+
132
+ **Disposition:** Accept — holds.
133
+
134
+ ### Outsider reviewing Executor
135
+
136
+ **Gate results:**
137
+ - G1 Rigor: FAIL — three actions listed, not one.
138
+ - G2 Evidence grounding: PASS — cited.
139
+ - G3 Frame integrity: FAIL — multiple actions violates Executor frame.
140
+ - G4 Actionability: FAIL — action is ambiguous.
141
+
142
+ **Strongest finding (from reviewee):** Writing the test is still valid.
143
+
144
+ **Gap from Outsider's frame:** Executor frame requires exactly one action.
145
+
146
+ **What I would add:** Nothing.
147
+
148
+ **Disposition:** Reject — frame violation.
149
+
150
+ ### Executor reviewing Contrarian
151
+
152
+ **Gate results:**
153
+ - G1 Rigor: PASS — specific.
154
+ - G2 Evidence grounding: PASS — cited.
155
+ - G3 Frame integrity: PASS — in lane.
156
+ - G4 Actionability: PASS — decidable.
157
+
158
+ **Strongest finding (from reviewee):** Retry storm.
159
+
160
+ **Gap from Executor's frame:** No mitigation named.
161
+
162
+ **What I would add:** Wire breaker first.
163
+
164
+ **Disposition:** Accept — risk real.
165
+
166
+ ## Phase 2.5 — Convergence check
167
+
168
+ No convergence.
169
+
170
+ ## Phase 3 — Chairman's verdict
171
+
172
+ ### Gate summary
173
+
174
+ | Advisor | G1 | G2 | G3 | G4 | Weight |
175
+ |---|---|---|---|---|---|
176
+ | Contrarian | PASS | PASS | PASS | PASS | full |
177
+ | First Principles | PASS | PASS | PASS | PASS | full |
178
+ | Expansionist | PASS | PASS | PASS | PASS | full |
179
+ | Outsider | PASS | PASS | PASS | PASS | full |
180
+ | Executor | FAIL | PASS | FAIL | FAIL | low |
181
+
182
+ ### Recommendation
183
+ Rerun Executor.
184
+
185
+ ### Why this, not the alternatives
186
+ Executor violated frame.
187
+
188
+ ### What each advisor got right
189
+ See above.
190
+
191
+ ### Conditions on the recommendation
192
+ Rerun required.
193
+
194
+ ### Next 60-minute action
195
+ Rerun the Executor pass with explicit one-action constraint.
196
+
197
+ ### Confidence
198
+ Medium — four frames solid, one rerun needed.