npm - @vextlabs/theron-cli - Versions diffs - 0.2.1 → 0.4.0 - Mend

@vextlabs/theron-cli 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191) hide show

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +61 -5
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/build_domains.d.ts +6 -0
package/dist/profiles/methodologies/build_domains.js +170 -0
package/dist/profiles/methodologies/build_domains.js.map +1 -0
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
package/dist/profiles/methodologies/regulated_domains.js +153 -0
package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
package/dist/profiles/methodologies/research_domains.d.ts +8 -0
package/dist/profiles/methodologies/research_domains.js +179 -0
package/dist/profiles/methodologies/research_domains.js.map +1 -0
package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
package/dist/profiles/methodologies/strategy_domains.js +193 -0
package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
package/dist/profiles/seeds.js +241 -95
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/calc_gate.d.ts +2 -0
package/dist/verifiers/calc_gate.js +112 -0
package/dist/verifiers/calc_gate.js.map +1 -0
package/dist/verifiers/citation_gate.d.ts +2 -0
package/dist/verifiers/citation_gate.js +130 -0
package/dist/verifiers/citation_gate.js.map +1 -0
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/evidence_gate.d.ts +2 -0
package/dist/verifiers/evidence_gate.js +108 -0
package/dist/verifiers/evidence_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +28 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.d.ts +2 -0
package/dist/verifiers/source_gate.js +125 -0
package/dist/verifiers/source_gate.js.map +1 -0
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/plan.md ADDED Viewed

@@ -0,0 +1,88 @@
+---
+name: plan
+description: Decompose a complex task into a grounded, verifiable plan — investigate current state first, small steps with acceptance checks, riskiest unknowns first, surface before irreversible work.
+allowed-tools: Read, Grep, Glob, TodoWrite, WebSearch
+---
+## PHASE 0 — CLARIFY BEFORE YOU PLAN (no code yet)
+1. Restate the goal in one sentence. If you cannot, ask the user. Do not proceed until the goal is unambiguous.
+2. Define DONE: what observable state proves this task is complete? (passing test, deployed URL, file exists, command exits 0, metric reading, etc.)
+3. Extract hard constraints: language/framework version, no-touch files, latency budget, cost ceiling, must-not-break surfaces.
+4. List explicit NON-GOALS: what is out of scope? Write them down — omitted non-goals become silent scope creep.
+5. Check: does the user want a PLAN ONLY (surface + wait for approval) or PLAN + EXECUTE? Confirm before any write.
+## PHASE 1 — INVESTIGATE CURRENT STATE (ground the plan in reality)
+6. Before writing a single TODO, read the relevant code. Use Grep/Glob to locate entry points, config files, test harnesses, and CI definitions that this task will touch.
+7. Run: `git status`, `git log --oneline -10`, check open TODOs or FIXMEs in scope. Know what is already in flight.
+8. Identify the exact files, functions, and line ranges that must change. Write them down with absolute paths.
+9. Check for existing tests: what test files cover this area? Run them baseline-green before touching anything.
+10. List every external dependency (API, env var, secret, DB schema, npm package). Confirm each exists and is accessible NOW, not assumed.
+11. Search for prior art in the repo: has this pattern been solved elsewhere? (`grep -r` the key concept.) Reuse before inventing.
+## PHASE 2 — DECOMPOSE INTO SMALL, INDEPENDENTLY-VERIFIABLE STEPS
+12. Break the work into steps where each step produces a single, checkable artifact: a file changed, a test added, a command that passes, an endpoint that returns 200.
+13. Maximum step size: one logical change that can be reviewed in < 20 minutes. If a step takes longer, split it.
+14. Write an ACCEPTANCE CHECK for every step — the exact command or observation that proves it worked. No vague "verify it works."
+    - Code change → `tsc --noEmit` / `npm test` / `curl` response / `grep` for output in file
+    - Migration → `psql` query returns expected rows
+    - Deploy → health endpoint returns 200, log line appears
+15. Mark each step as REVERSIBLE (can undo with `git checkout`, rollback migration, re-deploy previous) or IRREVERSIBLE (sends email, charges card, deletes data, publishes npm, pushes to prod, drops table). Flag irreversible steps in ALL CAPS.
+## PHASE 3 — SEQUENCE BY DEPENDENCY AND RISK
+16. Build a dependency graph (even mental): which steps must precede others? Make it explicit in the TODO list (e.g., "step 4 requires step 2 done first").
+17. Move the RISKIEST / MOST UNCERTAIN step to the front as a SPIKE. A spike is a time-boxed investigation (max 30 min) that answers one unknown before you commit to the full plan. If the spike fails, the plan changes cheaply.
+18. Common risk signals to front-load: external API behavior you have not verified, DB schema changes, new npm packages with unknown transitive deps, any step marked IRREVERSIBLE.
+19. Critical path: identify the sequence of steps that determines total elapsed time. Mark these CP. Parallelize non-CP steps where safe.
+20. Default sequence template:
+    a. Spike (prove the riskiest unknown)
+    b. Scaffolding (add files, types, empty functions — no logic yet; keeps CI green)
+    c. Implementation (fill in logic per file)
+    d. Unit tests (cover the new logic)
+    e. Integration / smoke test (end-to-end happy path)
+    f. IRREVERSIBLE steps (deploy, publish, migrate prod) — only after all tests pass
+## PHASE 4 — SURFACE UNKNOWNS AND ASSUMPTIONS EXPLICITLY
+21. List every assumption the plan makes. For each: "IF this assumption is wrong, THEN the plan breaks here, and we do X instead."
+22. List every unknown. For each: "We will resolve this by [reading file X / running command Y / asking user / time-boxed spike]."
+23. Do NOT paper over unknowns with "we'll figure it out." Unresolved unknowns are risks; name them.
+## PHASE 5 — WRITE THE LIVING TODO LIST
+24. Translate the plan into a TodoWrite call. Every item must have:
+    - A concrete title (verb + noun + file or command, not "handle the auth stuff")
+    - Status: pending | in_progress | done
+    - Priority: p1 (blocks others) | p2 (on critical path) | p3 (parallel / nice-to-have)
+25. Keep the TODO list the single source of truth. Update it as reality changes — completed steps marked done, new discoveries added immediately, invalidated steps removed with a note why.
+26. Never have more than one item in_progress at a time. Finish and verify before starting the next.
+## PHASE 6 — SURFACE PLAN FOR APPROVAL BEFORE IRREVERSIBLE WORK
+27. Before executing ANY irreversible step, output the full plan summary to the user:
+    - Goal + definition of done
+    - Steps completed so far (with acceptance check results)
+    - The next irreversible step and exactly what it will do
+    - What cannot be undone after this point
+28. WAIT for explicit user approval ("yes", "go ahead", "ship it") before proceeding. "OK" or silence is NOT approval for irreversible steps.
+29. If the plan has changed materially since initial agreement, re-surface the updated plan before continuing.
+## PHASE 7 — EXECUTE WITH CONTINUOUS VERIFICATION
+30. Execute one step at a time. Run the acceptance check immediately after each step. If the check fails, STOP — do not proceed to the next step.
+31. When a step fails: diagnose in the same scope (don't thrash into adjacent files), fix, re-run acceptance check, then continue.
+32. After every 3-5 steps, do a brief state check: `git diff --stat`, confirm no unintended files changed, run the full test suite if fast (< 2 min).
+33. If reality diverges from the plan (a file doesn't exist, an API behaves differently, a test was already covering more than expected), update the TODO list and note the delta explicitly — do not silently adjust and continue.
+## HARD RULES (non-negotiable)
+- NEVER write production code before Phase 1 (investigation) is complete.
+- NEVER skip the acceptance check for a step and mark it done.
+- NEVER execute an IRREVERSIBLE step without Phase 6 approval.
+- NEVER assume a baseline is green — run the tests first and confirm.
+- NEVER plan in the abstract when you can read the actual file. Grep before guessing.
+- If a step takes > 2x estimated effort, STOP, re-plan, surface updated plan before continuing.
+- Plans that reference "etc.", "and so on", or "similar" are incomplete — expand every item to be concrete.

package/skills/policy-brief.md ADDED Viewed

@@ -0,0 +1,124 @@
+---
+name: policy-brief
+description: Draft a rigorous 2-page policy brief on a named policy question — problem framing with quantified status-quo baseline, do-nothing and active options scored on effectiveness/cost/equity/feasibility, WINNERS and LOSERS named explicitly, evidence tier labeled for every quantitative claim, statutory citations verified for every legal reference, and a single prioritized recommendation with a named rebuttal — use when asked to brief a policymaker, analyze a bill, evaluate a regulation, or compare policy options on any public-sector question.
+allowed-tools: Read, WebSearch, WebFetch, Write
+---
+═══ HARD RULES ═══
+1. NEVER assert a quantitative claim (statistic, cost figure, elasticity, demographic share) without labeling its evidence tier: [T1] peer-reviewed / official data, [T2] grey literature / government report, [T3] credible journalism / think-tank, [T4] analyst estimate — and flagging T3/T4 as lower-confidence.
+2. NEVER cite a statute, section, or regulation from memory alone; WebFetch the current authoritative text (GPO, eCFR, official legislature site) or explicitly mark the citation "VERIFY BEFORE USE — retrieved from memory."
+3. NEVER omit the do-nothing option; it is Option 0 and must receive the same scored analysis as every active option, including projected baseline cost/outcome.
+4. NEVER conflate effectiveness (does the policy actually change the outcome?) with feasibility (can it pass and be implemented?); score them on separate axes in Phase C.
+5. NEVER suppress distributional effects; every option must name at least one WINNER and one LOSER with the specific causal mechanism — not "some groups may be affected."
+6. NEVER present a single-point cost estimate where the literature offers a range; use the range and cite both ends.
+7. NEVER recommend without stating the single strongest counterargument and responding to it directly and specifically — no dismissals.
+8. NEVER use passive voice to obscure agency: write "employers in sector X will bear an estimated $Y per FTE [T2]" not "costs will rise."
+9. ALWAYS distinguish intended effect (the policy's stated goal) from empirical evidence of effect (what peer-reviewed studies or program evaluations show actually happened in analogous deployments).
+10. ALWAYS flag when a claim depends on a contested empirical assumption (e.g., labor-demand elasticity, price pass-through rate, behavioral response rate) — label it [CONTESTED] and cite the range of expert estimates.
+11. NEVER silently drop a policy option; if an option is eliminated by a hard constraint (constitutional, statutory, preemption), document the elimination reason explicitly in Phase B.
+12. IF WebSearch returns no substantive results for a specific domain, do not fabricate evidence — label those sections [EVIDENCE LIMITED — fewer than three independent studies found] and continue with the analysis, flagging confidence accordingly.
+═══ PHASE A — PROBLEM DEFINITION AND STATUS-QUO BASELINE ═══
+A1. State the policy question in one sentence using the form: "Should [governing body] [action] in order to [intended outcome], and if so, how?" Pin the specific decision-maker, jurisdiction, and the relevant legislative or regulatory vehicle if one has been named.
+A2. Quantify the status-quo baseline — the problem as it exists TODAY without any policy change:
+    - Scale: affected population size, geographic scope, economic magnitude ($ or %) [evidence tier].
+    - Trend: is the problem growing, stable, or self-correcting? Cite a trend data point (year-over-year or cohort comparison) [evidence tier].
+    - Cost of inaction: express the status-quo trajectory in the same units as the proposed intervention's goals, so Option 0 is directly comparable to active options.
+A3. Identify the causal mechanism the problem operates through. Use the policy-relevant taxonomy: market failure (externality / information asymmetry / market power / public-good under-provision / coordination problem), regulatory gap, access barrier, or enforcement failure. Labeling the mechanism determines which policy instruments are theoretically justified — a subsidy does not fix a market-power problem; a disclosure mandate does not fix a coordination failure.
+A4. Cite the primary statute, regulation, or constitutional provision that governs this domain. Format: [Short Title, § Section Number, Year enacted or last amended]. For proposed legislation: bill number, sponsor, committee referral, and current procedural posture (introduced / reported / floor / conference). Retrieve the current text via WebFetch before citing.
+A5. Note any binding constraints that bound the option space before analysis: constitutional limits, treaty obligations, preemption doctrine, Chevron/Loper Bright deference status, appropriations ceilings, or sunset provisions. These are hard filters — they eliminate options regardless of policy merit and must be applied in Phase B.
+═══ PHASE B — OPTIONS GENERATION AND SCOPING ═══
+B1. Generate a minimum of three options using these slots (a fourth is optional if the question warrants an alternative instrument):
+    - Option 0 — Do Nothing: explicit description of the status-quo trajectory over the relevant time horizon, including the projected baseline cost/outcome if the problem continues unaddressed.
+    - Option A — Minimum Intervention: the least disruptive active policy that directly addresses the causal mechanism identified in A3.
+    - Option B — Leading Active Option: the most commonly proposed or analytically strongest intervention, drawn from the current legislative debate or peer-reviewed literature.
+    - Option C (if warranted) — Alternative Instrument: an option using a categorically different policy lever (e.g., if A and B are mandates, C is a subsidy or liability rule or information-disclosure regime).
+B2. For each option, state in one sentence: the policy instrument (mandate / tax / subsidy / information disclosure / procurement standard / liability rule / public provision), the target actor (who must change behavior), and the enforcement mechanism (agency, penalty structure, private right of action).
+B3. Apply the hard constraints from A5. Eliminate any option that fails a constraint and state the elimination reason explicitly. Do not silently drop options.
+═══ PHASE C — MULTI-CRITERIA SCORING ═══
+C1. Score each surviving option on four criteria using H (High) / M (Medium) / L (Low) with a one-sentence justification and evidence tier for each score:
+    EFFECTIVENESS — Does evidence from analogous policies show this instrument actually moves the target outcome, and by how much?
+    - Cite at least one empirical study, natural experiment, or program evaluation by author/agency and year. If no analogous evidence exists, state that explicitly — do not infer from theory alone.
+    - Distinguish intended mechanism (the causal logic) from measured effect (what studies found); note if effects faded, rebounded, or produced unintended substitution or leakage.
+    COST — What is the fiscal cost to government AND the compliance/transition cost to regulated parties?
+    - Separate public-budget cost (appropriations or revenue loss) from private compliance cost — they fall on different parties with different political weights.
+    - Use ranges, not point estimates; cite source and evidence tier for both ends of the range.
+    - Flag cost incidence: does the cost fall on high-margin incumbents, small businesses, or end consumers? This is distributional analysis, not a footnote.
+    EQUITY — Who gains and who bears costs, disaggregated by income quintile, geography, race/ethnicity, or sector as material to THIS policy domain?
+    - Name at least one WINNER (group that benefits, specific mechanism, estimated magnitude [evidence tier]) and one LOSER (group that bears net cost or reduced access, specific mechanism, estimated magnitude [evidence tier]).
+    - If equity data are unavailable for this domain, label the equity score [DATA GAP] and specify what data collection (e.g., distributional analysis by CBO, household survey disaggregation) would resolve it.
+    FEASIBILITY — Can this option pass through the required legislative or regulatory process AND be implemented by the administering agency within the relevant time horizon?
+    - Political feasibility: name the likely coalition of support and opposition by specific interest group, chamber faction, or committee gatekeeper — not "opposition exists."
+    - Administrative feasibility: does the agency have existing statutory authority, staffing capacity, and data infrastructure, or must new capacity be authorized and built? Cite the relevant agency's recent track record on analogous rulemakings if available.
+C2. Produce a Summary Scoring Table immediately after C1:
+    | Option | Effectiveness | Cost | Equity | Feasibility | Overall |
+    Use H/M/L cells. Add a one-word verdict in Overall: STRONG / MODERATE / WEAK / ELIMINATE.
+═══ PHASE D — WINNERS, LOSERS, AND DISTRIBUTIONAL ANALYSIS ═══
+D1. For the top two surviving options (excluding Option 0 unless it is the recommendation), produce an explicit Winners/Losers table:
+    | Stakeholder Group | Net Effect | Specific Mechanism | Magnitude Estimate [evidence tier] | Time Horizon |
+D2. Identify any stakeholder group that is a WINNER under one option and a LOSER under the competing option — these are the pivotal constituencies that determine political viability and expose the core equity tradeoff the policymaker is actually making.
+D3. Note second-order distributional effects: e.g., a mandate that benefits end consumers while concentrating compliance costs on small competitors relative to large incumbents with compliance departments; a subsidy nominally universal but practically captured by organized groups with application capacity; a fee that is progressive in intent but regressive in incidence due to geographic variation in alternatives.
+D4. State whether the distributional profile is consistent with the policy's stated equity goals. If there is a tension — the policy achieves its efficiency goal while worsening distributional outcomes, or vice versa — flag it explicitly. Do not smooth it over.
+═══ PHASE E — EVIDENCE INVENTORY ═══
+E1. List every material quantitative claim in the brief with its evidence tier and a short citation. Format:
+    [Claim text] — [T1/T2/T3/T4] [Author/Agency, Year, "Title or URL," date retrieved if web].
+    "Material" means any statistic, cost figure, elasticity, or demographic share used to justify a score or the recommendation.
+E2. Flag any claim marked [CONTESTED] with the range of estimates in the literature and the key methodological dispute driving the disagreement (e.g., instrument validity in a natural experiment, sample selection, counterfactual construction, short-run vs. long-run elasticity estimates).
+E3. Identify the single most load-bearing empirical assumption in the recommendation — the "pivot assumption": the one assumption that, if revised toward the pessimistic end of the literature range, would reverse the recommendation. State it plainly. This is the key uncertainty the decision-maker should monitor.
+E4. Note any section where the evidence base is thin — fewer than three independent studies, or evidence drawn exclusively from a different country or regulatory context without adjustment — and label it [EVIDENCE LIMITED] so the reader calibrates confidence appropriately.
+═══ PHASE F — RECOMMENDATION AND REBUTTAL ═══
+F1. State the recommendation in two sentences: (a) which option (and any modification from D3), (b) the primary rationale grounded in the Phase C scoring and Phase E evidence. Do not re-summarize the options analysis — the recommendation must be earned by what preceded it, not re-argued here.
+F2. If the recommendation involves a modified or phased version of a scored option (e.g., Option B with an equity carve-out identified in D3, or Option A with a sunset clause tied to a specific metric), specify the modification precisely — including the statutory or regulatory hook that enables it and the procedural vehicle required to enact it.
+F3. State the single strongest counterargument: the best case for a competing option or for Option 0, made as forcefully as an intelligent opponent would make it. Then rebut it directly, specifically, and without dismissal — engage the mechanism, not the politics.
+F4. Define two or three leading implementation indicators the administering agency should track in the first 12–24 months to confirm the policy is operating through its intended causal mechanism. These must be process or mechanism metrics (e.g., enrollment rates, inspection completion rates, price-index movement in the targeted sector) — not lagging outcome metrics that only confirm success years after the decision window closes.
+F5. State the reopen threshold: one specific empirical finding, court decision, or budget deviation that — if observed within the policy window — would require revisiting this recommendation. Be concrete: "If [metric X] does not move by [magnitude Y] within [Z months], the pivot assumption in E3 has failed and Option [alternative] should be reconsidered."
+═══ PHASE G — OUTPUT ASSEMBLY (2-PAGE BRIEF FORMAT) ═══
+G1. Assemble the brief in this order; target 950–1,050 words of body text (headings, tables, and citations excluded) to fit a standard two-page print format. The word budget by section:
+    1. TITLE — Policy question as a decision-forcing headline, named decision-maker, jurisdiction, date. (~20 words)
+    2. THE PROBLEM (~150 words): status-quo baseline + causal mechanism from A3 + primary statute from A4. End with the cost-of-inaction figure.
+    3. OPTIONS ANALYZED (~270 words): three-column scoring rationale, one paragraph per surviving option, drawn from Phase B+C. Include the Summary Scoring Table from C2.
+    4. WHO WINS, WHO LOSES (~160 words): Winners/Losers table from D1 + the pivotal constituency from D2.
+    5. EVIDENCE SUMMARY (~100 words): pivot assumption from E3 + all [EVIDENCE LIMITED] and [CONTESTED] flags. This section signals the brief's epistemic confidence to the reader.
+    6. RECOMMENDATION (~200 words): recommendation + any modification + rebuttal + leading indicators + reopen threshold.
+    7. CITATIONS — numbered, formatted as: [#] Author/Agency. "Title." Publisher, Year. URL (retrieved DATE). Statute citations: [Short Title] § Section (Year). Every statistic in sections 1–6 must have a matching numbered citation here — no orphan statistics.
+G2. Every quantitative claim in the brief body must be anchored to a numbered citation in section 7. Run a final cross-check before output.
+G3. If writing to file, produce a single self-contained Markdown document. If presenting inline, use the same section structure with headers. Do not append disclaimers that the brief "is not legal advice" — the brief is a policy analysis document, not legal counsel, and that distinction is structural, not a footer warning.
+KEY PRINCIPLE: A policy brief is an evidence-ranked argument, not a balanced summary. Name the mechanism, score the options on separate axes, own the distributional math, and let the evidence earn the recommendation — while making every load-bearing assumption visible so the decision-maker knows exactly what they are betting on.

package/skills/positioning.md ADDED Viewed

@@ -0,0 +1,192 @@
+---
+name: positioning
+description: Position an offering against named competitive alternatives — extract unique attributes, translate them to customer-valued outcomes, identify the best-fit buyer segment, name the market frame, and produce a positioning statement plus the messaging hierarchy it implies; invoke when launching, repositioning, entering a new segment, or diagnosing why sales are slow.
+allowed-tools: Read, Write
+---
+═══ ENTRY POINT — WHICH MODE ARE YOU IN? ═══
+Before beginning, identify which entry point applies:
+MODE 1 — FRESH POSITIONING: new product, new market, or first time writing a deliberate position.
+Start at Phase A.
+MODE 2 — REPOSITIONING / "SALES ARE SLOW": existing position that is underperforming.
+Start at Phase H1 (win/loss hypothesis) to diagnose the failure mode, then return to the earliest phase that the diagnosis implicates. Common failure modes:
+  - Losing to "do nothing" → Phase A (wrong alternatives framing) or Phase D (wrong segment)
+  - Losing to direct competitors who seem cheaper → Phase E (market frame too broad) or Phase C (outcome not terminal enough)
+  - Winning on features, losing on value → Phase C (attributes not mapped to business outcomes)
+  - Inconsistent sales performance across reps → Phase G (messaging hierarchy missing or inconsistent)
+MODE 3 — NEW SEGMENT ENTRY: proven product, new buyer type or vertical.
+Start at Phase D (segment identification) and re-run A–C only for the new segment's competitive context.
+─────────────────────────────────────────────────────────
+═══ HARD RULES ═══
+1. NEVER start with the product. Start with the competitive alternatives customers actually use when your product does not exist — including "do nothing" or spreadsheets.
+2. NEVER confuse an attribute (what the product has or does) with a value (what the customer gains); every attribute must be traced to a specific, named outcome before it earns a place in positioning.
+3. NEVER define the market frame to flatter the product — define it to make your unique value obvious and your competitive alternatives look like the wrong choice.
+4. NEVER position against a competitor by name externally; position against a category of alternatives (e.g., "manual review processes," "general-purpose LLMs") so the frame is durable when the competitive landscape shifts.
+5. NEVER claim a differentiator that a well-resourced competitor can replicate in one sprint — only structural or proof-backed differentiation survives. "Fast," "easy," and "flexible" are not positions.
+6. NEVER let "everyone" be the target segment. Identify the specific buyer whose current pain is sharpest, whose alternatives are worst, and who will pay the most to switch.
+7. NEVER present a positioning statement that works equally well for a competitor. If you can substitute a rival's name and it still reads true, the statement is not differentiated.
+8. NEVER fabricate customer quotes, market-size figures, or win/loss ratios as if they are ground truth. Label all estimates as estimates; cite source or mark as hypothesis.
+9. NEVER collapse the positioning statement and the tagline — they serve different audiences. The positioning statement is internal strategy; the tagline is public-facing and derived from it, not the other way around.
+10. Flag immediately when the claimed differentiator is already table stakes in the target market frame — owning "fast" in a market where everyone is fast is not a position; it is a survival requirement.
+11. FOR CONSUMER PRODUCTS: the "economic buyer" and "end user" are typically the same person; replace B2B role distinctions (economic buyer / technical evaluator / end user) with the single consumer's decision journey (awareness trigger → evaluation criteria → purchase moment → repeat signal). All other phases apply unchanged.
+═══ PHASE A — COMPETITIVE ALTERNATIVES AUDIT ═══
+A1. Enumerate the full set of alternatives the target buyer actually uses today, not the alternatives you wish they were using:
+    - Direct competitors (same category, same job-to-be-done)
+    - Indirect competitors (different mechanism, same job — e.g., a consultant instead of a SaaS tool; a spreadsheet instead of a database)
+    - The status-quo workaround (email threads, manual checklists, doing nothing at all)
+    - Internal-build option (for B2B: "we could build this ourselves in Q3")
+A2. For each alternative, characterize:
+    - What job does it get done, and how well on a 1–5 scale?
+    - What is the buyer's primary frustration or hard ceiling with it? (Name the specific constraint, not a vague "limitation.")
+    - What switching cost keeps buyers locked in? (Data migration, retraining, sunk-cost contracts, internal political capital.)
+    - Who inside the buying org owns and defends the current alternative?
+A3. Identify the "assumed alternatives" problem: buyers who do not know your category exists will compare you to the nearest mental model they already hold. Name that mental model explicitly — your positioning must redirect it before any feature discussion can land.
+A4. BREAK FLAG: if you cannot name at least two concrete alternatives with specific buyer frustrations based on actual buyer conversations or documented win/loss data, you do not yet have enough market understanding to write a valid position. Stop and gather interview data before proceeding. Desk research is not a substitute.
+═══ PHASE B — UNIQUE ATTRIBUTES EXTRACTION ═══
+B1. List every attribute of the offering that differs from each competitive alternative identified in Phase A. An attribute is a feature, capability, architectural choice, business model property, or verifiable proof point — anything observable or demonstrable.
+B2. Filter to attributes that satisfy all three conditions:
+    (a) REAL — actually present in the product today, not on the roadmap (roadmap items belong in a "near-term proof" appendix, clearly labeled)
+    (b) PROVABLE — demonstrable via demo, benchmark, audit trail, or third-party verification; not just claimed
+    (c) UNIQUE relative to at least the primary competitive alternative (you do not need to beat every alternative on every attribute, but the primary alternative must lack it)
+B3. Rank remaining attributes by three factors:
+    - Degree of uniqueness: structural moat (IP, network effect, exclusive data) vs. current lead (six months ahead) vs. marginal parity (barely different)
+    - Replication cost for a well-funded competitor: low (one sprint) / medium (one quarter) / high (one year+) / structural (requires fundamentally different architecture or business model)
+    - Buyer awareness of the gap: do buyers currently know to ask for this? Unknown advantages require market education, not just positioning.
+B4. Separate "points of differentiation" (attributes unique to you relative to alternatives) from "points of parity" (attributes buyers require just to be in consideration at all). Points of parity belong in proof material and technical evaluation docs — not in the positioning statement or headline.
+B5. BREAK FLAG: if all surviving unique attributes have low or medium replication cost, the moat is thin and the position is time-limited. Flag it explicitly. Proceed, but note that a structural platform or proof-based moat must be built before the position erodes — and do not position as if the moat is durable when it is not.
+═══ PHASE C — VALUE MAPPING ═══
+C1. For each point of differentiation from Phase B, write the explicit value chain:
+    [ATTRIBUTE] → enables [FUNCTIONAL OUTCOME] → which means [BUSINESS OR EMOTIONAL OUTCOME] for [SPECIFIC ROLE OR BUYER TYPE]
+    Illustrative example (label as hypothesis until validated with buyers):
+    "Cryptographically signed audit trail of every agent action" → enables independent third-party verification without vendor access → which means a CISO can demonstrate compliance to auditors without trusting the vendor's own logs → reducing audit prep time and removing a regulatory liability the CISO personally owns.
+    Note: "reducing audit prep time" is functional; "removing a regulatory liability the CISO personally owns" is the terminal business/emotional outcome. Both must be named.
+C2. Functional outcomes are intermediate (saves time, reduces errors, removes a manual step). Business outcomes are terminal (reduces audit cost by X%, removes regulatory risk, enables a new revenue action, protects a job). Both matter; the terminal outcome is what drives willingness to pay and what the economic buyer will fund.
+C3. Identify which outcomes are "table stakes" in the target market frame vs. "above the line":
+    - Table stakes: buyers expect them and will not pay a premium for them; losing on these loses the deal, but winning on them alone does not win it.
+    - Above the line: buyers will prioritize and actively pay for these. Only above-the-line outcomes belong in the core positioning statement.
+C4. Check for value the buyer does not yet know to articulate. If a unique attribute enables an outcome buyers have not been asking for, note it — this is a market education task that requires a "before/after" narrative, not a feature bullet. It cannot be a positioning shortcut.
+C5. BREAK FLAG: if no attribute produces an above-the-line terminal outcome for the target buyer, the offer is undifferentiated at the value level even if it is differentiated at the feature level. Return to product (add or surface the missing capability) or pricing (reframe the value per unit) before writing positioning.
+═══ PHASE D — BEST-FIT SEGMENT IDENTIFICATION ═══
+D1. Define the target segment not by demographic or firmographic alone, but by the combination of four buying-moment factors:
+    - Pain intensity: who feels the frustration with current alternatives most acutely, right now, with budget authority to act?
+    - Value capture: who gains the most measurable terminal outcome from your unique attributes?
+    - Switching readiness: who has the lowest switching cost from the primary alternative and the most urgent trigger to switch?
+    - Advocacy potential: who, once they experience the outcome, will visibly demonstrate it to adjacent buyers — creating a reference motion you can scale?
+D2. For each candidate segment, score on all four dimensions (H/M/L). The best-fit segment maximizes pain intensity + value capture with manageable switching friction. Do not pick the largest addressable segment; pick the segment where the first 10 customers are easiest to win and most likely to make the next 100 easier.
+D3. Name the specific buyer role who drives the purchase decision for this segment. For B2B, distinguish:
+    - Economic buyer: controls budget and signs the contract
+    - Technical evaluator: assesses fit and raises or removes blockers
+    - End user: experiences daily value (or daily friction) and influences renewal
+    These may be the same person in SMB and almost never are in enterprise.
+D4. Write one concrete paragraph describing a day in the life of the best-fit buyer before your product exists. Name the specific tool they use, the specific frustration that surfaces on a given Tuesday, and the specific outcome they are missing. If you cannot write this paragraph without generalities, the segment is underspecified — return to buyer interviews.
+D5. BREAK FLAG: if the best-fit segment is too small to build a sustainable business on (validate against realistic conversion rates and ACV), or if the switching cost exceeds what the segment can fund, name that tension explicitly. Do not paper over an economic problem with sharper positioning language — the market will not reward it.
+═══ PHASE E — MARKET FRAME SELECTION ═══
+E1. The market frame is the category name you place yourself in. It controls three things: which competitors buyers compare you to, which evaluation criteria they apply, and what price anchor they expect before the first conversation. It is the single highest-leverage positioning decision, because every subsequent message is interpreted through it.
+E2. Enumerate at least three candidate market frames. For each, assess:
+    - Which alternatives does this frame invite comparison to?
+    - Does this frame make your unique attributes look decisive, or does it make them look like table stakes?
+    - Does the buyer already have a budget line item for this frame, or must you create a new one? (Creating a new budget line is expensive; fitting into an existing one is faster.)
+    - Is the frame defensible as you scale, or does proving the space attract larger players who will outspend you?
+E3. Select the frame where your unique attributes are most clearly above the line relative to the alternatives the frame implies. A narrower, more specific frame is almost always better than a broad one for an early-stage or repositioning exercise — specificity reduces competition and sharpens the evaluation criteria in your favor.
+E4. If no existing frame fits cleanly (genuine category creation), define the new frame by naming: the job-to-be-done (what the buyer is trying to accomplish) + the mechanism (how your product does it differently) + the proof layer (what makes it credible without requiring the buyer to trust your claims). Category creation requires significantly more market education investment; flag this explicitly so the team budgets for it.
+E5. BREAK FLAG: if the chosen frame implies a natural price anchor that is materially below your required pricing, the frame is wrong — not the pricing. Either move to a frame whose buyers historically pay what you need, or surface a higher-order outcome (Phase C) that justifies the premium before finalizing the frame.
+═══ PHASE F — POSITIONING STATEMENT SYNTHESIS ═══
+F1. Assemble the positioning statement using this six-clause structure (internal use only; not a tagline or homepage copy):
+    FOR [best-fit target segment from Phase D, named specifically]
+    WHO [the specific pain or frustration with the primary alternative from Phase A]
+    [PRODUCT NAME] IS A [market frame from Phase E]
+    THAT [the primary above-the-line terminal outcome from Phase C]
+    UNLIKE [the primary competitive alternative from Phase A, named as a category not a brand]
+    WE [the structural differentiator from Phase B that makes the outcome possible and is hardest to replicate]
+F2. Apply the substitution test: replace your product name with your primary direct competitor's name. If the statement still reads true for them, it is not differentiated — return to Phase B, find the attribute they genuinely lack, and rebuild from there.
+F3. Apply the "so what?" test to every clause: read each phrase and ask whether the target buyer would stop a meeting for it. Remove any phrase that passes only the writer's test. Jargon, category superlatives ("best-in-class," "cutting-edge"), and adjectives that cannot be benchmarked must be cut.
+F4. Apply the proof test: for every claim in the statement, identify the evidence that would make a skeptical buyer believe it (live demo, published benchmark, case study with named customer, third-party audit, independent verifier). If no evidence exists and cannot be created within 90 days, remove the claim from the statement and move it to the roadmap.
+F5. Write the final positioning statement. Label each element as: PROVEN (evidence exists today), DEMONSTRABLE (demo-able but not yet third-party verified), or HYPOTHESIS (believed but not yet validated with buyers). This labeling is not for external use — it is a forcing function to close the proof gaps.
+═══ PHASE G — MESSAGING HIERARCHY ═══
+G1. Derive the messaging hierarchy from the positioning statement in F1, not from a feature list. Three levels:
+    LEVEL 1 — Category headline (one sentence, public-facing): names the market frame and the primary terminal outcome from Phase C. This is the closest to a tagline; it must make the best-fit buyer from Phase D stop, recognize their situation, and read the next line. Test it: would a buyer in your best-fit segment forward this to a colleague?
+    LEVEL 2 — Value pillars (three, maximum four): each pillar maps to one above-the-line outcome from Phase C. Each pillar requires: (a) a name (two to four words), (b) one supporting sentence in buyer language from Phase H2, and (c) the specific proof point that makes a skeptic believe it. Pillars without proof points are aspirations, not positions.
+    LEVEL 3 — Feature evidence: specific capabilities and attributes from Phase B that substantiate each value pillar. These appear in sales decks, technical evaluations, and documentation — not on the homepage headline. The error is promoting Level 3 content to Level 1 placement.
+G2. For each buyer role identified in Phase D (economic buyer, technical evaluator, end user; or for consumer: awareness / evaluation / repeat moments), map which level and which pillar speaks most directly to their primary concern. A sales or marketing motion should route the right message to the right role at the right stage — not deliver the full hierarchy to everyone at once.
+G3. Write the anti-positioning: the one thing you explicitly do NOT claim, with a one-sentence rationale grounded in the positioning statement from F1. The anti-positioning sharpens the position by drawing a hard boundary. It prevents the messaging from drifting back to generic over time. Example structure: "We do not position as [X] because doing so would invite comparison to [Y alternative] where our [Z attribute] is not decisive."
+G4. Identify the single proof point that, if demonstrated early in a sales or onboarding motion, most rapidly confirms the core promise to the best-fit buyer. This is the "aha moment" to engineer into the product experience, the sales demo script, and the first-week onboarding. Everything that delays this moment costs conversion.
+G5. BREAK FLAG: if the messaging hierarchy requires more than three logical steps before the target buyer understands why they should care, the position is too complex. Simplify the market frame (Phase E) or narrow the segment (Phase D) until one clean line of logic runs: market frame → specific pain → unique outcome → proof. Complexity at the positioning level compounds into confusion at the sales level.
+═══ PHASE H — VALIDATION CHECKPOINTS ═══
+H1. Win/loss hypothesis: state which competitive alternative you expect to win against most reliably, and state the specific mechanism of the win (which unique attribute, which buyer role, which trigger). This should follow directly from the positioning statement. If it does not, the positioning has an internal inconsistency — return to Phase A or Phase B.
+H2. Buyer language audit: place the positioning statement and messaging hierarchy next to verbatim language from customer interviews, support tickets, sales call transcripts, or review sites. Highlight every phrase that is your language but not the buyer's. Rewrite those phrases in the buyer's exact words. If no verbatim buyer language is available, this is a research gap — not a drafting gap.
+H3. Stress-test against the "good enough" objection: the most common objection in any sale is "our current solution is good enough." Write one sentence that names the specific, quantifiable cost or risk of "good enough" in terms the economic buyer personally owns (their budget, their liability, their performance review). Generic "you're leaving value on the table" answers do not work.
+H4. Pricing alignment check: confirm that the market frame, the named terminal outcome, and the buyer segment all point to a willingness-to-pay that brackets your actual price. Validate against: (a) what buyers in this frame historically pay for alternatives, (b) the ROI implied by the terminal outcome in Phase C, (c) the buyer's budget authority at the segment defined in Phase D. If any of the three signals land below your price, diagnose which lever to pull before launch.
+H5. BREAK FLAG: if win/loss data, buyer interviews, or closed deals systematically contradict the positioning — for example, buyers consistently cite a different reason for buying than the positioning claims, or the stated differentiator is never mentioned in sales calls — return to Phase B (attributes) not Phase F (statement). Rewriting the statement around a differentiator buyers do not value is positioning theater.
+H6. Output a one-page positioning brief as a written artifact (Write tool). Include:
+    (a) The six-clause positioning statement from F1, with PROVEN/DEMONSTRABLE/HYPOTHESIS labels
+    (b) The three value pillars with proof points from G1
+    (c) The primary competitive alternative and the specific mechanism of the win
+    (d) The best-fit segment definition with the D4 day-in-the-life paragraph
+    (e) The market frame and the rationale for selecting it over the alternatives considered
+    (f) The anti-positioning statement from G3
+    (g) The aha-moment proof point from G4
+    (h) The "good enough" objection response from H3
+    (i) A labeled list of claims that are currently HYPOTHESIS, with the specific evidence needed to promote each to DEMONSTRABLE or PROVEN, and the owner and deadline for each
+KEY PRINCIPLE: Positioning is not what you say about your product. It is the context you create in the buyer's mind so that your unique value is the obvious, logical conclusion the buyer reaches — before you make a single claim about yourself.

package/skills/postmortem.md ADDED Viewed

@@ -0,0 +1,168 @@
+---
+name: postmortem
+description: Write blameless, timeline-driven postmortem for production incidents; root cause via 5-whys + systemic analysis; action items with owners + dates
+allowed-tools: Read, Write
+---
+# POSTMORTEM PLAYBOOK
+## ═══ HARD RULES ═══
+1. **Blameless always** — root cause is SYSTEMS not people; "person X clicked" only appears as "process did not prevent"
+2. **Timeline is HARD** — UTC timestamps; detective work required; "we don't know exactly" is better than guessing
+3. **5 Whys = DEPTH, not breadth** — stop at systemic cause (monitoring gap, missing test, deployment process), not "they didn't know"
+4. **Impact must be MEASURED** — not "high"; count: requests failed, users affected, revenue lost, time-to-detect (minutes), time-to-mitigate (minutes)
+5. **Action items must be EXECUTABLE** — assign OWNER (single human, not "the team"), set DUE DATE (specific day, not "asap"), phrase as "implement X" not "improve Y"
+6. **Examples in this doc are labeled examples** — never copy a timestamp or issue number without replacing it; fabricating specifics is a hard failure
+---
+## ═══ PHASE 0: INTAKE (before you write anything — 5 min) ═══
+Gather BEFORE drafting. If you can't get one of these, say so explicitly in the postmortem; do not guess.
+| Input | Source |
+|-------|--------|
+| Incident start/end time (UTC) | Cloudflare R2 or RunPod dashboard, Vercel function logs, or on-call page timestamp |
+| Which surface broke | CLI (`theron-cli`), VS Code ext, marketing API, agent harness (`marketing/api/`), or model endpoint (RunPod Serverless) |
+| Which data tier affected | Neon Postgres, Cloudflare R2, Upstash Redis, or agent scratchpad (`scratchpad.ts`) |
+| Who was on-call | Name (for timeline; blameless ≠ anonymous) |
+| Customer-visible signal | Theron chat error, receipt verify failure, STOA gate rejection, or silent data drop |
+| Mitigation action taken | Code deploy, config change, RunPod endpoint restart, Vercel rollback, or manual data flush |
+---
+## ═══ PHASE A: INCIDENT SNAPSHOT (20 min) ═══
+### A1. Impact Statement (1–3 bullets)
+- **What broke**: service/component/data (precise noun — "Cyber agent scratchpad writes" not "things")
+- **Who was affected**: count of users / active agent runs / receipts in flight (quantify; pull from Neon `runs` or `agent_sessions`)
+- **Cost**: requests failed (n), SLA credit issued, revenue impact, or data consistency debt — specify which table and row range if data was affected
+**Example**: "Cyber agent scratchpad writes silently failed for 47 min; 312 active runs in `agent_sessions` produced incomplete findings; ~$840 SLA credit issued; no rows in `artifacts` corrupted."
+### A2. Timeline (UTC, minute-level precision)
+```
+YYYY-MM-DD HH:MM — Detection:  [source: alert / user report / Vercel log tail]
+YYYY-MM-DD HH:MM — Triage:     [who confirmed, what dashboard/log they checked]
+YYYY-MM-DD HH:MM — Root Cause: [first moment the actual cause was identified]
+YYYY-MM-DD HH:MM — Mitigation: [change made that stopped the bleeding]
+YYYY-MM-DD HH:MM — Resolution: [full service restored, data consistency confirmed]
+YYYY-MM-DD HH:MM — All-clear:  [success rate / error rate back to baseline, noted in Neon or dashboard]
+```
+**Compute explicitly**: Detection→Mitigation (minutes) | Mitigation→Resolution (minutes) | Total incident window (minutes).
+If any interval is unknown, write: "HH:MM UNKNOWN — [what we don't know and why]". Do not omit the row.
+### A3. Severity & Justification
+Pick one: **critical** (total outage, data loss, security breach) / **high** (major feature degraded, >100 users, SLA hit) / **medium** (partial degradation, <100 users, no SLA breach) / **low** (latency regression, <10 users, auto-recovered).
+Justify with the impact numbers from A1. One sentence.
+---
+## ═══ PHASE B: ROOT CAUSE ANALYSIS — 5-Whys + Systemic (45 min) ═══
+### B1. The 5-Whys (MANDATORY DEPTH)
+Start with the lowest-level failure observable in logs. Ask "why" five times. **Each answer must name a concrete system component, config value, or absent process — not a human decision.**
+```
+1. Why did [observable failure] occur?
+   → [Component] [did X / returned HTTP NNN / wrote stale value]
+2. Why did [component] behave that way?
+   → [Upstream dependency / config / version]
+3. Why was [dependency/config] in that state?
+   → [Missing alert / absent circuit breaker / onboarding shortcut]
+4. Why was [alert/circuit breaker] absent?
+   → [Process gap: no checklist, no review, no test for this path]
+5. Why was the process gap not caught?
+   → SYSTEMIC ROOT: [missing organizational/architectural control]
+```
+**SYSTEMIC ROOT must name a boundary** — pick the exact seam where the gap lives:
+- Agent harness ↔ storage tier (R2 / Neon / Redis)
+- Model endpoint (RunPod Serverless) ↔ API gateway (Vercel function / `marketing/api/`)
+- CLI / VS Code ext ↔ backend API (`src/api.ts`)
+- Deployment pipeline (GH Actions → GHCR → RunPod) ↔ observability (no post-deploy smoke test)
+- Provider transition (e.g., S3→R2, Modal→RunPod) ↔ monitoring parity audit (absent checklist)
+### B2. Why-Tree for Cascading Failures (use only if two independent failure modes compounded)
+```
+                    [Observed failure]
+                       /           \
+          [Primary cause]      [Compounding cause]
+             /     \                   |
+   [Layer A]  [Layer B]        [Layer C missing]
+```
+### B3. Systemic Factors — VEXT Stack Specific (check all that apply)
+- [ ] **R2 observability**: No per-region quota alert, no per-bucket write success counter, no R2 client circuit breaker in `scratchpad.ts`
+- [ ] **RunPod Serverless cold-start / timeout**: Model endpoint not warmed (`min_workers=0`), idle_timeout too short, or max_workers hit without queue depth alert
+- [ ] **Neon Postgres connection pool**: Serverless function exhausted connection limit; no `pg_bouncer`-style guard; migration applied to wrong branch
+- [ ] **Vercel function timeout / env var**: Function hit 10s wall, or an env var (`CYBER_URL`, `THERON_LLM_URL`) was stale after RunPod endpoint re-deploy
+- [ ] **Agent harness silent failure**: `agent_sessions` row left in `running` state indefinitely; no dead-letter queue for `agent_actions` leases; scratchpad write failure not surfaced to the run's receipt
+- [ ] **Provider-transition gap**: New provider (R2, Upstash, Neon) onboarded without running monitoring parity audit against the service it replaced
+- [ ] **Deployment pipeline**: GH Actions built and pushed new container, but RunPod endpoint was not restarted; weights pulled from R2 on cold start reflected stale checkpoint
+- [ ] **Receipt / STOA integrity**: A failed write left a receipt that passes WebCrypto verify but references missing artifact data — integrity gap between signature and content
+---
+## ═══ PHASE C: WHAT WENT WELL / POORLY (30 min) ═══
+### C1. What Went Well (3–5 items)
+**Specific and mechanical** — praise systems and process choices, not heroic individuals. If an individual's action was key, name the *process* that enabled them (runbook, alert, flag-gated fallback).
+For each item, note: what worked, why it worked, and whether we can rely on it next time or if it was luck.
+- Example: "Fallback-to-disk in `scratchpad.ts` prevented 500s to end users — this was a deliberate architectural choice; it held. Confirmed reliable."
+- Example: "R2 write latency alert fired within 5 min; on-call paged. Alert threshold was set correctly at onboarding. Reliable."
+- Anti-pattern: "Team responded quickly" — this is luck and heroism, not a system property. Reframe: "On-call runbook for R2 latency existed; first action was unambiguous."
+### C2. What Went Poorly (3–7 items; each gets one action item in Phase D)
+**One-to-one mapping**: every "went poorly" generates exactly one Phase D action item. Do not list a failure without an owner and a fix.
+- Anti-pattern: "Communication was slow" — not specific. Instead: "Customer-facing status page was not updated until 35 min after detection; no runbook step required it."
+---
+## ═══ PHASE D: ACTION ITEMS (15 min) ═══
+**Format** — use this exactly:
+```
+[OWNER: single person's name or role] — <Action Title (imperative verb + noun)>
+  Why: <one sentence linking to the specific C2 failure>
+  Due: <YYYY-MM-DD, within 14 calendar days>
+  Done when: <observable completion criterion — merged PR / alert firing in staging / runbook doc URL live>
+  Issue: <GitHub issue URL or tracking doc — create it now, do not defer>
+```
+**Quality bar**: "Add `r2_writes_failed_total` counter to `scratchpad.ts` and wire to PagerDuty alert" passes. "Improve R2 monitoring" fails — it has no observable done-state.
+Every action item must be completable by one person in the due window. If it requires multiple people, split it into multiple items. If it requires a budget decision, note that and assign the budget ask to a specific owner.
+---
+## ═══ PHASE E: COMMUNICATION & ARCHIVE (10 min) ═══
+### E1. Stakeholder Notification (within 30 min of detection)
+- Post to `#incidents` Slack (or equivalent) with: incident ID, what broke, affected user count, current status, ETA
+- For SLA-impacting incidents (severity high/critical): email affected customers within 2 hours; include impact window, affected feature, and remediation taken — do not apologize without facts
+- If a Theron receipt was issued during the incident window with a broken artifact reference, flag those receipts explicitly — receipt integrity is a trust-critical surface; silently broken receipts are worse than visible errors
+### E2. Draft Review (within 48 hours)
+- Share this document with on-call and at least one person not involved in mitigation for adversarial review
+- Ask specifically: "Does the systemic root cause point at a real gap, or is it too abstract to act on?" and "Are all action items completable in 14 days?"
+- Do not publish until at least one reviewer confirms Phase D items have GitHub issues created
+### E3. Archive
+- Save to: `/docs/incidents/postmortem-<YYYY-MM-DD>-<service-or-component>.md`
+- Add a one-line entry to `/docs/incidents/INDEX.md`: `| <date> | <sev> | <component> | <one-line root cause> | <n action items> |`
+- Before the next provider migration or major deploy, re-read the index for recurring patterns — the most dangerous incidents are the second ones with the same systemic root
+---
+## KEY PRINCIPLE: A postmortem is not an inquest — it is a boundary-condition report. Every significant incident in this stack lives at one of four seams: agent harness ↔ storage (R2/Neon), model endpoint (RunPod) ↔ API gateway (Vercel), CLI/ext ↔ backend, or provider-transition ↔ monitoring parity. Find which seam, name the missing control that should span it, and build that control as a Phase D item. If your systemic root could apply to any tech company, you have not gone deep enough.