npm - @vextlabs/theron-cli - Versions diffs - 0.2.1 → 0.4.0 - Mend

@vextlabs/theron-cli 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191)

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +61 -5
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/build_domains.d.ts +6 -0
package/dist/profiles/methodologies/build_domains.js +170 -0
package/dist/profiles/methodologies/build_domains.js.map +1 -0
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
package/dist/profiles/methodologies/regulated_domains.js +153 -0
package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
package/dist/profiles/methodologies/research_domains.d.ts +8 -0
package/dist/profiles/methodologies/research_domains.js +179 -0
package/dist/profiles/methodologies/research_domains.js.map +1 -0
package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
package/dist/profiles/methodologies/strategy_domains.js +193 -0
package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
package/dist/profiles/seeds.js +241 -95
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/calc_gate.d.ts +2 -0
package/dist/verifiers/calc_gate.js +112 -0
package/dist/verifiers/calc_gate.js.map +1 -0
package/dist/verifiers/citation_gate.d.ts +2 -0
package/dist/verifiers/citation_gate.js +130 -0
package/dist/verifiers/citation_gate.js.map +1 -0
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/evidence_gate.d.ts +2 -0
package/dist/verifiers/evidence_gate.js +108 -0
package/dist/verifiers/evidence_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +28 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.d.ts +2 -0
package/dist/verifiers/source_gate.js +125 -0
package/dist/verifiers/source_gate.js.map +1 -0
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/incident-response.md ADDED

@@ -0,0 +1,134 @@
+---
+name: incident-response
+description: Run a live production incident end-to-end — declare severity, assign IC/comms/ops roles, mitigate-before-diagnose to restore SLO, drive cadenced stakeholder comms, maintain a live timeline, bound every mitigation's blast radius, and hand off a postmortem-ready artifact.
+allowed-tools: Read, Write, Bash
+---
+═══ HARD RULES ═══
+1. MITIGATE BEFORE DIAGNOSE. Restore the SLO first; root-cause analysis happens after customer impact ends — never delay a rollback waiting for a perfect explanation.
+2. ONE INCIDENT COMMANDER AT ALL TIMES. Exactly one named person holds the IC seat. IC handoff is explicit: outgoing IC states "IC transfer to @[name], acknowledged?" and the incoming IC confirms in the channel before the outgoing IC is released. Split authority kills response speed.
+3. BOUND EVERY BLAST RADIUS BEFORE EXECUTING. State the scope of harm (traffic %, regions, data written, downstream services) for every proposed mitigation before applying it. A bad fix widens incidents.
+4. NO SILENT MITIGATIONS. Every action taken in the production environment is logged in the live timeline with actor, timestamp (UTC), action, and observed effect within 2 minutes of execution.
+5. COMMS CADENCE IS NON-NEGOTIABLE. External status updates go out on schedule even when you have nothing new to say — silence is interpreted as chaos.
+6. NEVER DOWNGRADE SEVERITY MID-INCIDENT WITHOUT IC SIGN-OFF. A SEV-1 stays SEV-1 until the IC formally downgrades it after SLO restoration is confirmed on two independent signals.
+7. POSTMORTEM IS BLAMELESS AND MANDATORY. Every SEV-1 and SEV-2 produces a written postmortem within 5 business days. No exceptions for "obvious" causes.
+8. PRESERVE EVIDENCE BEFORE REMEDIATION. Capture logs, metrics snapshots, and heap/thread dumps before you restart, rollback, or flush any state that would overwrite them.
+═══ PHASE A — DETECT & DECLARE (T+0 to T+5 min) ═══
+A1. TRIAGE SIGNAL. Confirm the alert is real: cross-check at least two independent signals (monitoring dashboard, synthetic probe, customer report, log anomaly). Dismiss flapping alerts with a documented reason, not silence.
+A2. ASSIGN SEVERITY IMMEDIATELY using objective SLO impact, not gut feel. Fill in your org's thresholds once; do not leave these as "TBD" mid-incident:
+    SEV-1 — SLO breach affecting ≥5% of active users OR any data loss/corruption OR full service outage OR security incident with data exposure.
+    SEV-2 — SLO degraded below target for ≥1% of users, or SEV-1 is imminent without action within 30 min.
+    SEV-3 — Noticeable defect, no SLO breach, workaround exists, can wait until next business day.
+    SEV-4 — Minor or cosmetic issue on low-traffic path; handle in normal sprint workflow.
+    NOTE: If two criteria conflict, declare the higher severity. Downgrade after resolution, not during.
+A3. OPEN THE WAR ROOM. Create a dedicated incident channel (#inc-YYYYMMDD-NNN), attach severity, and post SITREP-0 within 5 minutes of declaration:
+    ```
+    SITREP-0 | SEV-[N] | [Service] | [UTC timestamp]
+    IMPACT   : [Who/what is affected and estimated scope — e.g. "~12% of checkout requests returning 500"]
+    SYMPTOMS : [Observable signals only — no hypotheses here]
+    STATUS   : Investigating
+    NEXT UPDATE: T+[cadence per B6]
+    IC: @[name] | COMMS: @[name] | OPS: @[name] | SCRIBE: @[name]
+    ```
+A4. ASSIGN ROLES EXPLICITLY — named individuals, not teams. Roles may not double up on SEV-1:
+    - IC (Incident Commander): owns all decisions, paces the response, runs the bridge. Does not touch production directly.
+    - COMMS LEAD: owns all external/internal communications. IC is never the comms lead during active mitigation.
+    - OPS LEAD: owns all hands-on production actions. Must verbalize every action before execution and log it within 2 minutes.
+    - SCRIBE: owns the live timeline document. Timestamps every event in UTC. Does not take actions.
+    - SME(s): domain experts on call. Advise IC, do not act without OPS lead coordinating.
+    IC HANDOFF PROCEDURE: outgoing IC states "IC transfer to @[name] — please acknowledge." Incoming IC replies "IC transfer acknowledged, I have the IC seat." Both names and UTC timestamp are logged by SCRIBE.
+A5. NOTIFY STAKEHOLDERS in parallel with Phase B — not after mitigation starts:
+    SEV-1 → page on-call VP Eng + CTO + Customer Success lead within 5 min of declaration.
+    SEV-2 → notify Eng Manager + Customer Success within 15 min of declaration.
+    SEV-3/4 → Slack thread only; no pages.
+═══ PHASE B — CONTAIN & MITIGATE (T+5 to SLO Restored) ═══
+B1. STATE THE BLAST RADIUS BEFORE EACH ACTION. OPS lead states aloud and in the channel: "This action affects [scope — e.g. all traffic in us-east-1, ~40% of requests]. Risk of worsening: [low/medium/high] because [specific reason]. Reversible in [time estimate]." IC approves before execution.
+B2. PREFER REVERSIBLE MITIGATIONS IN ORDER. Do not skip a reversible option to reach an irreversible one without IC sign-off:
+    1. Feature flag disable or kill switch (reversible in <60s, zero data risk — highest priority).
+    2. Traffic shift to healthy region/instance (reversible in <60s, verify target has headroom first).
+    3. Rollback to last known-good deployment (reversible; check for schema delta — if migration was included, consult DBA before rolling back).
+    4. Cache flush or connection pool reset (reversible; stagger across nodes to avoid thundering herd — no more than 25% of pool at once).
+    5. Horizontal scale-out (reversible; cost impact only; confirm autoscaler isn't already saturated).
+    6. Targeted query kill or circuit breaker trip (reversible; drops in-flight work — notify dependent services before tripping).
+    7. Failover to standby DB or region (reversible with replication-lag risk — check lag before executing: if lag >30s, failover risks data loss and requires IC + DBA explicit sign-off with documented acceptable-loss window).
+    8. Data surgery or schema change (IRREVERSIBLE — requires IC + second SME sign-off, export affected rows first, test on a staging slice or shadow table before applying to production).
+B3. EXECUTE ONE MITIGATION AT A TIME unless mitigations are provably orthogonal (e.g., DNS change in one region while scaling compute in another) and IC explicitly authorizes parallel execution. Stacking simultaneous changes makes the causal chain impossible to reconstruct.
+B4. OBSERVE AFTER EACH STEP. After each mitigation, wait for at least one full metrics scrape interval (typically 60s, or two scrape intervals for slow-moving metrics like p99 latency over 5-min windows) before declaring it effective or escalating. Do not chain mitigations without observing effect.
+B5. IF MITIGATION WORSENS IMPACT: immediately revert, log the result in the timeline ("Mitigation B2.3 failed: error rate increased from 12% to 31%, reverted at [UTC]"), and resume from the next reversible option. Do not continue down the list without reverting the failed step.
+B6. SEND COMMS ON CADENCE regardless of mitigation progress:
+    SEV-1: external status page update every 30 min; internal channel update every 15 min.
+    SEV-2: external status page update every 60 min; internal channel update every 30 min.
+    Template: "Update [N] | [UTC] | Status: [Mitigating / Monitoring / Resolved] | [One factual sentence on what changed since last update — no speculation] | Next update: [UTC time]"
+B7. PRESERVE DIAGNOSTIC EVIDENCE BEFORE IRREVERSIBLE ACTIONS:
+    - Export relevant log windows to a persistent store separate from the service being repaired (S3, GCS, or similar durable sink).
+    - Snapshot metrics at the peak of incident impact and again at the moment of first mitigation.
+    - Before restarting any process: capture a thread dump (`jstack <pid>` or `kill -3 <pid>` for a JVM, which prints the dump to the JVM's stdout; `cat /proc/<pid>/stack` for the kernel-side stack on Linux, usually requiring root), a heap dump if OOM-related, and open file descriptors (`lsof -p <pid>`).
+    - Before rolling back a DB migration: export the affected table rows (`pg_dump -t affected_table --data-only <dbname>`) so they survive the schema revert.
+═══ PHASE C — DIAGNOSE (runs in parallel after initial mitigation, not instead of it) ═══
+C1. FORM HYPOTHESES FROM EVIDENCE, not from pattern-matching to past incidents. List the top 3 hypotheses ranked by: (a) fit to the full set of observed signals, (b) correlation with the change window (see C2). Write them down — unwritten hypotheses get confused with conclusions.
+C2. CORRELATE THE CHANGE WINDOW. Pull the deployment log, feature flag audit log, infrastructure change log, and dependency changelogs for the 2 hours prior to first symptom (not first alert — alert lag is real). Production incidents frequently correlate with a change in this window, so check it before pursuing other hypotheses. If no change is found, extend the window to 24 hours before concluding "no correlated change."
+C3. TEST HYPOTHESES WITH MINIMAL PRODUCTION FOOTPRINT. Use read-only queries, log sampling, and metrics slicing before triggering any additional writes, restarts, or traffic manipulation. Confirm a hypothesis by predicting an observable and checking it, not by applying a fix and seeing if symptoms stop — that conflates mitigation with diagnosis.
+C4. DO NOT ANCHOR on the first plausible explanation. Require that the leading hypothesis explains ALL observed signals. If it doesn't account for an anomaly, you have either the wrong hypothesis or a compound incident. Declare "compound incident" explicitly if two independent failure modes are confirmed.
+C5. DECLARE ROOT CAUSE ONLY WHEN: the hypothesis predicts the symptom pattern with specificity, a targeted fix eliminates symptoms, and at least one SME independent of the diagnoser concurs. "Root cause TBD — under investigation" is a valid postmortem state. A confident false root cause is worse than an open question.
+═══ PHASE D — RESOLVE & MONITOR (SLO Restored + 60 min) ═══
+D1. CONFIRM SLO RESTORATION with two independent signals (e.g., error rate dashboard AND synthetic probe passing) before declaring resolved. "Looks better on one graph" is not resolved.
+D2. SEND ALL-CLEAR COMMS immediately upon resolution:
+    ```
+    RESOLVED | SEV-[N] | [Service] | [UTC]
+    DURATION : [HH:MM] from first customer impact to full recovery
+    IMPACT   : [Final scope — users affected, requests failed, data affected if any]
+    MITIGATION: [One sentence on what was done to fix it]
+    ROOT CAUSE: [Confirmed: <one sentence> / Under investigation — postmortem by <DATE>]
+    ```
+D3. KEEP HEIGHTENED MONITORING for 60 minutes post-resolution. Watch error rates, latency p99, queue depths, and DB connection counts at 1-minute granularity. If the service has a daily traffic peak, the monitoring window must cover at least one full peak — extend beyond 60 minutes if the peak has not yet occurred.
+D4. DECLARE INCIDENT CLOSED only after the monitoring window passes clean. IC states "Incident closed at [UTC]" explicitly in the channel. SCRIBE appends the close event to the timeline.
+D5. ASSIGN POSTMORTEM OWNER before closing the incident. Default: OPS lead owns the draft; IC reviews and approves. Set a calendar event for the postmortem review meeting within 5 business days; postmortem is not "done" until the review meeting has action items assigned with owners and due dates.
+═══ PHASE E — POSTMORTEM HANDOFF ═══
+E1. THE LIVE TIMELINE (maintained by SCRIBE throughout) is the primary postmortem input. It must include: every alert fired (UTC), every hypothesis considered and outcome, every action taken (actor, UTC, exact command or change reference, observed result), every communication sent, and the resolution event. Gaps in the timeline are findings, not oversights — note them explicitly.
+E2. POSTMORTEM STRUCTURE (mandatory sections):
+    - Incident summary: severity, duration, customer impact in concrete measurable terms (requests failed, users affected, revenue window, data records affected).
+    - Timeline: verbatim from live SCRIBE log, annotated with "DETECTION," "MITIGATION," "RESOLUTION" markers.
+    - Root cause: specific and falsifiable — not "human error" but "engineer applied migration M-4421 without a rollback path on a table receiving 8k writes/min, causing lock escalation."
+    - Contributing factors: what made this possible, what slowed detection, what slowed response.
+    - Detection gap: time between incident onset and first alert firing, and why.
+    - Response gap: time between first alert and first effective mitigation, and why.
+    - Action items: each item requires owner (named person), due date, and ticket link. No orphan action items. No action item that says only "improve X."
+    - What went well: specific behaviors or systems that functioned as designed during the incident (e.g., "circuit breaker on payment service prevented cascade to order service").
+E3. ACTION ITEMS MUST BE SPECIFIC AND BOUNDED:
+    Acceptable: "Add SLO-aligned alert on p99 checkout latency >2s sustained for 3 consecutive 1-min intervals (owner: @alice, due: 2026-07-10, ticket: ENG-4421)."
+    Not acceptable: "Improve monitoring." / "Be more careful with migrations." / "Add more alerts."
+E4. SEVERITY RECALIBRATION: After every SEV-1, IC and Eng Manager review whether the declared severity matched actual customer impact. If systematic miscalibration is found (e.g., SEV-2 events consistently producing SEV-1 customer impact), adjust the severity matrix thresholds — not the individual incident label retroactively.
+KEY PRINCIPLE: Restore the service first, understand it second, prevent it third — in that order, every time. Diagnosis during an active outage is a distraction from the primary obligation to customers.
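
The severity matrix in A2 and the comms cadence in B6 are mechanical enough to encode, which helps keep them from being left as "TBD" mid-incident. The following is a minimal Python sketch, not part of the package: the `Signal` fields, `classify_severity`, and `next_updates` are hypothetical names, and treating an SLO breach below the SEV-2 threshold as SEV-3 is an assumption, since the matrix does not cover that case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Signal:
    """Hypothetical snapshot of incident impact; field names are illustrative."""
    affected_user_fraction: float          # share of active users affected, 0.0 to 1.0
    slo_breached: bool
    data_loss_or_corruption: bool = False
    full_outage: bool = False
    security_data_exposure: bool = False
    sev1_imminent_within_30_min: bool = False
    noticeable_defect: bool = False


def classify_severity(s: Signal) -> int:
    """Return 1-4 per the A2 matrix; checking SEV-1 first means conflicts resolve upward."""
    if (
        s.data_loss_or_corruption
        or s.full_outage
        or s.security_data_exposure
        or (s.slo_breached and s.affected_user_fraction >= 0.05)
    ):
        return 1
    if s.sev1_imminent_within_30_min or (
        s.slo_breached and s.affected_user_fraction >= 0.01
    ):
        return 2
    if s.slo_breached or s.noticeable_defect:
        # Assumption: a breach below the SEV-2 threshold is handled as SEV-3.
        return 3
    return 4


# Minutes between updates as (external status page, internal channel), per B6.
COMMS_CADENCE_MIN = {1: (30, 15), 2: (60, 30)}


def next_updates(severity: int, sent_at: datetime) -> dict[str, datetime]:
    """Due times for the next updates; empty for SEV-3/4, which B6 does not schedule."""
    cadence = COMMS_CADENCE_MIN.get(severity)
    if cadence is None:
        return {}
    external, internal = cadence
    return {
        "external": sent_at + timedelta(minutes=external),
        "internal": sent_at + timedelta(minutes=internal),
    }


if __name__ == "__main__":
    checkout = Signal(affected_user_fraction=0.12, slo_breached=True)
    sev = classify_severity(checkout)
    print(sev, next_updates(sev, datetime.now(timezone.utc)))
```

An organization adopting this would replace the 5% and 1% thresholds with its own values once, up front, as A2 requires.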

package/skills/interview-loop.md ADDED

@@ -0,0 +1,62 @@
+---
+name: interview-loop
+description: "Design and run a structured interview loop: map each stage to a competency, write legally-safe STAR behavioral questions + real work samples per stage, build a scored rubric, and run a calibrated debrief — same questions every candidate, bias minimized."
+allowed-tools: Read, Write
+---
+> DISCLAIMER: This skill is an educational playbook for structured interviewing practice. Nothing here constitutes legal advice. Employment law varies by jurisdiction and changes over time. Consult qualified employment counsel before finalizing question banks, retention policies, or any compliance-sensitive hiring practice.
+═══ HARD RULES ═══
+1. SAME questions in SAME order for every candidate for the same role — zero ad-hoc pivots mid-interview.
+2. NEVER ask: age, national origin, religion, marital/family status, disability, citizenship status, pregnancy, arrest record, credit history, or any question whose answer reveals a protected class.
+3. Score BEFORE the debrief; written signal captured independently by each interviewer before group discussion.
+4. Work samples must replicate actual job tasks — not brain teasers, puzzles, or trivia.
+5. Every rubric anchor must describe observable behavior, not inferred traits ("explained the tradeoff in writing" not "smart").
+6. Interviewers own exactly ONE competency each; no interviewer duplicates another's focus in the same loop.
+7. A hiring decision requires at least three independent scored signals; never hire on a single interview.
+8. All notes are factual and job-relevant; notes become legal records — write accordingly.
+═══ PHASE A — DEFINE THE ROLE AND COMPETENCY MAP ═══
+A1. Extract the 4–6 competencies that differentiate top-quartile performers in this specific role from average performers. Source them from: (a) the job description, (b) output from current top performers in the same function, (c) the biggest failure modes in prior hires. Reject competencies that apply to every knowledge-worker (e.g., "communication" unqualified).
+A2. For each competency write a one-sentence definition scoped to this role. Example for a Staff Engineer competency: "Technical Scope: designs systems whose architectural decisions remain correct across a 3× scale increase without rewrite."
+A3. Assign exactly one competency to each interviewer-stage pair. Map stages to the hiring funnel: Screen → Technical / Craft → Behavioral / Values → Work Sample → Leadership/Stakeholder (add or collapse stages to match the seniority level).
+A4. Document the competency map as a table: Stage | Interviewer Role | Competency | Time Budget | Question Set ID. Freeze before scheduling candidates.
+═══ PHASE B — WRITE THE QUESTION BANK ═══
+B1. For each competency, write exactly TWO behavioral (STAR) questions. STAR format: Situation scoped to the role's domain → Task that required the competency → Action taken by the candidate specifically → Result measurable in the role's terms. Example: "Tell me about a time you had to redesign a production service mid-migration because a dependency changed scope. Walk me through what you changed and what the system looked like after." Do NOT write hypothetical "what would you do" variants as the primary question — hypotheticals allow rehearsed theory; past behavior surfaces real capability.
+B2. Write 2–3 follow-up probes per question in advance. Probes surface STAR gaps: "What was your specific contribution vs. the team's?", "What did you try that didn't work?", "What would you do differently today?", "How did you measure success?" These are fixed; interviewers use them verbatim when a STAR component is missing.
+B3. Design ONE work sample per stage (not per question). The work sample must: (a) require the same cognitive labor as real job tasks, (b) be completable in 45–90 minutes without proprietary context, (c) have an observable output (written doc, working code, annotated diagram, verbal walkthrough), (d) be identical across candidates. For a technical role: a scoped debugging or architecture exercise on a representative system in the team's stack. For a product role: a prioritization memo given a realistic constraint set. For a people-manager role: a written debrief of a fictional underperformance scenario.
+B4. For vague competency labels (e.g., "cultural fit," "adaptability"), replace the label with a behavioral anchor before writing questions: "Navigates ambiguity: candidate has shipped work without complete requirements and can reconstruct the decision process." Question it with a STAR ask, not an open-ended values-probe — values probes surface presentation skill, not job behavior, and introduce evaluator-similarity bias.
+═══ PHASE C — BUILD THE SCORING RUBRIC ═══
+C1. Use a 1–4 integer scale per competency. Define each anchor with observable evidence only:
+  1 — Strong No Hire: critical signal missing or contradicted; example behavior described.
+  2 — Lean No Hire: partial evidence; one or more STAR components absent; follow-ups did not recover signal.
+  3 — Lean Hire: clear evidence of competency at the expected level for the role's scope and seniority.
+  4 — Strong Hire: evidence exceeds role expectations; candidate operated one level above in at least one concrete example.
+C2. Write one "calibration anchor" per score level per competency — a short description of what a real answer at that level sounds like. These are shared with interviewers before the first candidate. Do NOT share anchors with candidates.
+C3. Add a "Red Flag" field separate from the score. Red flags are disqualifying signals that override aggregate score: fabrication, inability to distinguish own contribution from team output, dismissal of past mistakes as others' fault, ethical breach in a past work sample. Any single red flag triggers a required debrief before a hire decision.
+C4. Aggregate formula: sum scores across competencies weighted by importance to the role (assign weights during Phase A). Threshold for Hire recommendation: define before the first interview, based on the role's bar — do not set it post-hoc.
+═══ PHASE D — RUN THE INTERVIEWS ═══
+D1. Brief every interviewer on: their assigned competency only, the question set and follow-up probes, the rubric anchors, the legal no-go list, and the expectation that scoring is independent and written before debrief. Briefing must happen before the first candidate; do not brief during the loop.
+D2. Each interviewer opens with a fixed 60-second role/stage framing so every candidate hears the same context. No biographical warmup questions that could surface protected-class information.
+D3. Interviewers ask their two STAR questions and the work sample, in that order. Time allocation: ~15 min per STAR question + follow-ups, ~30 min for work sample walkthrough, ~5 min for candidate questions. Total: 60–75 min.
+D4. Interviewers take factual notes in real time: what the candidate said, not what the interviewer concluded. "Candidate stated they reduced latency from 800 ms to 120 ms by replacing synchronous DB calls with a read-through cache" is a note. "Candidate seemed sharp" is not.
+D5. Immediately after the interview, each interviewer completes the rubric independently: score (1–4) + written evidence sentence per competency + red flag field. Submission is timestamped. No discussion before submission.
+═══ PHASE E — DEBRIEF PROTOCOL ═══
+E1. The debrief chair (hiring manager or designated DRI) collects all written rubric submissions before convening. Aggregate scores are visible to all; individual write-ups are read aloud in stage order, not simultaneously.
+E2. Debrief order: Screen → Technical → Behavioral → Work Sample → Leadership. Each interviewer presents their evidence first, score second. Chair enforces the order; no one reveals their score before presenting evidence.
+E3. If a score is contested (difference of 2+ points between interviewers for the same competency), chair asks each interviewer to cite a specific candidate behavior — not an inference. If the disagreement persists after evidence exchange, the lower score is used (conservative default reduces false positives at hiring bar).
+E4. Red flags are reviewed last. Any red flag requires explicit group acknowledgment and a written disposition (e.g., "candidate could not separate their contribution from team output across two STAR answers; No Hire on Ownership competency"). Red flags cannot be overridden by high scores on other competencies without documented rationale from the hiring manager.
+E5. Final recommendation categories: Strong Hire / Hire / No Hire / Strong No Hire. Decisions that split (e.g., 2 Hire + 1 No Hire) require the hiring manager to make a documented tie-break with written rationale referencing specific evidence, not gut feel. Document the decision, the dissenting signal, and why it was resolved the way it was — this record protects the organization and improves future calibration.
+E6. Six months after each hire, compare the candidate's rubric scores against performance review data (no-hire decisions produce no outcome data and are excluded). Compute interviewer predictive accuracy per competency. Retire questions whose scores show no correlation with 6-month outcomes; replace with new questions targeting the same competency.
+═══ PHASE F — LEGAL HYGIENE AND CONSISTENCY AUDIT ═══
+F1. Before each new loop begins, run the question bank through a legal-safety check: any question whose answer could reveal age, national origin, religion, marital status, disability, pregnancy, citizenship, or financial history must be rewritten or cut. If uncertain, use this test: "Does this question require the answer to reveal a protected characteristic in order to be meaningful?" If yes, cut it.
+F2. Consistency audit: confirm every candidate for the same role received the same questions in the same order. If a candidate received a different question set (scheduling error, interviewer substitution), document the deviation and account for it in scoring.
+F3. Retain all rubric submissions, notes, and debrief records for the period required by applicable law in your jurisdiction — retention minimums vary by country, state, and claim type (e.g., EEOC charge filing windows differ from state-law windows). Consult employment counsel to establish your retention schedule before destroying any record. Destroying interview records prematurely is a legal liability regardless of the hire outcome.
+F4. If the candidate pool for a role is homogeneous across multiple loops, audit the sourcing pipeline — structured interviews reduce bias at evaluation but cannot correct for biased sourcing.
+KEY PRINCIPLE: Structured interviews predict job performance only when every candidate faces the same questions, every score is anchored to observable behavior, and every debrief separates evidence from inference before revealing the vote.

package/skills/job-scorecard.md ADDED

@@ -0,0 +1,92 @@
+---
+name: job-scorecard
+description: Build a role scorecard (mission, 4-6 measurable 12-month outcomes, ranked competencies with behavioral anchors, must-have vs nice-to-have) before drafting any job description — triggered whenever a user asks to define, hire for, or evaluate a role.
+allowed-tools: Read, Write
+---
+═══ HARD RULES ═══
+1. OUTCOMES BEFORE SKILLS — never list a competency until every 12-month outcome is written, numbered, and measurable. A scorecard is a success contract, not a trait inventory.
+2. NO WISH-LIST COMPETENCIES — every competency must be traceable to at least one outcome by number. Delete any competency that can't name its parent outcome.
+3. BEHAVIORAL ANCHORS ARE PAST-TENSE EVIDENCE — anchors describe what a candidate demonstrably did, not what they "would" do or "believe in." Every anchor starts with a past-tense verb: "Built…", "Reduced…", "Led…", "Shipped…", "Closed…".
+4. MEASURABLE MEANS AUDITABLE — every outcome must state the metric, the baseline or direction, and the time window. "Improve NPS" fails; "Increase NPS from 32 to 48 by Q4" passes. If no baseline exists, write "establish baseline by [date], then target [direction]."
+5. MUST-HAVE LINE IS A FILTER, NOT A PREFERENCE — a must-have disqualifies a candidate who lacks it immediately. If you'd waive it for a strong candidate, it is a nice-to-have. Move it.
+6. NO FABRICATED BENCHMARKS — never invent market salary figures, attrition rates, ramp timelines, or industry comparison numbers as if factual. Label any estimate explicitly as an estimate.
+7. ONE MISSION SENTENCE — the mission is a single sentence: what the role exists to produce, for whom, measured how. Not a paragraph.
+8. SCORECARD SHIPS BEFORE THE JD — the scorecard is the source of truth; the JD is derived from it. Never reverse this order. Every JD bullet must map to a scorecard element or it gets cut.
+═══ PHASE A — ROLE ARCHAEOLOGY ═══
+A1. Ask the hiring manager (or extract from context) exactly three things:
+    (a) What broke, stalled, or is missing that created this opening?
+    (b) What does "great" look like in 12 months — name a specific result, not a behavior?
+    (c) Who is this person's most important internal customer and what do they need from this role that they aren't getting today?
+A2. Identify the role's single non-negotiable constraint. Choose exactly one from: budget/P&L ownership, deep technical execution, people management, regulatory/compliance exposure, external-facing revenue accountability. This constraint is the spine of the must-have list.
+A3. Map the organizational layer and adjust expectations accordingly:
+    — IC: outcomes are personal outputs (lines of code shipped, deals closed, content published). Competency weight: execution and craft.
+    — Lead/Staff: outcomes are throughput (team velocity, quality bar lifted, unblocking others). Competency weight: judgment and force-multiplier behaviors.
+    — Manager of managers: outcomes are leverage (org capacity built, succession depth, cross-functional coverage). Competency weight: talent development and org design.
+    — Executive/VP+: outcomes are strategic position (market share, capability unlocked, board-level commitments met). Competency weight: direction-setting and external credibility.
+A4. Flag "phantom requirements" — criteria added because of the last person who failed, not because they predict success. Name each one explicitly. For each, ask: "Would a candidate lacking this fail the role's outcomes, or only the manager's comfort?" If the latter, delete it.
+A5. Identify the "time horizon mismatch" risk: is the hiring manager asking for 12-month outcomes that actually require 24 months in this org? If yes, surface it before writing outcomes or you'll build an unfillable scorecard.
+═══ PHASE B — WRITE THE MISSION ═══
+B1. Draft one sentence using this skeleton: "[Role title] exists to [primary deliverable] for [primary customer] so that [business outcome]."
+    Example (engineering): "Staff Engineer, Payments exists to own the reliability and throughput of the checkout flow for the growth team so that payment failure rate stays below 0.3% as annual transaction volume scales to $1B."
+    Example (go-to-market): "Enterprise AE exists to close net-new logos in the $100K–$500K ACV band for the enterprise segment so that ARR from enterprise reaches $8M by end of year."
+B2. Test the mission: strip the role title. Does the sentence still uniquely describe this role versus three adjacent roles? If not, tighten the deliverable — usually by naming the customer more precisely or the metric more specifically.
+B3. Confirm the mission survives 2-3 business pivots. If the company changes direction, does this mission hold for at least 18 months? If not, you are describing a project, not a role — either scope it as a fixed-term engagement or rethink the opening.
+═══ PHASE C — DRAFT 12-MONTH OUTCOMES ═══
+C1. Write 4-6 outcomes only. Fewer than 4 under-defines the role; more than 6 signals the scope is unclear or two roles are being collapsed into one.
+C2. Format every outcome as: "By [month or quarter], [measurable result] achieved, evidenced by [specific artifact or metric]."
+    Weak: "Build a strong data pipeline." Strong: "By Q3, production data pipeline handles 10M events/day with p99 latency under 200ms, evidenced by Datadog SLO dashboard."
+C3. Stack-rank outcomes 1-6 by business impact. Apply the "only outcome" test: if the new hire achieves only outcome #1 and nothing else, was the hire worth making? If no, #1 is ranked wrong.
+C4. For each outcome, write the 90-day leading indicator — the earliest measurable signal that the hire is on track. This becomes the 90-day check-in contract:
+    — For IC roles: a shipped artifact, a working prototype, or a closed deal.
+    — For people-manager roles: team pulse score above baseline, one direct report promoted or retained through a competitive outside offer.
+    — For executive roles: a strategy document ratified by the executive team, one external partnership signed.
+C5. Pressure-test attribution: would a strong external candidate actually control this outcome, or is it downstream of organizational decisions beyond their reach? If the latter, reframe the outcome to the input they own, not the output they hope to influence.
+═══ PHASE D — COMPETENCY RANKING ═══
+D1. List raw competencies implied by the outcomes. One pass, no editing. Every competency must cite its parent outcome by number (e.g., "Systems design — drives outcomes 1, 3").
+D2. Trim to 5-7 competencies maximum. Merge overlaps ruthlessly:
+    — "Communication" + "stakeholder management" = one competency at most levels (name it "Influence without authority" for ICs, "Executive communication" for managers).
+    — "Analytical thinking" + "data-driven decision making" = one competency (name it by what it actually requires: "SQL proficiency and A/B interpretation" for an analyst, "financial modeling and scenario planning" for a GM).
+D3. Rank competencies 1 (most critical to outcomes) to N. The top 2 competencies should account for roughly 60% of what separates a great hire from an adequate one in this role. If you can't identify which 2 those are, return to Phase A — the role's core is unclear.
+D4. Write behavioral anchors per competency — exactly the levels needed, no more:
+    — Anchor 1 (minimum bar): the floor. A candidate below this level is a NO regardless of other strengths. Example for "Pipeline generation" at an AE role: "Built outbound pipeline from scratch in a market with no brand recognition, generating at least $500K in qualified opportunities in a 12-month period without SDR support."
+    — Anchor 2 (solid hire): this candidate succeeds and stays. Represents the hiring target for most openings.
+    — Anchor 3 (exceptional): rare. Define it, but do not screen for it by default or the role will go unfilled. Reserve for replacement hires at senior levels.
+    — Anchor 4 (company-builder, optional): add only if the role has an explicit org-building mandate — typically Director+ with headcount to grow. "Built and hired a team of 8+ SDRs from 0, including the hiring process, comp structure, and ramp playbook."
+D5. Label each anchor with its level. During debrief, interviewers assign anchors (not numeric scores) to prevent false precision. "This candidate is a 2 on pipeline generation" is more calibrated and defensible than "7 out of 10."
+D6. For manager-level and above, include one mandatory competency: talent development. Anchor 1: "Has retained and promoted at least one direct report during their tenure." Anchor 2: "Can name the career plan and next promotion criteria for each direct report without looking at notes." Anchor 3: "Has developed a successor who was promoted into their role or above."
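The traceability rule from D1 and the count limits from C1 and D2 can be checked mechanically. The sketch below is one hedged way to do it; the `Competency` class, the `check_competencies` function, and the sample draft are hypothetical names introduced for illustration, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    # Numbers of the 12-month outcomes this competency drives.
    parent_outcomes: list[int] = field(default_factory=list)


def check_competencies(outcome_count, competencies):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not 4 <= outcome_count <= 6:
        problems.append(f"expected 4-6 outcomes, found {outcome_count}")
    if len(competencies) > 7:
        problems.append(
            f"expected at most 7 competencies, found {len(competencies)}"
        )
    for c in competencies:
        if not c.parent_outcomes:
            problems.append(f"{c.name}: no parent outcome (wish-list competency)")
        for n in c.parent_outcomes:
            if not 1 <= n <= outcome_count:
                problems.append(f"{c.name}: cites outcome {n}, which does not exist")
    return problems


draft = [
    Competency("Systems design", [1, 3]),
    Competency("Culture carrier"),          # cites no outcome
    Competency("Vendor negotiation", [7]),  # cites an outcome that is not there
]
problems = check_competencies(5, draft)
for problem in problems:
    print(problem)
```

Any competency that shows up in the output either gets a parent outcome or gets deleted, per hard rule 2.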
+═══ PHASE E — MUST-HAVE VS NICE-TO-HAVE LINE ═══
+E1. Must-haves: hard-filter criteria. The candidate is screened out immediately if missing. Limit to 3-5. Valid categories:
+    — A non-negotiable credential, license, clearance, or language proficiency that cannot be acquired on the job (e.g., active TS/SCI clearance, bar admission in the relevant jurisdiction, native-level Mandarin for a role that requires negotiating in Mandarin).
+    — Domain depth that takes longer than 6 months to acquire to minimum useful proficiency (e.g., "has shipped payments infrastructure processing more than $100M/year in production").
+    — Organizational scope — the candidate has operated at this level before. You cannot safely parachute someone two levels above their highest prior scope (e.g., "has directly managed a team of 10+ engineers" for a Director of Engineering role).
+E2. Nice-to-haves: genuine ramp accelerators. Would meaningfully reduce time-to-productivity or raise the ceiling but are not disqualifying. List 3-5 maximum. A list longer than 5 means must-haves were misclassified.
+E3. Write the "We Will NOT Require" list — at least 2 items that appear in similar job postings but have been deliberately excluded. Include one line of rationale per item:
+    — Example: "Specific degree credential — we evaluate demonstrated output, not credential; our current team includes members without a CS degree who outperform degreed peers on every metric."
+    — Example: "10+ years of experience — this scorecard is written for outcomes achievable in fewer years by a high performer; we'll use the behavioral anchors, not a tenure proxy."
+    This discipline prevents credential inflation, broadens the candidate pool, and forces explicit decisions rather than defaults.
+E4. Audit the must-have list for proxy discrimination. For each must-have, apply this test: "Could this criterion disproportionately filter out candidates of a particular age, gender, or national origin without being genuinely predictive of job performance?" Common offenders:
+    — "Culture fit" without behavioral definition: delete it or define it as a specific, observable competency.
+    — "X years of experience" as a proxy for seniority: replace with a behavioral anchor at the target level.
+    — Specific school or pedigree: unless accreditation is legally required, replace with a demonstrated-output criterion.
+    If a must-have fails this test but remains genuinely job-relevant, document the job-relatedness rationale explicitly before keeping it.
+═══ PHASE F — CALIBRATION & OUTPUT ═══
+F1. Run the bar-raiser test: hand the draft scorecard to a senior person outside the hiring team. Can they independently assign an anchor level to a hypothetical candidate using only the behavioral anchors, without asking you any clarifying questions? If they ask questions, those questions identify exactly which anchors are too vague — rewrite those anchors before proceeding.
+F2. Sequence the final scorecard output in this exact order:
+    1. Role Mission (1 sentence)
+    2. 12-Month Outcomes (numbered, stack-ranked, with artifact/metric callouts)
+    3. 90-Day Leading Indicators (paired to each outcome)
+    4. Competencies (ranked 1-N, with behavioral anchors labeled by level)
+    5. Must-Have Line (3-5 items with rationale)
+    6. Nice-to-Have List (3-5 items)
+    7. "We Will NOT Require" List (at least 2 items with rationale)
+F3. Derive the JD only after the scorecard is approved by the hiring manager. Map every JD bullet to a scorecard element. Any JD bullet with no scorecard parent is deleted. This prevents the JD from adding phantom requirements after the fact.
+F4. Designate the scorecard as the canonical debrief instrument. In the hiring debrief, every participant scores each competency by anchor level (1-4), not by overall impression. Offer decisions reference the scorecard, not the debrief consensus. Post-hire 90-day reviews reference the leading indicators in C4. The JD is ephemeral; the scorecard persists through onboarding.
+KEY PRINCIPLE: A scorecard defines what winning looks like before the search begins — if a criterion cannot be evaluated at the offer stage using observable evidence, delete it.

package/skills/kb-article.md ADDED

@@ -0,0 +1,174 @@
+---
+name: kb-article
+description: Write task-focused knowledge base articles — shape the title around one problem, structure in numbered steps with concrete callouts/screenshots, branch for failures, and close with related links. Skip corporate filler; ship text a practitioner skims in 3 minutes and trusts.
+allowed-tools: Read, Write
+---
+## HARD RULES (read before every phase — NEVER VIOLATE)
+- **Title = a problem statement.** Not "API authentication" — "Reject a malformed API key before it hits the database."
+- **One problem per article.** "Deploy and monitor your app" = two articles. Split it, link them.
+- **No fabricated output.** Every code block showing output must have been observed. If you reconstructed it, label it: `# Reconstructed from memory — verify before publishing`.
+- **Every command must be copy-paste ready.** If the reader must substitute a value, make it obvious: `kubectl scale deployment $APP_NAME --replicas=$NEW_COUNT -n $NAMESPACE`.
+- **No "best practices," no hedging, no philosophy.** "You may want to consider" → delete. "Do this" → keep. Readers are in crisis mode.
+- **Never nest decisions more than two levels deep.** If/then/else/else-if = three levels. Anything deeper gets its own branching article linked from this one.
+- **Time estimates must be honest.** "5 minutes" means the reader completes it in 5–8 minutes. If you under-estimate, they lose trust permanently.
+- **Diagrams beat prose for state machines, flow branches, and multi-system interactions.** Use an ASCII diagram or describe the image in exact alt text. Never say "see diagram" without showing it.
+---
+## STANCE RULE (non-negotiable)
+You are writing for someone with a production problem and 30 seconds of patience. One idea per paragraph. One action per numbered step. If you cannot show the result, do not mention the step. Target: skimmable in under 3 minutes; executable alone at 11 PM with no context beyond what is written here.
+---
+## PHASE 1 — TITLE & SCOPE (5 minutes max)
+1. **Write the title as a problem statement with a time bound.** Examples:
+   - WRONG: "Database Connections"
+   - RIGHT: "Recover from an exhausted Postgres connection pool in under 10 minutes"
+   - WRONG: "Kubernetes Deployments"
+   - RIGHT: "Roll back a failed Kubernetes deployment without downtime"
+2. **Write a single scope sentence.** State what the article covers and what it does NOT cover:
+   ```
+   This article shows how to reset an exhausted Postgres connection pool on a live production
+   instance. It does NOT cover performance tuning, PgBouncer setup, or connection string syntax.
+   ```
+3. **State the target audience explicitly in one clause.** This is the second line under the title, not a section header:
+   ```
+   **Audience:** Backend SRE with SSH access to production | **Severity:** SEV2
+   ```
+4. **State the time estimate at the top of the article.** "10–15 minutes." If you do not know, run through the steps yourself and time it. Do not guess.
+---
+## PHASE 2 — PREREQUISITES (one scannable block)
+5. **List three to five prerequisites as a flat bulleted list.** Each bullet = one thing the reader needs before step 1. Format:
+   ```
+   • Terminal access to the production Postgres host via `ssh prod-db-01`
+   • `psql` CLI version ≥13 (`psql --version` to check)
+   • Current `max_connections` value: run `SELECT current_setting('max_connections');` and note the number
+   ```
+   Never nest bullets. Never list things that are obviously implied (do not list "an internet connection"). If the reader needs to understand a prior concept, link to the article that explains it — do not re-explain it here.
+6. **Write a one-sentence problem statement under the prerequisite block.** Make it visceral and specific — the reader should recognize their symptom immediately or stop reading:
+   ```
+   **Problem:** The web app returns `HTTP 503 Service Unavailable` and your database logs show `FATAL: too many connections`.
+   ```
+---
+## PHASE 3 — NUMBERED STEPS (the procedure)
+7. **Number every step starting at 1.** Each step = exactly one discrete action the reader executes.
+8. **One action per step. If a step uses "then" or "and," split it into two steps:**
+   ```
+   WRONG:  3. Scale the deployment and verify it started.
+   RIGHT:  3. Scale the deployment:
+              kubectl scale deployment web --replicas=5 -n prod
+           4. Wait 15 seconds for new pods to schedule.
+           5. Confirm the rollout completed:
+              kubectl rollout status deployment/web -n prod
+   ```
+9. **Use triple-backtick code blocks for every command, config snippet, or output.** Inline backtick only for variable names, filenames, error codes, and values within prose:
+   ````
+   3. Check the number of connections in each state:
+      ```
+      psql "$DB_URL" -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
+      ```
+      If the counts summed across all states are at or near your `max_connections` value (from prerequisites), proceed to step 4.
+   ````
+10. **Show the expected output for every step that produces one.** Not "observe the result" — show the actual result:
+    ````
+    2. Verify the replica is responding:
+       ```
+       $ psql -h prod-db-02 -U postgres -c "SELECT version();"
+        version
+       ---------
+        PostgreSQL 15.3 on x86_64-pc-linux-gnu ...
+       (1 row)
+       ```
+       → Replica is responding. Proceed to step 3.
+    ````
+    Label reconstructed output explicitly: `# Reconstructed — verify on your system`.
+11. **Add one callout box for any step that is irreversible, destructive, or counterintuitive.** Use `> **CALLOUT:**` markdown. One per step maximum — more dilutes urgency:
+    ```
+    > **CALLOUT:** `pg_promote()` is a one-way operation. After promotion, the old primary
+    > cannot rejoin until manually re-initialized as a replica. Allow 15 minutes to recover
+    > the former primary. Do not proceed unless the primary is confirmed unreachable.
+    ```
+12. **For UI-only steps (dashboards, forms, admin panels), describe the visual precisely:** "Click the red **Restart** button in the top-right corner of the Deployment detail page — not the blue **Redeploy** button below it. A confirmation modal appears; type the deployment name to confirm."
+---
+## PHASE 4 — FAILURE BRANCHING ("If this did not work")
+13. **After the happy path, add an "If this did not work" section.** This is where most readers land. Cover the three most common failure modes — not ten. Format each as a Symptom / Diagnosis / Next action triple:
+    ```
+    ## If this did not work
+    **Symptom:** The `active` connection count is still at max after step 4.
+    → **Diagnosis:** Running pods still hold the old pool size. The config change was not reloaded.
+    → **Next action:** Force a pod restart: `kubectl delete pod -l app=web -n prod`. New pods load
+      the updated config in ~20 seconds. Re-run step 4 to confirm.
+    **Symptom:** After the pod restart, error rate jumped to 100%.
+    → **Diagnosis:** All pods restarted simultaneously; slow boot loop caused every replica to be
+      down at once.
+    → **Escalation:** Page @sre-oncall in Slack. This is now a SEV1. Do not retry the pod delete.
+    ```
+14. **List symptoms in descending probability** — most likely failure first. If a symptom requires escalation rather than a self-fix, say so explicitly: "This is beyond the scope of this runbook. Page @team."
+---
+## PHASE 5 — RELATED LINKS & ESCALATION
+15. **End with a "Related" section** when the article links to deeper material or alternate paths. Format: one line per link, title self-explanatory, three to five links maximum. If you list more than five, you have not prioritized:
+    ```
+    ## Related
+    • [How to read Postgres query logs](#) — diagnose slow queries, not just connection exhaustion
+    • [PgBouncer setup for persistent connection pooling](#) — long-term fix; do this after the incident
+    • [SEV1 incident runbook](#) — if error rate is at 100% and escalation is needed
+    • Escalation: Page @db-team in Slack (24/7) for replication corruption or split-brain scenarios
+    ```
+16. **Escalation contacts must be specific.** Not "contact support." Either: "Page @sre-oncall in Slack" or "Call the infrastructure team at +1-555-0199 (24/7 for SEV1 only)." If neither is available, state the owner of this document and their response SLA.
+---
+## PHASE 6 — FINAL PASS (before publishing)
+Run through this checklist in order. Each item must be true before the article ships:
+- [ ] Title is a problem statement with a time bound, not a topic.
+- [ ] Scope sentence says what the article covers and what it does NOT.
+- [ ] Target audience and time estimate are in the first three lines.
+- [ ] Prerequisites: three to five concrete items, flat list, no implied knowledge.
+- [ ] Problem statement one line, visceral, reader recognizes their symptom immediately.
+- [ ] Every step numbered; every step is one action; no "and" or "then" within a single step.
+- [ ] Every command in a triple-backtick block, copy-paste ready with `$VAR` substitution markers.
+- [ ] Expected output shown (not described) for every step that produces output; reconstructed output labeled.
+- [ ] At least one callout box for a destructive or irreversible step.
+- [ ] "If this did not work" covers exactly three failure modes in Symptom / Diagnosis / Next action format.
+- [ ] Related links: three to five items; escalation contact is a named person or Slack handle, not "support."
+- [ ] Zero phrases: "best practice," "you may want to," "it is recommended," "simply," "just," "ensure that."
+- [ ] Every sentence is unambiguous when read at 11 PM under stress.
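The banned-phrase item on this checklist lends itself to an automated pass. The following is a rough sketch, assuming the article is available as a Markdown string; the function name `find_banned_phrases` and the sample draft are hypothetical, and the fence handling is deliberately naive (it toggles on any line that starts with three backticks).

```python
import re

# Phrases from the checklist, written as regex fragments.
BANNED = [
    r"best practices?",
    r"you may want to",
    r"it is recommended",
    r"simply",
    r"just",
    r"ensure that",
]
FENCE = "`" * 3  # triple backtick, built indirectly to keep this example readable


def find_banned_phrases(text):
    """Return (line_number, phrase) pairs for banned phrases found in prose.

    Lines inside fenced code blocks are skipped, since commands and output
    may legitimately contain these words.
    """
    hits = []
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        for pattern in BANNED:
            # \b keeps "just" from matching inside "adjust" or "justify".
            match = re.search(rf"\b(?:{pattern})\b", line, re.IGNORECASE)
            if match:
                hits.append((lineno, match.group(0).lower()))
    return hits


draft = "\n".join([
    "1. Simply restart the pod.",
    FENCE,
    "echo just a test",
    FENCE,
    "2. Adjust the pool size.",
    "3. You may want to check the logs.",
])
hits = find_banned_phrases(draft)
print(hits)
```

A check like this only covers the mechanical items; whether a sentence is unambiguous at 11 PM still needs a human read.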
+---
+## KEY PRINCIPLE
+A KB article is a troubleshooter's checklist, not a tutorial. The reader is under time pressure and cannot tolerate ambiguity. Every word must describe a specific action or a specific result. If a sentence does not tell the reader what to do, what to type, or what to look for, delete it.

package/skills/launch-plan.md ADDED

@@ -0,0 +1,85 @@
+---
+name: launch-plan
+description: "Build a full GTM launch plan for a software product or feature: segment audience, structure alpha/beta/GA tiers, craft the core message + proof points, assign channel owners + metrics, produce an asset checklist, set a timeline, and generate a launch-day runbook — invoke when asked to plan a launch, go-to-market, release strategy, or channel plan."
+allowed-tools: Read, Write
+---
+═══ HARD RULES ═══
+1. Every deliverable must be specific to THIS product — no generic "improve engagement" language; name the feature, persona, and measurable outcome.
+2. Every channel entry must have exactly one human/role owner and one primary metric. Split ownership = no ownership. If you cannot identify the owner from files you read, write OWNER = [TBD — assign before beta opens] and flag it explicitly.
+3. Never state a benchmark figure (conversion rate, CAC, LTV) as fact unless sourced from the actual product's analytics or a file you read this session.
+4. Gate advancement between tiers (alpha → beta → GA) on explicit, pre-agreed numeric criteria — not on elapsed time alone.
+5. The core message must survive the "so what?" test: state the specific user problem, the specific mechanism that solves it, and what proof exists today.
+6. Asset checklist items must be marked OWNER and DUE-DATE; unowned assets do not ship.
+7. Launch-day runbook steps must be sequenced with a responsible role, an explicit dependency on the prior step passing, and a rollback trigger — not a bullet list of hopes.
+8. Do not invent competitive positioning claims or user-research findings. Use only what exists in the codebase, docs, or files you read this session.
+═══ OUTPUT FORMAT ═══
+Produce a single written document — save it as LAUNCH_PLAN.md in the project root unless the user specifies otherwise. Structure it with one section per phase (A through G) plus a DECISIONS LOG at the end where you record every judgment call you made and the evidence behind it. Do not produce a conversational summary in place of the document.
+═══ PHASE A — AUDIENCE DEFINITION ═══
+A1. Read product docs, claims.yaml (if present), and any existing user-research or analytics files before writing a single word of audience copy. Extract: who has the problem today, what workflow breaks without this feature, and what the closest competing behavior is. Do not infer from the product name alone.
+A2. Define exactly two or three segments with a one-sentence job-to-be-done per segment. Reject segments that overlap by more than 50% in behavior or buying motion.
+A3. For each segment identify: acquisition channel (where they already gather — name the specific community, not "social media"), the trust signal they require before converting, and the single friction that kills conversion today.
+A4. Rank segments by (revenue potential × probability of early adoption) / cost to reach. Pick ONE primary segment for alpha — only that segment's pain shapes the launch message. Document your ranking rationale in the DECISIONS LOG.
+A5. Document anti-persona (who you are explicitly NOT building for in this launch cycle) to prevent scope creep in messaging and support load. One sentence per excluded persona; one sentence on why they are excluded now rather than later.
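The ranking formula in A4 is simple arithmetic, and a worked example may help make the DECISIONS LOG entry concrete. This is a sketch only: the `Segment` class, the segment names, and every number below are illustrative placeholders, not benchmarks, and real inputs would come from the files read in A1.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    revenue_potential: float  # estimated annual revenue, one currency throughout
    p_early_adoption: float   # probability between 0 and 1
    cost_to_reach: float      # same currency as revenue_potential, must be > 0

    @property
    def score(self) -> float:
        # (revenue potential x probability of early adoption) / cost to reach
        return self.revenue_potential * self.p_early_adoption / self.cost_to_reach


def rank_segments(segments):
    """Highest score first; the top entry is the candidate primary segment."""
    return sorted(segments, key=lambda s: s.score, reverse=True)


# Illustrative placeholders only; label real estimates as estimates.
segments = [
    Segment("platform teams", 400_000, 0.5, 20_000),
    Segment("solo developers", 60_000, 0.8, 4_000),
    Segment("agencies", 200_000, 0.2, 10_000),
]
ranked = rank_segments(segments)
for s in ranked:
    print(f"{s.name}: {s.score:.1f}")
```

In this made-up data the smallest segment by revenue ranks first because it is cheap to reach and likely to adopt early, which is the kind of trade-off the rationale should spell out.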
+═══ PHASE B — TIERED LAUNCH STRUCTURE ═══
+B1. ALPHA (closed, ≤ 25 hand-picked users): goal is signal quality, not scale. Criteria to enter: user has the primary job-to-be-done and will commit to a 30-min debrief. Criteria to exit alpha: ≥ 70% of alpha users complete the core workflow end-to-end without support intervention, AND at least one concrete proof point (verbatim quote, measured metric, verifiable receipt) is captured per segment. If 70% is wrong for this product category based on what you read, override it, state the new number, and log your reasoning.
+B2. BETA (invite-only, ≤ 500 users, waitlist-gated): goal is repeatability and funnel instrumentation. Define in the plan: (a) which onboarding steps are fully self-serve, (b) the exact activation event (the moment value is realized — name the UI action or API call), and (c) the D7 or W2 return rate that must be hit before GA. Set the activation rate target and support ticket rate ceiling by reading the product category benchmarks in the docs — if no benchmarks exist in the files, write [OWNER to set before beta opens] and flag it. Criteria to exit beta: activation rate ≥ [target you set], support ticket rate < [ceiling you set], and at least one paying customer exists.
+B3. GA (public, full acquisition channels open): goal is scalable, repeatable growth. GA gate: all beta exit criteria met, pricing page live, billing integration tested end-to-end with a real card on the production system, support runbook staffed, and rollback path documented and exercised.
+B4. For each tier write a one-paragraph "what we learn and what we decide" statement BEFORE opening that tier. If you cannot write it, the tier has no purpose and should be collapsed.
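The alpha exit criteria in B1 can be expressed as an explicit gate so that advancement is decided by numbers rather than elapsed time. A minimal sketch follows, assuming completion counts and proof points are tracked per segment; the function name `alpha_exit_gate` and its parameters are hypothetical, and the 0.70 default simply mirrors the figure in B1.

```python
def alpha_exit_gate(completed_unaided, alpha_users, proof_points_by_segment,
                    completion_threshold=0.70):
    """Evaluate the alpha exit criteria and explain any failure.

    completed_unaided: users who finished the core workflow without support.
    proof_points_by_segment: {segment name: number of captured proof points}.
    Returns (passed, reasons); reasons is empty when the gate passes.
    """
    if alpha_users == 0:
        return False, ["no alpha users enrolled"]

    reasons = []
    rate = completed_unaided / alpha_users
    if rate < completion_threshold:
        reasons.append(
            f"completion rate {rate:.0%} is below {completion_threshold:.0%}"
        )
    for segment, count in proof_points_by_segment.items():
        if count < 1:
            reasons.append(f"no proof point captured for segment '{segment}'")
    return not reasons, reasons


passed, reasons = alpha_exit_gate(
    completed_unaided=16,
    alpha_users=25,
    proof_points_by_segment={"solo developers": 2, "platform teams": 0},
)
print(passed, reasons)
```

Returning the reasons alongside the verdict gives the "what we learn and what we decide" statement something specific to cite.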
+═══ PHASE C — CORE MESSAGE + PROOF ═══
+C1. Draft the single position sentence: [Persona] who [job-to-be-done] can now [specific capability] — unlike [alternative] which [specific gap] — because [unique mechanism], and they can verify it by [concrete proof artifact].
+C2. Strip every claim that is not backed by a shipped feature or a file marked as verified (e.g., in claims.yaml). Mark unverified claims as ROADMAP-ONLY and exclude them from all launch copy, email sequences, and the landing page.
+C3. Identify the proof asset that most compresses the trust gap for the primary segment: working demo, signed receipt, benchmark result, or live customer outcome. If none exists, build it before beta opens — list it as a blocking asset in Phase E.
+C4. Write three message variants for: (a) technical evaluator, (b) economic buyer, (c) end user. Each variant uses the same mechanism but leads with a different consequence. Test all three in alpha interviews before committing to one for GA copy.
+C5. Define the "honest bound": the exact conditions under which the product does NOT work. State this in the beta welcome email and the docs. Omitting it erodes trust faster than any positioning mistake.
+═══ PHASE D — CHANNEL PLAN ═══
+For each channel below, fill: OWNER / PRIMARY METRIC / SECONDARY METRIC / ALPHA BUDGET / GA BUDGET. Determine OWNER by reading team/org files in the project; if no org structure is documented, write the role title and flag it for assignment. Do not list a channel unless you have evidence the primary segment uses it.
+D1. Owned content (blog, changelog, in-app tooltips): PRIMARY METRIC = organic search impressions to the launch page within 30 days; cost = editorial time only. Tactic: one SEO-anchored deep post timed to beta launch (topic derived from the primary segment's search behavior, not assumed) + one changelog entry per shipped increment.
+D2. Community + developer forums (name the specific communities — the subreddit, Discord server, Slack workspace, or forum where the primary segment already spends time; do not write "relevant communities"): PRIMARY METRIC = link-clicks to the product from those named communities; do not post until the alpha proof point is in hand.
+D3. Email (existing list, beta waitlist): PRIMARY METRIC = click-to-activation rate, not open rate. Send sequence: (1) waitlist confirm, (2) beta access + one proof point, (3) first-value nudge at D+3, (4) GA announcement. Write subject lines in the plan — do not leave them as placeholders.
+D4. Paid acquisition: PRIMARY METRIC = cost per activated user, not cost per click. Only open after beta activation rate is measured and meets target. Channel selection: derive from where the primary segment spends time (read the audience data); do not default to Google/Meta without evidence they are the right channel for this segment.
+D5. Partner / integration co-marketing: only include if a named partner with a confirmed agreement exists in the files you read. PRIMARY METRIC = referred signups that activate. Do not list speculative partners.
+D6. PR / press: only include if a genuine news hook exists — funding, a record benchmark, or a first-of-kind capability verified in claims.yaml. PRIMARY METRIC = tier-1 coverage pieces that drive direct traffic. Embargo to GA date.
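As an illustration of why D4 measures cost per activated user rather than cost per click, the arithmetic can be sketched as below. This is a minimal sketch, not part of the package: the `ChannelStats` shape and both function names are hypothetical, and the numbers in the usage note are invented for the example.

```typescript
// Hypothetical record for one channel; field names are illustrative only.
interface ChannelStats {
  channel: string;
  spend: number;          // total spend, in a single currency
  clicks: number;
  activatedUsers: number; // users who completed the activation event
}

// Cost per activated user. Returns null when nobody activated,
// which avoids a divide-by-zero and makes the gap explicit.
function costPerActivatedUser(s: ChannelStats): number | null {
  return s.activatedUsers > 0 ? s.spend / s.activatedUsers : null;
}

// Rank channels by activated users (descending), not by clicks or signups.
// Copies the input so the caller's array is not mutated.
function rankByActivatedUsers(stats: ChannelStats[]): ChannelStats[] {
  return [...stats].sort((a, b) => b.activatedUsers - a.activatedUsers);
}
```

With made-up figures, a channel that spends 500 for 1000 clicks and 10 activations costs 50 per activated user, while one that spends 500 for 200 clicks and 25 activations costs 20: the channel with the cheaper clicks is the more expensive one on the metric D4 asks for.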
+═══ PHASE E — ASSET CHECKLIST ═══
+E1. Landing page: OWNER / DUE: beta-5 days. Must include: position sentence, proof asset (demo or receipt), pricing, one CTA with no ambiguity, and an honest-bound disclosure.
+E2. Onboarding flow (first-run experience through activation event): OWNER / DUE: alpha exit. Must be testable end-to-end without a support person present. List the exact steps in the plan.
+E3. Proof asset (demo recording, benchmark doc, or signed receipt): OWNER / DUE: alpha exit. Must be linkable from every channel. If it does not exist at plan-writing time, mark it as a BLOCKING DEPENDENCY on beta opening.
+E4. Pricing page with billing integration: OWNER / DUE: beta exit. Tested with a real card against the production billing system — not a sandbox.
+E5. Docs / help center (minimum: getting started, core workflow, known limitations): OWNER / DUE: beta-3 days.
+E6. Email sequences with written subject lines (waitlist confirm, beta access, first-value nudge, GA blast): OWNER / DUE: beta open.
+E7. Social announcement drafts (one per named channel from Phase D, each with proof asset embedded): OWNER / DUE: GA-2 days, held in draft for GA-day send.
+E8. Support runbook (top 5 anticipated questions derived from alpha debrief themes, escalation path, rollback trigger): OWNER / DUE: beta open.
+E9. Internal launch brief (one page: what launched, who it's for, what to say, what NOT to say, who handles press inquiries): OWNER / DUE: GA-1 day, distributed to all customer-facing staff before the GA flag flips.
+═══ PHASE F — TIMELINE ═══
+F1. Anchor the timeline to the GA date and work backward. Do not set the GA date before beta exit criteria are defined — if the criteria are not yet set, write TBD and block GA dating on that decision.
+F2. Canonical scaffold (adjust durations to actual team size and complexity — document any compression rationale in the DECISIONS LOG):
+    T-8w: Audience definition locked, anti-persona documented, alpha cohort list finalized.
+    T-6w: Core message drafted, proof asset work begun, alpha environment ready.
+    T-4w: Alpha opens. Weekly debrief cadence. Message variants tested.
+    T-3w: Alpha exit criteria evaluated. If not met, extend alpha — do not advance on schedule alone.
+    T-2w: Beta opens. Funnel instrumentation live. Email sequences active. Paid held.
+    T-1w: Beta exit criteria evaluated. Pricing + billing tested on production. All assets at 100%.
+    T-0: GA. Launch-day runbook executed. Paid channels stay closed until the T+1h metrics read (G6) passes and the T+4h decision (G7) approves opening them.
+    T+1w: Post-launch retro: what drove activation, what killed it, channel ROI ranked by activated users.
+F3. Every milestone has a named owner and a binary pass/fail gate — not a percentage-complete estimate.
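The backward-from-GA scaffold in F2 can be sketched as simple date arithmetic. This is a minimal sketch under stated assumptions: the week offsets are copied from F2, but `SCAFFOLD`, `buildTimeline`, and the shortened labels are hypothetical names chosen for illustration, and a `null` GA date stands in for the F1 case where beta exit criteria are not yet defined.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface Milestone {
  label: string;
  weeksFromGa: number; // negative = before GA, positive = after
}

// Offsets taken from the F2 scaffold; labels shortened for the example.
const SCAFFOLD: Milestone[] = [
  { label: "Audience definition locked", weeksFromGa: -8 },
  { label: "Core message drafted", weeksFromGa: -6 },
  { label: "Alpha opens", weeksFromGa: -4 },
  { label: "Alpha exit criteria evaluated", weeksFromGa: -3 },
  { label: "Beta opens", weeksFromGa: -2 },
  { label: "Beta exit criteria evaluated", weeksFromGa: -1 },
  { label: "GA", weeksFromGa: 0 },
  { label: "Post-launch retro", weeksFromGa: 1 },
];

// gaDate is an ISO date (YYYY-MM-DD), or null while the GA date is blocked (F1).
// Arithmetic is done in UTC so daylight-saving shifts cannot move a milestone.
function buildTimeline(gaDate: string | null): { label: string; date: string }[] {
  return SCAFFOLD.map((m) => {
    if (gaDate === null) return { label: m.label, date: "TBD" };
    const gaMs = Date.parse(`${gaDate}T00:00:00Z`);
    const d = new Date(gaMs + m.weeksFromGa * 7 * MS_PER_DAY);
    return { label: m.label, date: d.toISOString().slice(0, 10) };
  });
}
```

Calling `buildTimeline(null)` yields "TBD" for every milestone, which mirrors the F1 rule that no dates are set until the beta exit criteria exist. Compressed or extended durations would be expressed by editing the offsets and recording the rationale in the DECISIONS LOG.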
+═══ PHASE G — LAUNCH-DAY RUNBOOK ═══
+G1. T-24h: Final go/no-go call. Checklist: billing end-to-end passes on production, landing page live on the production domain, support runbook distributed to all customer-facing staff, social drafts approved and loaded, rollback script tested by the eng lead. Decision rule: if any checklist item is red, push GA by 48h — do not launch with known gaps.
+G2. T-0:00 (launch moment): flip the GA flag or publish the public URL. ROLE: [eng lead]. Record the exact timestamp in the launch log.
+G3. T+0:15: Verify the first external signup completes the activation event without any support intervention. ROLE: [product lead]. Dependency: must pass before G4 or G5 executes. Rollback trigger: zero activations in the first 15 min → hold all outbound sends, open an incident channel, and investigate before proceeding.
+G4. T+0:30 (only after G3 passes): Send GA email blast to full list. ROLE: [growth lead].
+G5. T+0:30 (only after G3 passes, simultaneous with G4): Publish social posts across all named channels. ROLE: [content lead]. Pin or sticky the primary post.
+G6. T+1h: First metrics read: signups, activation rate, billing success rate, application error rate, support ticket volume. ROLE: [growth lead + eng lead together]. Decision rule: if error rate exceeds [pre-set threshold from beta baseline] or billing failures > 2%, keep all paid spend on hold and open an incident channel before proceeding.
+G7. T+4h: Second metrics read. Make one of three explicit decisions: (a) open paid channels as planned, (b) hold paid pending a specific named fix, or (c) partial rollback. Document the decision with rationale in the launch log. No "let's watch a bit longer" — choose a path.
+G8. T+24h: End-of-day wrap. Capture: total signups, activation rate, conversion to paid, top support issues by volume, channel performance ranked by activated users (not signups). Send internal brief to all stakeholders.
+G9. Rollback protocol: if a critical issue (data loss, billing error, security vulnerability) is confirmed at any point, [eng lead] disables the GA flag within 15 min, [growth lead] sends a "brief maintenance" notice to any users already onboarded, and the incident postmortem is opened before any re-launch attempt.
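The two numeric decision rules in the runbook (the G1 go/no-go checklist and the G6 first metrics read) can be sketched as pure functions. This is a minimal sketch, not the package's implementation: `goNoGo`, `shouldHoldPaid`, and the `FirstRead` shape are hypothetical, rates are assumed to be fractions between 0 and 1, and the error-rate threshold must be supplied by the caller because the source leaves it as a pre-set value from the beta baseline.

```typescript
// G1: every checklist item is a binary pass/fail; any red item pushes GA by 48h.
function goNoGo(checklist: Record<string, boolean>): {
  go: boolean;
  failed: string[];
  pushHours: number;
} {
  const failed = Object.keys(checklist).filter((item) => !checklist[item]);
  const go = failed.length === 0;
  return { go, failed, pushHours: go ? 0 : 48 };
}

// Hypothetical shape for the T+1h metrics read; rates are fractions (0.02 = 2%).
interface FirstRead {
  errorRate: number;
  billingFailureRate: number;
}

// G6: hold paid spend if the error rate exceeds the caller-supplied threshold
// or billing failures exceed 2% (strictly greater, as the rule is written).
function shouldHoldPaid(read: FirstRead, errorRateThreshold: number): boolean {
  return read.errorRate > errorRateThreshold || read.billingFailureRate > 0.02;
}
```

For example, `goNoGo({ billing: true, rollbackScript: false })` reports `rollbackScript` as failed and a 48-hour push. Keeping both rules as pure functions makes each decision easy to record, with its inputs, in the launch log.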
+KEY PRINCIPLE: A launch plan is a sequence of bets with explicit exit criteria — never advance a tier on schedule alone; advance only when the evidence from the current tier tells you the next tier will work. Every decision you make while running this skill goes in the DECISIONS LOG with the evidence behind it so the plan is auditable after the fact.