@vextlabs/theron-cli 0.2.1 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/api.d.ts +8 -0
- package/dist/api.js +3 -0
- package/dist/api.js.map +1 -1
- package/dist/auth.js +51 -1
- package/dist/auth.js.map +1 -1
- package/dist/banner.js +3 -2
- package/dist/banner.js.map +1 -1
- package/dist/checkpoints.d.ts +32 -0
- package/dist/checkpoints.js +61 -0
- package/dist/checkpoints.js.map +1 -0
- package/dist/index.js +61 -5
- package/dist/index.js.map +1 -1
- package/dist/input.d.ts +61 -0
- package/dist/input.js +574 -0
- package/dist/input.js.map +1 -0
- package/dist/profiles/index.js +5 -0
- package/dist/profiles/index.js.map +1 -1
- package/dist/profiles/methodologies/build_domains.d.ts +6 -0
- package/dist/profiles/methodologies/build_domains.js +170 -0
- package/dist/profiles/methodologies/build_domains.js.map +1 -0
- package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
- package/dist/profiles/methodologies/operate_domains.js +1239 -0
- package/dist/profiles/methodologies/operate_domains.js.map +1 -0
- package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
- package/dist/profiles/methodologies/regulated_domains.js +153 -0
- package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
- package/dist/profiles/methodologies/research_domains.d.ts +8 -0
- package/dist/profiles/methodologies/research_domains.js +179 -0
- package/dist/profiles/methodologies/research_domains.js.map +1 -0
- package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
- package/dist/profiles/methodologies/strategy_domains.js +193 -0
- package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
- package/dist/profiles/seeds.js +241 -95
- package/dist/profiles/seeds.js.map +1 -1
- package/dist/receipt.d.ts +17 -0
- package/dist/receipt.js +46 -0
- package/dist/receipt.js.map +1 -0
- package/dist/render.d.ts +4 -1
- package/dist/render.js +95 -28
- package/dist/render.js.map +1 -1
- package/dist/repl.d.ts +8 -1
- package/dist/repl.js +420 -62
- package/dist/repl.js.map +1 -1
- package/dist/sessions.d.ts +14 -0
- package/dist/sessions.js +100 -0
- package/dist/sessions.js.map +1 -1
- package/dist/ship.d.ts +2 -0
- package/dist/ship.js +62 -0
- package/dist/ship.js.map +1 -0
- package/dist/skills/catalog.d.ts +13 -0
- package/dist/skills/catalog.js +86 -0
- package/dist/skills/catalog.js.map +1 -0
- package/dist/tools/bash.js +81 -14
- package/dist/tools/bash.js.map +1 -1
- package/dist/tools/edit.js +21 -1
- package/dist/tools/edit.js.map +1 -1
- package/dist/tools/glob.js +4 -1
- package/dist/tools/glob.js.map +1 -1
- package/dist/tools/grep.d.ts +5 -0
- package/dist/tools/grep.js +101 -2
- package/dist/tools/grep.js.map +1 -1
- package/dist/tools/index.d.ts +22 -0
- package/dist/tools/index.js +177 -41
- package/dist/tools/index.js.map +1 -1
- package/dist/tools/ls.d.ts +3 -0
- package/dist/tools/ls.js +23 -12
- package/dist/tools/ls.js.map +1 -1
- package/dist/tools/multiedit.d.ts +12 -0
- package/dist/tools/multiedit.js +79 -0
- package/dist/tools/multiedit.js.map +1 -0
- package/dist/tools/stoa.d.ts +1 -1
- package/dist/tools/stoa.js +7 -3
- package/dist/tools/stoa.js.map +1 -1
- package/dist/tools/task.d.ts +9 -0
- package/dist/tools/task.js +166 -0
- package/dist/tools/task.js.map +1 -0
- package/dist/tools/todowrite.d.ts +12 -0
- package/dist/tools/todowrite.js +38 -0
- package/dist/tools/todowrite.js.map +1 -0
- package/dist/tools/webfetch.d.ts +6 -0
- package/dist/tools/webfetch.js +98 -0
- package/dist/tools/webfetch.js.map +1 -0
- package/dist/tools/websearch.d.ts +7 -0
- package/dist/tools/websearch.js +83 -0
- package/dist/tools/websearch.js.map +1 -0
- package/dist/tools/write.js +17 -1
- package/dist/tools/write.js.map +1 -1
- package/dist/verifiers/calc_gate.d.ts +2 -0
- package/dist/verifiers/calc_gate.js +112 -0
- package/dist/verifiers/calc_gate.js.map +1 -0
- package/dist/verifiers/citation_gate.d.ts +2 -0
- package/dist/verifiers/citation_gate.js +130 -0
- package/dist/verifiers/citation_gate.js.map +1 -0
- package/dist/verifiers/confidence_marked.d.ts +2 -0
- package/dist/verifiers/confidence_marked.js +49 -0
- package/dist/verifiers/confidence_marked.js.map +1 -0
- package/dist/verifiers/disclaimer_gate.d.ts +2 -0
- package/dist/verifiers/disclaimer_gate.js +57 -0
- package/dist/verifiers/disclaimer_gate.js.map +1 -0
- package/dist/verifiers/evidence_gate.d.ts +2 -0
- package/dist/verifiers/evidence_gate.js +108 -0
- package/dist/verifiers/evidence_gate.js.map +1 -0
- package/dist/verifiers/index.d.ts +5 -0
- package/dist/verifiers/index.js +28 -7
- package/dist/verifiers/index.js.map +1 -1
- package/dist/verifiers/lint.js +4 -3
- package/dist/verifiers/lint.js.map +1 -1
- package/dist/verifiers/promoted_kernels.d.ts +8 -0
- package/dist/verifiers/promoted_kernels.js +190 -0
- package/dist/verifiers/promoted_kernels.js.map +1 -0
- package/dist/verifiers/source_gate.d.ts +2 -0
- package/dist/verifiers/source_gate.js +125 -0
- package/dist/verifiers/source_gate.js.map +1 -0
- package/dist/verifiers/test_smoke.js +30 -0
- package/dist/verifiers/test_smoke.js.map +1 -1
- package/dist/verifiers/types.d.ts +3 -0
- package/package.json +4 -2
- package/skills/README.md +123 -0
- package/skills/ab-test.md +89 -0
- package/skills/api-design.md +175 -0
- package/skills/architecture-design.md +185 -0
- package/skills/business-case.md +77 -0
- package/skills/causal-inference.md +77 -0
- package/skills/clinical-guideline.md +98 -0
- package/skills/code-review.md +98 -0
- package/skills/cold-outreach.md +268 -0
- package/skills/competitive-teardown.md +223 -0
- package/skills/component-spec.md +121 -0
- package/skills/content-calendar.md +280 -0
- package/skills/contract-review.md +155 -0
- package/skills/data-analysis.md +187 -0
- package/skills/debug.md +91 -0
- package/skills/design-audit.md +121 -0
- package/skills/differential-diagnosis.md +79 -0
- package/skills/discovery-call.md +206 -0
- package/skills/edit-pass.md +80 -0
- package/skills/engineering-calc.md +101 -0
- package/skills/estimate.md +70 -0
- package/skills/experiment-design.md +105 -0
- package/skills/fact-check.md +82 -0
- package/skills/financial-model.md +104 -0
- package/skills/grant-proposal.md +93 -0
- package/skills/harmony-analysis.md +93 -0
- package/skills/hypothesis-generation.md +99 -0
- package/skills/incident-response.md +134 -0
- package/skills/interview-loop.md +62 -0
- package/skills/job-scorecard.md +92 -0
- package/skills/kb-article.md +174 -0
- package/skills/launch-plan.md +85 -0
- package/skills/lease-review.md +93 -0
- package/skills/lesson-plan.md +198 -0
- package/skills/literature-review.md +69 -0
- package/skills/market-entry.md +137 -0
- package/skills/market-sizing.md +159 -0
- package/skills/meta-analysis.md +140 -0
- package/skills/migrate.md +117 -0
- package/skills/optimize.md +88 -0
- package/skills/options-strategy.md +166 -0
- package/skills/peer-review.md +96 -0
- package/skills/pentest-plan.md +193 -0
- package/skills/pitch-review.md +132 -0
- package/skills/plan.md +88 -0
- package/skills/policy-brief.md +124 -0
- package/skills/positioning.md +192 -0
- package/skills/postmortem.md +168 -0
- package/skills/prd.md +105 -0
- package/skills/prioritize.md +162 -0
- package/skills/proof.md +91 -0
- package/skills/property-underwrite.md +159 -0
- package/skills/recipe-develop.md +109 -0
- package/skills/red-team.md +142 -0
- package/skills/refactor.md +58 -0
- package/skills/reflection-session.md +115 -0
- package/skills/regulatory-compliance.md +136 -0
- package/skills/reproduce.md +87 -0
- package/skills/runbook.md +344 -0
- package/skills/security-audit.md +154 -0
- package/skills/seo-brief.md +201 -0
- package/skills/sql-query.md +161 -0
- package/skills/story-craft.md +163 -0
- package/skills/tdd.md +59 -0
- package/skills/term-sheet.md +298 -0
- package/skills/theory-of-change.md +88 -0
- package/skills/threat-model.md +104 -0
- package/skills/ticket-triage.md +200 -0
- package/skills/tolerance-analysis.md +149 -0
- package/skills/training-program.md +151 -0
- package/skills/translate.md +64 -0
- package/skills/unit-economics.md +238 -0
- package/skills/valuation.md +112 -0
- package/skills/write-tests.md +77 -0
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: causal-inference
|
|
3
|
+
description: Identify causal effects, not correlations — define the estimand, draw the DAG, pick an identification strategy (RCT/IV/RDD/DiD/matching), and run falsification tests.
|
|
4
|
+
allowed-tools: Read, Bash, Write, Grep
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Phase 0 — State the Estimand Before Touching Data
|
|
8
|
+
|
|
9
|
+
1. Write the estimand in one sentence: target population (treated units / full population / subgroup), treatment contrast (T=1 vs T=0, or dose shift), outcome, and time horizon. ATE = E[Y(1)−Y(0)]; ATT = E[Y(1)−Y(0)|T=1]; CATE = E[Y(1)−Y(0)|X=x].
|
|
10
|
+
2. Declare the unit of analysis (individual, firm, market, day) and the causal horizon (short-run vs long-run effects may differ).
|
|
11
|
+
3. Write down what the ideal randomised experiment would look like. If you cannot describe it, you do not yet have a clear estimand.
|
|
12
|
+
|
|
13
|
+
## Phase 1 — Draw the DAG (Do This on Paper First)
|
|
14
|
+
|
|
15
|
+
4. List every variable: treatment T, outcome Y, pre-treatment covariates X, potential mediators M, potential colliders C, instruments Z.
|
|
16
|
+
5. Draw directed edges encoding your domain knowledge. An arrow A→B means "A directly causes B holding everything else fixed."
|
|
17
|
+
6. Mark confounders: common causes of T and Y that are NOT on the causal path T→...→Y.
|
|
18
|
+
7. Mark mediators: variables on the causal path T→M→Y. Conditioning on M blocks the path and biases total-effect estimates — do NOT adjust for mediators when estimating total effect.
|
|
19
|
+
8. Mark colliders: common effects of two causes (e.g. A→C←B). Conditioning on a collider OPENS a non-causal path — never condition on colliders.
|
|
20
|
+
9. Apply the backdoor criterion: a valid adjustment set blocks every backdoor path (paths into T) without opening any new paths via colliders.
|
|
21
|
+
10. Apply the frontdoor criterion only when all backdoor paths are unblockable but a measured mediator M captures the full T→Y path.
|
|
22
|
+
11. Run `dagitty` (R) or `pgmpy` (Python) to enumerate all d-separation statements and verify your proposed adjustment set is valid.
|
|
23
|
+
|
|
24
|
+
## Phase 2 — Choose the Identification Strategy
|
|
25
|
+
|
|
26
|
+
12. **RCT**: If treatment was randomised, check balance (t-tests / standardised mean differences < 0.1). Estimate via OLS with pre-specified covariates (Lin estimator) for efficiency. Watch for non-compliance → use ITT; if compliance < 100% use IV with assignment as instrument.
|
|
27
|
+
13. **Regression adjustment / backdoor**: Use when all confounders are measured. Run OLS or doubly-robust AIPW (combine propensity model + outcome model — consistent if either is correct). Use cross-fitting to avoid regularisation bias.
|
|
28
|
+
14. **Instrumental Variables**: Instrument Z must satisfy (a) relevance: Cov(Z,T)≠0 — report first-stage F > 10 (Stock-Yogo); (b) exclusion: Z affects Y only through T — this is untestable, argue it structurally; (c) independence: Z ⊥ unmeasured confounders. Use 2SLS. LATE = effect on compliers only; report that scope honestly.
|
|
29
|
+
15. **Regression Discontinuity**: Assignment determined by a running variable crossing a cutoff. Estimate local ATE at the cutoff only. Use local linear regression, not polynomials (overfitting). Optimal bandwidth via Imbens-Kalyanaraman or `rdrobust`. Check: (a) no manipulation of running variable (McCrary density test); (b) no jump in covariates at cutoff; (c) continuity of potential outcomes.
|
|
30
|
+
16. **Difference-in-Differences**: Requires (a) parallel trends in pre-period — test visually and with event-study regression; (b) no anticipation — treatment effect is zero before treatment; (c) stable unit treatment values (SUTVA). Use staggered DiD via Callaway-Sant'Anna or Sun-Abraham estimators (not TWFE, which is biased under heterogeneous effects). Report pre-trend coefficients explicitly.
|
|
31
|
+
17. **Matching / Propensity Score**: Estimate propensity score P(T=1|X) via logistic regression or gradient boosting. Match 1:1 nearest-neighbour with caliper 0.2×SD(logit(p)). Check post-match balance on ALL covariates. Prefer doubly-robust AIPW over PS-weighting alone (IPW is sensitive to extreme weights; trim weights at 99th percentile).
|
|
32
|
+
18. **Synthetic Control**: Use when N=1 treated unit and many pre-periods. Construct a convex combination of control units that matches pre-treatment outcomes (and covariates). Report the ratio of post-treatment MSPE to pre-treatment MSPE; run permutation tests across control units for inference.
|
|
33
|
+
|
|
34
|
+
## Phase 3 — Assumption Checks and Falsification Tests
|
|
35
|
+
|
|
36
|
+
19. **Balance check**: For every adjustment strategy, report standardised mean differences before and after adjustment for all covariates. SMD < 0.1 is the threshold.
|
|
37
|
+
20. **Pre-trend test (DiD/SC)**: Regress outcome on leads of treatment (event-study). Joint F-test of all pre-period coefficients = 0. Visualise the event study plot.
|
|
38
|
+
21. **Placebo outcome test**: Apply the same estimator to an outcome that the treatment logically cannot affect. A significant effect signals model misspecification or confounding.
|
|
39
|
+
22. **Negative control**: Find a negative control exposure (shares confounders with T but cannot cause Y) and a negative control outcome (shares confounders with Y but cannot be caused by T). Both should return null effects under a correct model.
|
|
40
|
+
23. **Manipulation test (RDD)**: Run McCrary density test on the running variable at the cutoff. A discontinuity in density signals sorting.
|
|
41
|
+
24. **Covariate continuity test (RDD)**: Regress each pre-treatment covariate on the running variable with the same bandwidth and estimator as the main spec. No covariate should jump at the cutoff.
|
|
42
|
+
25. **Placebo cutoff test (RDD)**: Re-estimate the main spec at fake cutoffs away from the true cutoff. Effects should be null.
|
|
43
|
+
26. **First-stage strength (IV)**: Kleibergen-Paap F-statistic > 10 for single instrument; use conditional F in the weak-instrument-robust Anderson-Rubin confidence set.
|
|
44
|
+
27. **Exclusion restriction plausibility (IV)**: Argue structurally why Z→Y paths other than Z→T→Y are closed. Cannot be tested directly; report honest threats.
|
|
45
|
+
28. **Overidentification test (IV)**: If you have multiple instruments, run Sargan-Hansen J-test. Rejection means at least one instrument violates exclusion.
|
|
46
|
+
|
|
47
|
+
## Phase 4 — Estimation and Inference
|
|
48
|
+
|
|
49
|
+
29. Use heteroskedasticity-robust (HC2/HC3) standard errors by default. Cluster at the level of treatment assignment. Never use non-clustered SEs when treatment is clustered.
|
|
50
|
+
30. For DiD, use cluster-robust SEs at the group level. With few clusters (< 30), use wild cluster bootstrap.
|
|
51
|
+
31. Report point estimate, 95% CI, and p-value. Report the effect size in natural units, not just statistical significance.
|
|
52
|
+
32. For CATE estimation, use causal forests (`grf` in R or `econml` in Python). Report heterogeneity via best linear projection of CATE on covariates.
|
|
53
|
+
33. Pre-register your specification. If you run multiple specs, report all of them and correct for multiple testing (Benjamini-Hochberg).
|
|
54
|
+
|
|
55
|
+
## Phase 5 — Sensitivity to Unmeasured Confounding
|
|
56
|
+
|
|
57
|
+
34. Compute the **E-value**: the minimum strength (on the risk ratio scale) an unmeasured confounder would need to have with both T and Y to fully explain away the observed effect. Report it alongside your main estimate. Formula: E = RR + sqrt(RR×(RR−1)) for RR > 1.
|
|
58
|
+
35. Run **Rosenbaum sensitivity analysis** (for matched studies): find the gamma at which the inference breaks. Report the range of gamma consistent with your conclusions.
|
|
59
|
+
36. For DiD, conduct **Rambachan-Roth sensitivity analysis**: how much would parallel trends need to fail (in units of the pre-trend) to reverse the sign?
|
|
60
|
+
37. State explicitly which threats could reverse the sign of the effect and how plausible they are given domain knowledge.
|
|
61
|
+
|
|
62
|
+
## Phase 6 — Reporting
|
|
63
|
+
|
|
64
|
+
38. Report: estimand, design, identification assumptions, falsification results, point estimate + CI + E-value, and a one-sentence plain-English causal claim scoped to the identified population.
|
|
65
|
+
39. Never write "X causes Y" without specifying: in what population, under what contrast, identified by what strategy, conditional on what assumptions holding.
|
|
66
|
+
40. Separate what is identified (the causal claim your design licenses) from what is extrapolation (generalization beyond the support of your design).
|
|
67
|
+
41. If the design cannot identify the estimand of interest, say so and propose what data or design would be needed. A credible null is more valuable than a spurious positive.
|
|
68
|
+
|
|
69
|
+
## Hard Rules (Never Violate)
|
|
70
|
+
|
|
71
|
+
- NEVER condition on a post-treatment variable when estimating total effect — this opens collider bias or blocks causal paths.
|
|
72
|
+
- NEVER interpret a regression coefficient as causal without stating the identification assumption that licenses it.
|
|
73
|
+
- NEVER report TWFE DiD with staggered treatment timing without heterogeneity-robust estimators.
|
|
74
|
+
- NEVER use propensity score weighting without trimming extreme weights and checking overlap.
|
|
75
|
+
- NEVER claim IV identifies ATE — it identifies LATE (the complier subpopulation) unless you have strong monotonicity + homogeneity arguments.
|
|
76
|
+
- NEVER skip the falsification phase — one failed placebo invalidates the design.
|
|
77
|
+
- ALWAYS report the E-value; a p < 0.05 with E-value of 1.2 is not credible evidence of causation.
|
|
@@ -0,0 +1,98 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: clinical-guideline
|
|
3
|
+
description: Summarize clinical practice guidelines for a disease/condition/intervention — extracting evidence grades (GRADE/strength), target populations, contraindications, and cross-body conflicts — with primary source citations and a mandatory educational-use disclaimer; invoke when the user asks what guidelines say about a clinical topic, drug, screening threshold, treatment protocol, or diagnostic criterion.
|
|
4
|
+
allowed-tools: Read, WebSearch, WebFetch, Write
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
═══ HARD RULES ═══
|
|
8
|
+
|
|
9
|
+
1. NEVER fabricate a guideline, recommendation strength, evidence grade, citation, trial name, drug dose, threshold value, or population qualifier — if you cannot verify it, label it [UNVERIFIED] and exclude it from the summary table.
|
|
10
|
+
2. NEVER present this output as professional medical advice, a substitute for clinical judgment, or a basis for patient management decisions.
|
|
11
|
+
3. ALWAYS close every output with the mandatory disclaimer block defined in Phase F — no exceptions, no truncation.
|
|
12
|
+
4. Cite only PRIMARY guideline sources (issuing society + year + document title + URL/DOI where retrievable). Do NOT cite secondary summaries (UpToDate, Medscape, Wikipedia) as authoritative — use them only to triangulate, then trace back to the primary document.
|
|
13
|
+
5. When recommendations conflict across bodies (e.g., USPSTF vs. ACS vs. ESC), surface the conflict explicitly — never silently adopt one and suppress the other.
|
|
14
|
+
6. Distinguish CURRENT from SUPERSEDED guidance: flag any guideline older than 5 years as potentially outdated and note whether a newer version exists. When a newer version from the same body exists, use the newer version and note the supersession.
|
|
15
|
+
7. Do not extrapolate: if a guideline applies to adults ≥18, do not infer pediatric applicability. Population scope must be stated exactly as the issuing body states it.
|
|
16
|
+
8. Risk escalation: if the query involves acute/life-threatening presentations (acute MI, sepsis, anaphylaxis, stroke, suicidality, airway compromise), prepend the acute safety note from Phase A4 before any guideline content and do not omit it.
|
|
17
|
+
9. Never invent GRADE labels. Use only the grading system the issuing body itself uses (GRADE, Oxford CEBM, ACC/AHA Class I–III/LOE A–C, SIGN 1++–4, USPSTF A–D/I, NCCN Category 1/2A/2B/3, etc.) — name the system used and display a brief key in your output header so every strength label is interpretable without external lookup.
|
|
18
|
+
10. Living and interim guidelines (e.g., WHO rapid guidance, NICE evidence reviews marked "in development", CDC interim recommendations) must be labeled [LIVING — subject to update] and re-checked at the source for currency.
|
|
19
|
+
11. When no guideline from any identified body addresses the specific topic: state "No current guideline identified from [bodies checked]" — do not synthesize de novo clinical advice as a substitute. Note relevant systematic reviews or consensus statements only if clearly distinguished from formal society guidelines.
|
|
20
|
+
|
|
21
|
+
═══ PHASE A — SCOPE LOCK ═══
|
|
22
|
+
|
|
23
|
+
A1. Parse the user's query into: (a) clinical condition/topic, (b) intervention or decision node (screening / diagnosis / treatment / monitoring / prevention), (c) implied population (age, sex, comorbidities, pregnancy status, clinical setting — ICU vs. outpatient vs. community), (d) geographic/system scope (US, EU, UK, global, etc.). State each dimension explicitly before proceeding; flag any that are ambiguous and state your working assumption.
|
|
24
|
+
A2. Identify the 3–6 authoritative guideline-issuing bodies most relevant to the topic. Examples by domain:
|
|
25
|
+
- Cardiology: ACC/AHA, ESC, NICE, CCS, Heart Failure Society of America
|
|
26
|
+
- Oncology: NCCN, ESMO, ASCO, NICE (cancer pathways)
|
|
27
|
+
- Primary care/prevention: USPSTF, Canadian Task Force on Preventive Health Care, WHO
|
|
28
|
+
- Endocrinology: ADA, Endocrine Society, AACE
|
|
29
|
+
- Infectious disease: IDSA, ESCMID, CDC, WHO
|
|
30
|
+
- Psychiatry: APA, NICE, CANMAT, BAP
|
|
31
|
+
- Nephrology: KDIGO, ERA
|
|
32
|
+
List the bodies selected and the reason each is relevant to this query.
|
|
33
|
+
A3. For each body, state the grading taxonomy it uses and display a compact key in your output header:
|
|
34
|
+
- GRADE: Strong / Conditional (Weak), with evidence quality High/Moderate/Low/Very Low
|
|
35
|
+
- ACC/AHA: Class I (benefit>>risk) / IIa / IIb / III (no benefit or harm); LOE A (multiple RCTs/meta-analyses) / B-R / B-NR / C-LD / C-EO
|
|
36
|
+
- SIGN: 1++ (high-quality meta-analysis/SR/RCT with very low risk of bias) → 4 (expert opinion); A/B/C/D recommendation grades
|
|
37
|
+
- USPSTF: A (high certainty, substantial net benefit) / B / C / D (recommend against) / I (insufficient evidence)
|
|
38
|
+
- NCCN: Category 1 (high-level evidence, uniform consensus) / 2A / 2B / 3 (major disagreement)
|
|
39
|
+
- Other: name the system and its hierarchy
|
|
40
|
+
A4. Acute safety gate: if the topic involves a potential emergency (acute MI, sepsis, anaphylaxis, stroke, suicidality, airway compromise, DKA, massive hemorrhage), begin ALL output with:
|
|
41
|
+
"⚠ ACUTE SAFETY NOTE: The following is an educational summary only. For any acute or life-threatening presentation, activate emergency services and follow local emergency protocols immediately. No guideline summary substitutes for real-time clinical assessment."
|
|
42
|
+
|
|
43
|
+
═══ PHASE B — GUIDELINE RETRIEVAL ═══
|
|
44
|
+
|
|
45
|
+
B1. Run targeted WebSearch queries for each issuing body from A2, e.g.: "[Society] [condition] guideline [year]", "[Society] [condition] clinical practice recommendation site:[society-domain].org". Prioritize the most recent published version; also search for in-progress living guidance.
|
|
46
|
+
B2. For each hit, WebFetch the primary document or its executive summary / key-recommendations table. Extract: (a) exact publication year and version number, (b) full document title, (c) issuing society, (d) DOI or canonical URL, (e) whether the document is a full guideline, focused update, rapid guidance, or consensus statement — these carry different epistemic weight.
|
|
47
|
+
B3. If a guideline is paywalled, retrieve the freely available executive summary, key-recommendations table, or press release from the same issuing body — note the access limitation and that only the accessible portion was reviewed.
|
|
48
|
+
B4. Record retrieval failures explicitly: "Could not verify [Body] guidance — primary document not publicly accessible at time of retrieval." Never fill a gap with inference or secondary-source extrapolation.
|
|
49
|
+
B5. Living/interim flag: if a document is marked as living, interim, or in development, tag it [LIVING] in every table row sourced from it. Note the expected next review date if stated.
|
|
50
|
+
B6. Supersession check: if an older version is found first, search explicitly for "[Society] [condition] guideline update [current year–1 year]" to confirm no newer version exists before using the older document.
|
|
51
|
+
|
|
52
|
+
═══ PHASE C — RECOMMENDATION EXTRACTION ═══
|
|
53
|
+
|
|
54
|
+
C1. For each retrieved guideline, extract each discrete recommendation into a structured row:
|
|
55
|
+
| Recommendation | Grade/Class | LOE/Evidence Quality | Population Scope | Contraindications/Exceptions | Source (Society, Year) |
|
|
56
|
+
C2. Map grades faithfully to the issuing body's own system. Never convert between systems without explicit labeling (e.g., "GRADE Strong ≈ ACC/AHA Class I — rough analog only, systems are not equivalent").
|
|
57
|
+
C3. Capture population qualifiers at the finest granularity the guideline states: age range (e.g., 50–75 years), biological sex where specified, pregnancy/lactation status, renal function cutoff (eGFR threshold), hepatic function (Child-Pugh class), prior treatment history, validated risk-score thresholds (e.g., ASCVD 10-year risk ≥7.5%, CHA₂DS₂-VASc ≥2 in men/≥3 in women, FIB-4 ≥2.67).
|
|
58
|
+
C4. Extract absolute contraindications, relative contraindications, and documented exceptions for each recommendation separately. Never omit the contraindication column; enter "None documented in this guideline" if applicable.
|
|
59
|
+
C5. Note where a guideline is silent on a sub-population or scenario — silence is not endorsement and must not be treated as implicit permission.
|
|
60
|
+
C6. No-guideline fallback: if Phase B yields no guideline for a key decision node, enter a row: "No guideline identified | — | — | — | — | [bodies checked]" and do not substitute clinical opinion.
|
|
61
|
+
|
|
62
|
+
═══ PHASE D — CONFLICT ANALYSIS ═══
|
|
63
|
+
|
|
64
|
+
D1. Compare recommendations across bodies side by side. Flag any conflict where: (a) grade or strength differs for the same intervention in the same population, (b) population thresholds differ (e.g., mammography start age 40 per ACS vs. 50 per USPSTF 2016 vs. 40 per USPSTF 2024), (c) one body recommends for while another explicitly recommends against, (d) dosing, monitoring intervals, or treatment duration differ.
|
|
65
|
+
D2. For each conflict, provide the documented reason if available: different underlying evidence base (name the key RCTs), different modeling assumptions (absolute vs. relative risk, QALYs vs. life-years), health-system context (resource constraints, test availability), different population endpoint definitions. Do not editorialize about which body is correct.
|
|
66
|
+
D3. Rate conflict severity:
|
|
67
|
+
- MAJOR: opposite directions — one body issues a recommendation FOR, another recommends AGAINST the same intervention in the same population
|
|
68
|
+
- MODERATE: same direction, meaningfully different thresholds, risk-score cutoffs, or recommendation strength (e.g., one strong/Class I, another conditional/Class IIb)
|
|
69
|
+
- MINOR: wording nuance, monitoring interval, or administrative detail only
|
|
70
|
+
D4. Where a landmark RCT or meta-analysis drives the divergence, name it (trial acronym + year + primary endpoint result), e.g., "SPRINT (2015): SBP <120 mmHg reduced primary CV events 25% vs. <140 target; drove ACC/AHA 2017 threshold shift." Never invent trial names or results.
|
|
71
|
+
D5. If the user's query references a specific trial by name, cross-check whether that trial's results have been incorporated into a subsequent guideline update — and state which bodies have or have not done so.
|
|
72
|
+
|
|
73
|
+
═══ PHASE E — SYNTHESIS OUTPUT ═══
|
|
74
|
+
|
|
75
|
+
E1. Open with a one-paragraph plain-language summary of the consensus position (where one exists) and the principal areas of disagreement. If no consensus exists, state that explicitly rather than constructing a false median.
|
|
76
|
+
E2. Render the full recommendation table from Phase C, grouped by clinical decision node: Screening / Diagnosis / First-line treatment / Second-line / Escalation / Monitoring / Special populations.
|
|
77
|
+
E3. Render a conflict table from Phase D: Conflict description | Bodies involved | Direction | Severity (MAJOR/MODERATE/MINOR) | Known reason for divergence.
|
|
78
|
+
E4. Add a "Special Populations" section covering any guideline-specific guidance for: pregnancy and lactation, pediatrics (<18), older adults (≥75 or ≥65 as the body defines), chronic kidney disease (eGFR-stratified where available), hepatic impairment (Child-Pugh A/B/C where available), immunocompromised status, and rare/genetic subgroups — or state "No specific guidance identified in retrieved documents" for each category that applies.
|
|
79
|
+
E5. Add a "Gaps and Research Frontiers" section: note only where issuing bodies themselves acknowledge insufficient evidence (GRADE low/very-low, LOE C-EO, USPSTF I, SIGN 4) and where active trials or scheduled guideline updates are noted in the retrieved documents. Do not speculate beyond what the documents state.
|
|
80
|
+
E6. No-guideline conclusion: if Phase B found no guideline for the queried topic, conclude: "No formal clinical practice guideline from the bodies checked addresses [topic] at the specificity requested. Relevant systematic reviews or consensus statements, if found, are listed below and clearly distinguished from society guidelines. A clinician or specialist librarian should be consulted."
|
|
81
|
+
E7. List all primary sources in a numbered reference list: [N] Society. Title. Version/Year. URL/DOI. Guideline type (full / focused update / living / consensus statement). Retrieved: [date].
|
|
82
|
+
|
|
83
|
+
═══ PHASE F — MANDATORY DISCLAIMER ═══
|
|
84
|
+
|
|
85
|
+
F1. Close EVERY output — without exception, without truncation — with this exact block (fill in bracketed fields):
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
**EDUCATIONAL USE ONLY — NOT MEDICAL ADVICE**
|
|
89
|
+
|
|
90
|
+
This summary was generated by an AI agent for educational and informational purposes. It does not constitute professional medical advice, a clinical diagnosis, a treatment recommendation, or a substitute for the judgment of a licensed healthcare provider.
|
|
91
|
+
|
|
92
|
+
- Guidelines change. Verify currency at the issuing body's official website before applying any recommendation. Living guidelines may have been updated since this retrieval.
|
|
93
|
+
- Individual patient circumstances, comorbidities, contraindications, preferences, and local formulary/resource constraints must be evaluated by a qualified clinician.
|
|
94
|
+
- If you or someone you care for is experiencing a medical emergency, call emergency services (e.g., 911 in the US / 999 in the UK / 112 in the EU) immediately.
|
|
95
|
+
- Sources retrieved: [date of retrieval]. Issuing bodies checked: [list bodies from Phase A2].
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
KEY PRINCIPLE: Surface what the evidence actually says — grades, populations, conflicts, gaps, and the grading system used — exactly as the issuing bodies state it, never smoother or more certain than the underlying evidence warrants. When guidelines are absent, say so; when they conflict, show both sides.
|
|
@@ -0,0 +1,98 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: code-review
|
|
3
|
+
description: Review a diff like a senior engineer — correctness first, then security, tests, API/compat, perf, readability; every comment file:line + concrete fix + severity tag.
|
|
4
|
+
allowed-tools: Read, Grep, Glob, Bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## PHASE 0 — ORIENT (do this before reading a single line of diff)
|
|
8
|
+
|
|
9
|
+
1. Run `git diff main...HEAD --stat` to see the full scope: files changed, insertions, deletions.
|
|
10
|
+
2. Run `git log main...HEAD --oneline` to read the commit message(s). Extract the stated intent.
|
|
11
|
+
3. For each changed file, read the surrounding 40–80 lines of context that the diff does NOT show — the caller, the type definitions, the existing tests. Use Read with offset/limit. Never judge a patch in isolation.
|
|
12
|
+
4. If the PR touches a public API, interface, or exported symbol, grep the entire codebase for existing callers: `grep -rn "symbolName" --include="*.ts" .` — understand the blast radius before forming any opinion.
|
|
13
|
+
5. Hold a mental model: *what problem was this trying to solve, and what is the simplest correct solution?* Everything you find is measured against that baseline.
|
|
14
|
+
|
|
15
|
+
## PHASE 1 — CORRECTNESS (blocking by default)
|
|
16
|
+
|
|
17
|
+
6. Trace the happy path end-to-end through the changed code. Write it out in one sentence. If you cannot, the code is not readable enough — flag it.
|
|
18
|
+
7. Enumerate inputs: zero, one, many, null/undefined/empty, max-size, malformed. For each, step through the logic manually. Does it handle them?
|
|
19
|
+
8. Check every error path. Are errors caught, logged, surfaced, or swallowed silently? Silent swallow = blocking bug.
|
|
20
|
+
9. Verify every conditional: off-by-one in loop bounds, `<` vs `<=`, `===` vs `==`, index into potentially-empty array.
|
|
21
|
+
10. Check async/concurrent paths: unguarded shared state, missing `await`, Promise not returned, race between two async calls on the same resource.
|
|
22
|
+
11. Check resource lifecycle: files, connections, locks — are they closed/released on ALL code paths including error paths?
|
|
23
|
+
12. If the logic is non-trivial, mentally run a concrete example with real values through it. Does the output match the stated intent?
|
|
24
|
+
13. If you are unsure whether a function exists or behaves as assumed, grep for it: `grep -n "functionName" path/to/file.ts`. Never trust the description.
|
|
25
|
+
|
|
26
|
+
## PHASE 2 — SECURITY (blocking if relevant)
|
|
27
|
+
|
|
28
|
+
14. Injection: any user-supplied string interpolated into SQL, shell, eval, innerHTML, URL, regex, or log sink without sanitization = blocking.
|
|
29
|
+
15. AuthZ: every new route/endpoint/function that touches data — is ownership/permission checked *before* the query, not after?
|
|
30
|
+
16. Secrets: `grep -rn "sk-\|api_key\|password\|secret\|token" --include="*.ts" .` on changed files. Hardcoded credential = blocking.
|
|
31
|
+
17. Input validation: is there a schema/type guard at the trust boundary (HTTP handler, IPC, file read)? Trusting internal callers is fine; trusting network/user input is not.
|
|
32
|
+
18. Cryptography: no hand-rolled crypto, no MD5/SHA1 for security, no fixed IV/nonce, no ECB mode. If crypto is touched, flag for specialist review.
|
|
33
|
+
19. Dependency: if a new package is added, check it is not a typosquat and has a non-zero maintenance signal.
|
|
34
|
+
|
|
35
|
+
## PHASE 3 — TESTS
|
|
36
|
+
|
|
37
|
+
20. Run `grep -rn "describe\|it(\|test(" --include="*.test.*" .` scoped to the changed feature. Do tests exist?
|
|
38
|
+
21. For each test, verify it exercises the *changed* code path, not just the pre-existing behavior. A test that passes before the PR is not a test for this PR.
|
|
39
|
+
22. Check that edge cases from Phase 1 (empty input, error path, concurrency) have corresponding test cases.
|
|
40
|
+
23. Check test assertions are meaningful: `expect(result).toBeDefined()` is not a meaningful assertion. `expect(result.id).toBe(42)` is.
|
|
41
|
+
24. If tests are missing for a correctness-critical path, tag it **blocking**. If they exist but are thin, tag **should-fix**.
|
|
42
|
+
|
|
43
|
+
## PHASE 4 — API / CONTRACT / BACKWARD COMPAT
|
|
44
|
+
|
|
45
|
+
25. If a public function/type/route signature changed, grep for all callers: `grep -rn "oldFunctionName\|OldTypeName" . --include="*.ts"`. Every caller must be updated or the old signature kept as a shim.
|
|
46
|
+
26. Check serialized formats (JSON keys, DB column names, URL paths, event names). Renaming a key in a stored/transmitted payload is a breaking change even if TypeScript compiles.
|
|
47
|
+
27. Check version numbers: if the package exposes a public API and the signature changed, the semver bump must be at least minor (new feature) or major (breaking).
|
|
48
|
+
28. Check migration files if DB schema changed. Is the migration reversible? Does it lock tables on large datasets?
|
|
49
|
+
|
|
50
|
+
## PHASE 5 — PERFORMANCE (only real hot paths)
|
|
51
|
+
|
|
52
|
+
29. Skip this phase for code that runs once at startup or on human-triggered actions. Apply it only to: hot loops, per-request middleware, stream processors, rendering paths.
|
|
53
|
+
30. For hot paths: N+1 query (DB call inside a loop), unbounded array growth, synchronous I/O blocking the event loop, missing index on a WHERE clause column.
|
|
54
|
+
31. Do not flag micro-optimizations (string concatenation, object spread) unless profiling data shows they matter.
|
|
55
|
+
|
|
56
|
+
## PHASE 6 — READABILITY / MAINTAINABILITY / NAMING
|
|
57
|
+
|
|
58
|
+
32. Names: does the name describe *what it is* (noun for data, verb for action)? Would a new engineer understand it without reading the implementation?
|
|
59
|
+
33. Function length: if a function exceeds ~40 lines or has more than 3 levels of nesting, flag it as a should-fix candidate for extraction.
|
|
60
|
+
34. Comments: comments should explain *why*, not *what*. A comment that restates the code is noise. A comment explaining a non-obvious invariant or workaround is gold.
|
|
61
|
+
35. Dead code: flag any commented-out block, unreachable branch, or unused import.
|
|
62
|
+
36. Consistency: does the new code follow the existing conventions in the file (naming, error handling, logging style)? Tag inconsistencies as nit.
|
|
63
|
+
|
|
64
|
+
## PHASE 7 — REUSE
|
|
65
|
+
|
|
66
|
+
37. Grep for similar logic already in the codebase before flagging duplication: `grep -rn "keyPhrase" --include="*.ts" .`. If a utility already exists, the new code should use it.
|
|
67
|
+
38. If a new utility is introduced, check it is placed where other utilities live and is actually generic (not secretly coupled to one caller).
|
|
68
|
+
|
|
69
|
+
## COMMENT FORMAT (every finding, no exceptions)
|
|
70
|
+
|
|
71
|
+
```
|
|
72
|
+
**[SEVERITY] file/path.ts:LINE**
|
|
73
|
+
_Issue:_ One sentence describing the problem concretely.
|
|
74
|
+
_Why it matters:_ One sentence on the consequence if unfixed.
|
|
75
|
+
_Suggested fix:_
|
|
76
|
+
```suggestion
|
|
77
|
+
// concrete code replacement here
|
|
78
|
+
```
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Severity levels:
|
|
82
|
+
- **blocking** — must be resolved before merge; correctness, security, or data-loss risk.
|
|
83
|
+
- **should-fix** — strong recommendation; likely to cause a bug or maintenance burden; merge only with explicit acknowledgment.
|
|
84
|
+
- **nit** — style, naming, minor cleanup; author may take or leave; do not block merge on nits alone.
|
|
85
|
+
|
|
86
|
+
## HARD RULES
|
|
87
|
+
|
|
88
|
+
39. Never approve if any **blocking** finding is open.
|
|
89
|
+
40. Never approve if correctness-critical paths have zero test coverage.
|
|
90
|
+
41. Separate objective correctness from personal style preference. "I would have written it differently" is not a blocking comment.
|
|
91
|
+
42. If you are uncertain whether something is a bug, say so explicitly: "I believe this is a bug because X — please confirm or correct my reading."
|
|
92
|
+
43. Do not trust the PR description. Verify behavior by reading the code. If the description says "this is safe" but you cannot confirm from the code, treat it as unverified.
|
|
93
|
+
44. Be direct. "This will panic if `items` is empty" is better than "Consider whether items could be empty."
|
|
94
|
+
45. Be kind. Assume competence and good intent. The goal is a better codebase, not a score.
|
|
95
|
+
46. End the review with one of three verdicts:
|
|
96
|
+
- **APPROVE** — correctness and tests satisfied, no blocking findings.
|
|
97
|
+
- **REQUEST CHANGES** — one or more blocking or should-fix findings open.
|
|
98
|
+
- **COMMENT** — observations only, no change required; used for informational notes on approved PRs.
|
|
@@ -0,0 +1,268 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: cold-outreach
|
|
3
|
+
description: Execute multi-touch, research-driven cold outreach sequences for AI infrastructure / agentic platforms; one clear CTA per touch, no spray-and-pray.
|
|
4
|
+
allowed-tools: Read, Write
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## COLD OUTREACH PLAYBOOK — Theron / AI Agent Infrastructure
|
|
8
|
+
|
|
9
|
+
═══ ICP DEFINITION (who to target before wasting a touch) ═══
|
|
10
|
+
|
|
11
|
+
**IN:** Companies that are BUILDING on top of AI, not consuming it passively.
|
|
12
|
+
- Agent platform teams (coding agents, security agents, research agents, workflow orchestrators).
|
|
13
|
+
- AI-native SaaS startups with 5–50 eng headcount scaling inference.
|
|
14
|
+
- ML infrastructure teams inside enterprises running >100 concurrent agent calls/day.
|
|
15
|
+
- Founders or VPs publicly posting about: model cost, agent latency, multi-model orchestration, agent governance, or audit trails.
|
|
16
|
+
|
|
17
|
+
**OUT (skip immediately, do not pitch):**
|
|
18
|
+
- AI model vendors (Anthropic, OpenAI, Mistral, Cohere) — they are competitive substrates, not buyers.
|
|
19
|
+
- Consumer apps with no agentic layer.
|
|
20
|
+
- Anyone whose last 10 posts are recruiting-only (not building in public).
|
|
21
|
+
- Anyone who has already solved routing with a published, benchmarked approach.
|
|
22
|
+
|
|
23
|
+
**SIGNAL SCORING — prioritize by total score; don't touch below 3:**
|
|
24
|
+
- +2: Posted about agent routing, model cost, or multi-specialist inference in last 30 days.
|
|
25
|
+
- +2: Hired for "agent infrastructure," "LLM serving," or "ML platform" in last 60 days.
|
|
26
|
+
- +2: Recent funding ($2M–$50M range — scaling, but not yet locked into an enterprise vendor).
|
|
27
|
+
- +1: GitHub repo shows multi-model orchestration, custom inference harness, or tool-routing layer.
|
|
28
|
+
- +1: Published engineering blog on inference cost, latency, or agent framework tradeoffs.
|
|
29
|
+
- +1: Uses RunPod, Modal, or vLLM (indicating self-serve inference — not locked into a vendor).
|
|
30
|
+
- −2: Uses a fully managed AI API (OpenAI/Azure/Bedrock only) with no evidence of custom serving.
|
|
31
|
+
- −2: B2B SaaS with AI as a feature, not the core product.
|
|
32
|
+
|
|
33
|
+
═══ HARD RULES ═══
|
|
34
|
+
|
|
35
|
+
1. **Research BEFORE first touch.** Trigger = specific technical debt, architectural gap, or business outcome the prospect needs to solve. Never guess.
|
|
36
|
+
2. **One CTA per email.** Every message asks for ONE of: 15-min call, specific resource review, attended demo, pilot scope.
|
|
37
|
+
3. **No "just checking in" or "circling back."** Every message advances a thesis about their need or delivers a micro-insight they can act on without replying.
|
|
38
|
+
4. **Verify decision maker role.** VP Eng ≠ VP Product ≠ CRO. Wrong buyer = dead thread. Cross-check LinkedIn + org structure before sending.
|
|
39
|
+
5. **Cadence = 3–5 touches over 3 weeks.** Minimum 48–72 hours between touches. Touch 5 must have an explicit exit — no open-ended ghosting.
|
|
40
|
+
6. **Personalization is mandatory, not optional.** Reference a SPECIFIC technical choice, hire, blog post, or GitHub commit. Show you read their docs or code.
|
|
41
|
+
7. **No fabricated ROI or case studies.** Claim only what lives in `claims.yaml` (Theron-backed claims only). "[measurement]" is not a valid claim — omit it or replace it with a real number.
|
|
42
|
+
8. **Thread into their existing tooling.** Know their current stack (RunPod, Modal, AWS, vLLM, Kubernetes, OpenRouter, etc.) before pitching Theron's LoRA routing layer on top.
|
|
43
|
+
|
|
44
|
+
═══ PHASE A: RESEARCH & HYPOTHESIS ═══
|
|
45
|
+
|
|
46
|
+
**A0. Intake (5 min)**
|
|
47
|
+
- What is their product? (SaaS, infrastructure, agent platform, enterprise tool)
|
|
48
|
+
- What is their public tech narrative? (blog, GH repos, job postings, HN comments by founders)
|
|
49
|
+
- What decision or hire signals a gap they have not yet filled?
|
|
50
|
+
- Signal score (use ICP scoring above) — do not proceed below 3.
|
|
51
|
+
|
|
52
|
+
**A1. Dig — Technical Debt Angle (10 min)**
|
|
53
|
+
- Search company blog, GH repos, engineering posts, Hacker News.
|
|
54
|
+
- Look for: model serving cost spikes, latency SLA complaints (sub-50ms), multi-specialist routing needs, governance or audit requirements, evidence of model merging or multiple model deployments.
|
|
55
|
+
- Cross-check: recent funding (signals scaling) or specific hires (new infra/agent hire = active build).
|
|
56
|
+
- One primary trigger per outreach — do not pile three triggers in one email.
|
|
57
|
+
|
|
58
|
+
**A2. Craft Hypothesis (3 min)**
|
|
59
|
+
- Frame: "You're solving X with [current approach], but X has a ceiling at [metric/scale] — Theron's [specific lever] removes that ceiling."
|
|
60
|
+
- Do NOT claim "we're cheaper" — claim "we're cheaper per QUALITY TIER" or "you can drop compute via LoRA hot-swap without retraining model weights."
|
|
61
|
+
- Reject generic angles: "better agentic OS" alone is not a trigger. Trigger = "your agents call 40+ tools; you need 30% latency cut + multi-specialist routing without merging models."
|
|
62
|
+
- Write the hypothesis as one sentence. If you cannot write it in one sentence, you have not found the trigger yet.
|
|
63
|
+
|
|
64
|
+
═══ PHASE B: FIRST TOUCH (Email or LinkedIn DM) ═══
|
|
65
|
+
|
|
66
|
+
**B1. Subject Line / Hook**
|
|
67
|
+
- Do NOT use: "Accelerate your AI," "Great opportunities," "Quick question," "Following up on my last email."
|
|
68
|
+
- DO use: Reference a specific piece of their work. Example: "Re: Your Qwen3 routing layer — LoRA-based composition might halve your cold-start."
|
|
69
|
+
- Format: Max 50 chars. Question beats statement. Never start with your company name.
|
|
70
|
+
|
|
71
|
+
**B2. Opening Paragraph (2–3 lines)**
|
|
72
|
+
- Trigger + credibility in ONE sentence: "I build domain-specialist LoRA routing at Theron; your Nov hiring for 'agent platform infra' suggests you're scaling multi-specialist agents."
|
|
73
|
+
- Why YOU not someone else: "We've measured 15 LoRAs (Cyber, Coder, etc.) on one 72B base — removes the merge bottleneck that hits teams at 5+ concurrent specialists."
|
|
74
|
+
- Name check: verify you got it right before sending. Wrong name kills the thread.
|
|
75
|
+
|
|
76
|
+
**B3. The Thesis (1 paragraph, max 5 lines)**
|
|
77
|
+
- Their situation: "Your agents likely need code review, security, reasoning, and math in one run. Today that's either 4× model calls or merged weights with a latency tax."
|
|
78
|
+
- Your observation: "Theron's approach: frozen base + per-call LoRA hot-swap + oracle-acceptance speculative decode."
|
|
79
|
+
- The insight: "Means you ship multi-specialist routing without re-tuning model weights — and with cryptographic receipts on every agent action."
|
|
80
|
+
- Do NOT oversell. State only what you know they are likely solving.
|
|
81
|
+
|
|
82
|
+
**B4. Micro-Offer (1 line)**
|
|
83
|
+
- Offer value THEY extract without needing to reply. Options:
|
|
84
|
+
- "We open-sourced the LoRA composition contract in theron-agent-sdk; might short-circuit your tool-routing layer." (link: itstheron.com)
|
|
85
|
+
- "I can share our cold-start benchmarks for LoRA swap vs. full-model load — might help you scope your infra tradeoffs."
|
|
86
|
+
- "Wrote a three-question checklist for evaluating multi-specialist routing; let me know if you want it."
|
|
87
|
+
- Concrete > vague. Never: "I'd love to learn more about your business."
|
|
88
|
+
|
|
89
|
+
**B5. Call-to-Action (1 line)**
|
|
90
|
+
- ONE action only: "Does a 15-min call to map your routing layer fit your calendar this Thursday or Friday?"
|
|
91
|
+
- Give 2 specific dates/times, not "let me know your availability" — friction kills conversion.
|
|
92
|
+
- If cold DM: "I'll send a calendar link if Tuesday 2–3pm PT works."
|
|
93
|
+
|
|
94
|
+
**B6. Signature**
|
|
95
|
+
- Name, title, Vext Labs.
|
|
96
|
+
- Link to ONE artifact: itstheron.com or a specific SDK/benchmark. Not a spray of links.
|
|
97
|
+
|
|
98
|
+
═══ PHASE C: CADENCE & SUBSEQUENT TOUCHES ═══
|
|
99
|
+
|
|
100
|
+
**C1. Touch 2 (48–72 hours, if no reply)**
|
|
101
|
+
- **Channel shift** (email → LinkedIn, or reverse).
|
|
102
|
+
- **Angle shift** (technical → architectural tradeoff). Example: "Most teams building multi-agent pipelines hit the same fork: separate model deployments vs. merged weights. Both have failure modes. Happy to map how we approached it at Theron — no commitment."
|
|
103
|
+
- **Micro-content** (not 50-line email). Attach ONE specific thing: an architecture diagram, a benchmark figure from claims.yaml, or a link to a relevant theron-agent-sdk pattern.
|
|
104
|
+
- **CTA:** "Happy to hop on a brief call only if this resonates — does it?" (lower friction; removes scheduling obligation).
|
|
105
|
+
|
|
106
|
+
**C2. Touch 3 (5–7 days after Touch 1)**
|
|
107
|
+
- **Angle:** Problem reversal. Directly ask what their actual blocker is: "Maybe multi-specialist routing isn't the priority right now — is the bigger pain cost-per-inference, latency under load, auditability of agent decisions, or ownership of model weights?"
|
|
108
|
+
- **Offer:** Genuinely useful resource with a direct link: the theron-agent-sdk open-source loop layer, a published benchmark, or a two-question architectural checklist.
|
|
109
|
+
- **CTA:** "No pressure if it's not a fit — but curious if any of this maps to your roadmap this half."
|
|
110
|
+
|
|
111
|
+
**C3. Touch 4 (10–12 days, if still cold)**
|
|
112
|
+
- **Angle:** Specific event-triggered urgency. "Just saw you announced [hire/integration/launch]. That typically signals the multi-model routing problem is becoming expensive — happy to compare notes."
|
|
113
|
+
- **Social proof (claims only):** Only cite something in claims.yaml. Do not invent case studies. If nothing qualifies: skip this line entirely.
|
|
114
|
+
- **CTA:** "If agent governance or routing cost is on your roadmap this year, I'll send over a one-pager. Otherwise, no worries."
|
|
115
|
+
|
|
116
|
+
**C4. Touch 5 (14–16 days, final touch)**
|
|
117
|
+
- **Angle:** Explicit, credible exit. "Sounds like now's not the right time. If multi-specialist routing, model-cost, or agent audit trails become priorities, you know where to find me."
|
|
118
|
+
- **Leave the door open — permanently.** "Following you; when the problem shows up, reach out directly."
|
|
119
|
+
- **CTA:** NONE. This is goodbye-with-credibility. Any CTA here damages it.
|
|
120
|
+
|
|
121
|
+
═══ PHASE D: OBJECTION HANDLING ═══
|
|
122
|
+
|
|
123
|
+
**D1. "We use Modal/OpenRouter/AWS — not a fit."**
|
|
124
|
+
- Reframe, do not pitch harder. "Totally — Modal handles the infrastructure layer. We're the routing logic ON TOP of Modal (or at the agent level). Most teams run both."
|
|
125
|
+
- Redirect to the technical buyer (ML eng or agent platform lead, not DevOps).
|
|
126
|
+
|
|
127
|
+
**D2. "Show me ROI / a case study."**
|
|
128
|
+
- Never fabricate. "I can't cite a customer story right now, but I can walk you through the math for your specific setup: your [stated metric] today and how our pattern changes it. Worth 20 mins?"
|
|
129
|
+
- OR: "Our public claims are locked to what we've measured; everything else is on the roadmap. Rather than oversell, why don't you tell me your constraint and I'll tell you honestly if we can help?"
|
|
130
|
+
|
|
131
|
+
**D3. "How much?"**
|
|
132
|
+
- Close competitor (they are evaluating, not ready to buy). "Theron Pro is $20/mo for individual use; enterprise and SDK licensing depends on your deployment. If you're evaluating frameworks or building in-house, the better move is 15 mins where I ask about your constraint first — then pricing is obvious."
|
|
133
|
+
- Do NOT lead with pricing; they're evaluating fit, not cost.
|
|
134
|
+
|
|
135
|
+
**D4. "Send me more info."**
|
|
136
|
+
- Red flag — delay posture, not genuine interest. "I can, but honestly, the best use of both our time is a 15-min call where I ask questions first and we figure out if it's even relevant. Sound fair?"
|
|
137
|
+
- If they persist: "Here's a two-pager [attach specific artifact]. If it clicks, let's talk — no follow-up pressure otherwise."
|
|
138
|
+
|
|
139
|
+
**D5. "We're building this in-house."**
|
|
140
|
+
- Do not fight it. "Makes sense — what's the hardest part of building it yourself? LoRA serving, the routing logic, the audit trail, or the eval harness?"
|
|
141
|
+
- If they answer: that answer IS your hook for Touch 2 or a pivot in the same conversation.
|
|
142
|
+
|
|
143
|
+
═══ PHASE E: CONVERSION & NURTURE ═══
|
|
144
|
+
|
|
145
|
+
**E1. Call Prep (if they accept)**
|
|
146
|
+
- Goal: Diagnose their real constraint — latency, cost, safety/auditability, ownership of weights, team bandwidth.
|
|
147
|
+
- Stack map: Ask their current setup (agents → inference → storage → verification layer).
|
|
148
|
+
- Do NOT pitch for 15 minutes. Ask for 9; listen for 6. The ratio that loses deals is >50% talking.
|
|
149
|
+
|
|
150
|
+
**E2. Post-Call (within 2 hours)**
|
|
151
|
+
- Send ONLY what you committed to: specific architecture diagram, benchmark, SDK link, demo URL. Zero fluff, zero re-pitch.
|
|
152
|
+
- One-line recap: "Sounds like [their constraint] is the lever. If you want to explore it, next step is [specific action — pilot scope, architecture review, async eval]."
|
|
153
|
+
- One next step only. Multiple CTAs in the post-call email are the #1 conversion killer.
|
|
154
|
+
|
|
155
|
+
**E3. Nurture (warm but not ready to move now)**
|
|
156
|
+
- Monthly, send ONE piece tied to something they mentioned or a public event relevant to them.
|
|
157
|
+
- Qualifying triggers to move a nurtured prospect back to active: new agent-infra hire, funding announcement, public post about routing cost or latency, launch of a multi-model feature.
|
|
158
|
+
- Do NOT email more than monthly in nurture. Frequency is inversely correlated with reply rate past the first touch.
|
|
159
|
+
- Track the qualifying trigger you are waiting for — if you can't name it, you don't have a nurture lead, you have a dead thread.
|
|
160
|
+
|
|
161
|
+
═══ TECHNICAL ANGLE LIBRARY (Domain-Specific Triggers) ═══
|
|
162
|
+
|
|
163
|
+
**For multi-specialist agent builders:**
|
|
164
|
+
- "I saw you hired for agent-routing; we route LoRAs not models — 15x cheaper than serving merged copies."
|
|
165
|
+
- "Your agents call 30+ tools; LoRA composition means you inject domain knowledge per call without retraining."
|
|
166
|
+
|
|
167
|
+
**For security/compliance teams:**
|
|
168
|
+
- "Agent audit trails are a liability when agents act autonomously; Theron agents are cryptographically receipted per action."
|
|
169
|
+
- "Your agent can't attest what it did or why — we ship the proof layer as a verifiable receipt."
|
|
170
|
+
|
|
171
|
+
**For cost-sensitive inference teams:**
|
|
172
|
+
- "Running 15 model copies for 15 domain specialists is margin death; LoRA adapters on one base solve it."
|
|
173
|
+
- "Speculative decode with a sound oracle halves inference cost without latency regression."
|
|
174
|
+
|
|
175
|
+
**For agentic OS / framework builders:**
|
|
176
|
+
- "Your agents need persistence + multi-step reasoning + governance — theron-agent-sdk is open-source; bring your base model, we do the routing and receipt layer."
|
|
177
|
+
- "We've open-sourced the loop layer (theron-agent-sdk); the question is whether our LoRA routing fits on top of your framework."
|
|
178
|
+
|
|
179
|
+
**For teams currently using OpenRouter as inference:**
|
|
180
|
+
- "OpenRouter is the right move to bootstrap — but when you need specialist routing (code agent vs. security agent vs. reasoning), you'll want per-domain adapters, not per-model costs."
|
|
181
|
+
|
|
182
|
+
═══ RED FLAGS (Disqualify Immediately) ═══
|
|
183
|
+
|
|
184
|
+
1. **Prospect is an AI model vendor.** They will not adopt external governance or LoRA routing — it threatens their model lock-in.
|
|
185
|
+
2. **They claim to have solved agent routing without published measurement.** Low signal; likely premature optimization. Do not pitch into vaporware.
|
|
186
|
+
3. **They want Theron to be a Claude wrapper.** Redirect to itstheron.com for consumer use. We are a research lab and framework, not a hosted chat API.
|
|
187
|
+
4. **They ask "are you hiring?" in the first reply.** They are not evaluating; they are recruiting. Save for later if at all.
|
|
188
|
+
5. **They have not run a single agent in production.** Pre-product teams do not have the routing problem yet. Nurture only; do not burn a live sequence on them.
|
|
189
|
+
|
|
190
|
+
═══ TEMPLATES (Copy + Parameterize) ═══
|
|
191
|
+
|
|
192
|
+
**Touch 1 — Technical Trigger:**
|
|
193
|
+
```
|
|
194
|
+
Subject: Re: [their specific thing] — LoRA routing might halve your cold-start
|
|
195
|
+
|
|
196
|
+
Hi [name],
|
|
197
|
+
|
|
198
|
+
Saw your [blog post / hiring announcement / GitHub repo] on agent routing. We're solving the same problem at Theron — multi-specialist agents without the merge cost.
|
|
199
|
+
|
|
200
|
+
Your team likely runs 4–8 domain agents (code review, security, reasoning, etc.). Today that's either separate model calls or merged weights. We route LoRAs on a frozen 72B base; each agent call swaps the right domain adapter in ~5s cold start, ~50ms hot.
|
|
201
|
+
|
|
202
|
+
I can show you the pattern if it maps to your roadmap. Does a 15-min call Thursday or Friday work?
|
|
203
|
+
|
|
204
|
+
— [name], Vext Labs
|
|
205
|
+
itstheron.com
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
**Touch 3 — Angle Reversal:**
|
|
209
|
+
```
|
|
210
|
+
Subject: Direct question about your agent stack
|
|
211
|
+
|
|
212
|
+
[name],
|
|
213
|
+
|
|
214
|
+
Wanted to ask directly: when you think about your agent framework, what's the biggest lever right now — cost per inference, latency under load, auditability of agent decisions, or ownership of the model weights?
|
|
215
|
+
|
|
216
|
+
Asking because Theron hits each of these differently. Might save time if I can map to your specific bottleneck rather than pitch broadly.
|
|
217
|
+
|
|
218
|
+
No pressure — just curious if any of this is on your H2 roadmap.
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Touch 5 — Exit, No CTA:**
|
|
222
|
+
```
|
|
223
|
+
Subject: Re: [original subject]
|
|
224
|
+
|
|
225
|
+
[name],
|
|
226
|
+
|
|
227
|
+
Haven't heard back — sounds like now's not the right time. Following you; when multi-specialist routing, inference cost, or agent audit trails become a priority, you know where to find me.
|
|
228
|
+
|
|
229
|
+
Good luck on the build.
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
═══ METRICS TO TRACK ═══
|
|
233
|
+
|
|
234
|
+
- **Open rate** (email): >25% = subject line worked; <15% = hypothesis not compelling or targeting wrong person.
|
|
235
|
+
- **Reply rate** (full 5-touch sequence): >5% = trigger and angle credible; <2% = wrong ICP or generic messaging.
|
|
236
|
+
- **Call booked from Touch 1**: 2–4% is realistic cold; >5% = subject and trigger are very tight.
|
|
237
|
+
- **Call-to-pilot conversion**: >20% if they get a real demo means urgency is genuine.
|
|
238
|
+
- **Disqualification rate**: If >60% of your prospects are failing the ICP score, you are targeting wrong accounts — fix the account list, not the email copy.
|
|
239
|
+
|
|
240
|
+
═══ ANTI-PATTERNS ═══
|
|
241
|
+
|
|
242
|
+
- **Multi-paragraph opener.** Subject line + first 2 lines are the entire message for skim purposes. Everything else is insurance.
|
|
243
|
+
- **Generic social proof.** ("We work with leading AI companies…") Mention NOTHING unless it is in claims.yaml.
|
|
244
|
+
- **Assumed urgency.** ("ASAP," "quick question," "don't want to miss out.") Kills credibility permanently.
|
|
245
|
+
- **Multiple CTAs in one touch.** ("Call me, email me, check out our site, download our whitepaper.") Paralyzes the buyer.
|
|
246
|
+
- **Responding to every objection with more pitch.** Counter objections with a question: "How are you thinking about [the gap] today?"
|
|
247
|
+
- **Touching before ICP scoring.** One cold email to the wrong company is one wasted signal. Five touches to the wrong company is five destroyed.
|
|
248
|
+
- **Placeholder benchmarks.** "[measurement]" is not a claim — either cite a real benchmark from claims.yaml or omit it. Prospects notice.
|
|
249
|
+
|
|
250
|
+
═══ SUCCESS CASE ═══
|
|
251
|
+
|
|
252
|
+
**Your hypothesis:** They run N domain agents, each needing different specialist knowledge, and they are paying for either N full-model deployments or merged weights (latency + retraining cost).
|
|
253
|
+
|
|
254
|
+
**Your trigger:** Recent hire titled "agent platform engineer" or an engineering post complaining about model serving costs.
|
|
255
|
+
|
|
256
|
+
**Your Touch 1 first line:** "I saw you hired for agent routing — we route LoRAs, not models. Want to see how?"
|
|
257
|
+
|
|
258
|
+
**Their reply (if hot):** "Yeah, we're exploring that. What's your take?"
|
|
259
|
+
|
|
260
|
+
**Your call:** 9 minutes: ask their constraint, listen, map to the specific LoRA-routing or receipt layer that fits. Show one benchmark if it exists in claims.yaml. Offer: "Pilot scope is [specific thing — architecture review, SDK integration, async eval]. Worth trying?"
|
|
261
|
+
|
|
262
|
+
**Their reply (if they bite):** "Let's try it."
|
|
263
|
+
|
|
264
|
+
**Your next step:** Onboarding runbook, not a sales deck. They ARE your customer now.
|
|
265
|
+
|
|
266
|
+
═══ KEY PRINCIPLE ═══
|
|
267
|
+
|
|
268
|
+
**Research + specificity + one micro-value per touch = 5x reply rate of spray-and-pray. The trigger is the email; the email is just the envelope.**
|