npm - @vextlabs/theron-cli - Versions diffs - 0.3.0 → 0.4.0 - Mend

@vextlabs/theron-cli 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (169) hide show

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +59 -4
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/seeds.js +57 -36
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +20 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.js +2 -3
package/dist/verifiers/source_gate.js.map +1 -1
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/market-sizing.md ADDED Viewed

@@ -0,0 +1,159 @@
+---
+name: market-sizing
+description: Size a market TAM/SAM/SOM via dual top-down and bottom-up triangulation, reconcile the gap, stress-test on the dominant driver, and produce a realistic obtainable-share view — invoke whenever a user asks to size a market, estimate addressable revenue, or validate a market opportunity for a product, segment, or geography.
+allowed-tools: Read, Bash, Write
+---
+KEY PRINCIPLE (read first): A market size is an argument, not a fact. Own every assumption, label every guess, and let the sensitivity table do the honest work. A point estimate without a derivation chain is not analysis — it is a number you cannot defend.
+═══ HARD RULES ═══
+1. NEVER present a number without its derivation chain — source or explicit "ASSUMED" label required.
+2. NEVER average the top-down and bottom-up figures to get a final number; reconcile by explaining the GAP.
+3. NEVER use a market report number as a fact — it is a data point with a methodology; state what that methodology was or label it "vendor estimate, methodology unverified."
+4. NEVER conflate TAM (universe) with SAM (serviceable) with SOM (obtainable) — use all three; collapsing them is a red flag.
+5. NEVER skip the sensitivity table — the dominant driver must be isolated and shocked at ±25% and ±50%.
+6. NEVER project SOM without a stated go-to-market constraint (sales capacity, channel reach, or budget ceiling). Wishful percentages of SAM are not SOM estimates.
+7. NEVER fabricate a CAGR or growth rate — use a range, state the source, or label it "ASSUMED — no source."
+8. NEVER cite a named company benchmark in Phase F without a source. If no comparable is cited, state "no comparable cited."
+9. Every stated assumption must be tagged with a unique label (e.g., [ACV-1], [SEG-2], [PRICE-3]). All calculations must cite the tag they depend on.
+10. If the market is supply-constrained (GPU compute, licensed spectrum, regulatory quotas), flag it before Phase B — demand-side TAM math is structurally misleading for supply-constrained markets.
+═══ PHASE A — FRAME THE MARKET ═══
+A1. STATE the product or service in one sentence: what it does, who buys it, and what it displaces or competes with.
+A2. CLASSIFY the category maturity — this determines which sizing method to trust more:
+    - Existing category (clear substitutes, active competitors): market reports exist; use them as sanity checks only, not as the answer.
+    - Nascent category (no direct comps): build entirely from bottom-up; any top-down number is a proxy from the closest analog category, labeled as such.
+    - Supply-constrained category (GPU clusters, licensed slots, regulated capacity): stop and note the supply ceiling before continuing — TAM is bounded by supply, not demand.
+A3. COMMIT to a market boundary — geography, customer segment, and channel — in writing before any calculation. If the product serves multiple segments, size each separately and aggregate at the end; never blend segments before sizing.
+A4. SET a sizing year explicitly ("2026 current state" or "2028 at scale"). All inputs must be consistent with that year.
+A5. LOG every assumption before you calculate. Use descriptive tags, not sequential letters that collide with phase labels:
+    - [PRICE-n] for price and ACV assumptions
+    - [SEG-n] for segment count and qualifier assumptions
+    - [PENET-n] for penetration and conversion assumptions
+    - [BENCH-n] for benchmark comparables
+    - [MACRO-n] for macro pool figures pulled from external sources
+    Every tag must appear at least once in a downstream calculation.
+═══ PHASE B — TOP-DOWN SIZING ═══
+B1. IDENTIFY the widest credible macro pool for your category.
+    - Prefer public data: census/labor statistics, government trade data, industry association filings.
+    - Label any private analyst figure (Gartner, IDC, Forrester, Statista) as [MACRO-n: vendor estimate, methodology unverified].
+    - Do not use a market report's TAM number as your macro pool starting point — that number is the answer someone else computed, not raw data.
+B2. APPLY segment filters from the macro pool down to your target customer, one filter per step:
+    - Geography filter: apply population share, GDP share, or penetration ratio with source. [MACRO-n]
+    - Vertical/industry filter: use SIC/NAICS employment counts or revenue share as the denominator. [SEG-n]
+    - Buying unit filter: count the qualified buying units (companies of the right headcount band, households meeting income/behavior criteria, etc.). State how you counted them and whether sources overlap.
+B3. COMPUTE: (# of qualified buying units) × (average annual spend per unit) = TOP-DOWN TAM.
+    - If spend-per-unit is unknown, use a price proxy from a comparable product category. Label it [PRICE-n: ASSUMED — derived from {analog product} pricing in {year}].
+    - Do not use the analyst report's implied ARPU to back-calculate TAM; that is circular.
+B4. DERIVE TOP-DOWN SAM: apply one constraint that reflects what your product can actually reach today — the geography you can serve, or the customer segment your current feature set fits.
+B5. RECORD the top-down TAM and SAM with every assumption tag referenced inline.
+═══ PHASE C — BOTTOM-UP SIZING ═══
+C1. DEFINE the atomic purchase unit — what is one transaction? (seat/month, transaction fee, one-time license, API call batch.) This is the irreducible unit your revenue model rests on.
+C2. COUNT the universe of buyers from the ground up using at least two independent sources (e.g., LinkedIn company counts cross-checked against a government business register or Dun & Bradstreet). State explicitly whether you deduplicated across sources and what the overlap risk is.
+C3. APPLY a two-stage qualification filter — these are multiplicative, applied in sequence, not blended:
+    - Stage 1: What fraction of the counted universe has the problem your product solves? [PENET-1: ASSUMED or cited]
+    - Stage 2: Of those with the problem, what fraction would pay for a solution vs. DIY or use a free alternative? [PENET-2: ASSUMED or cited]
+    - Qualified buyers = (universe) × [PENET-1] × [PENET-2]. Collapsing into a single percentage obscures which filter is the binding one.
+C4. COMPUTE: (qualified buyers) × (ACV per year) = BOTTOM-UP TAM.
+    - Decompose ACV: (units per buyer per year) × (unit price). [PRICE-n] [SEG-n]
+    - Do not use a blended industry ACV from a report — build it from your pricing model or a named comparable with a source.
+C5. DERIVE BOTTOM-UP SAM by constraining to the segment your product currently fits: feature completeness, required integrations, supported languages, applicable compliance certifications.
+C6. RECORD the bottom-up TAM and SAM with every assumption tag referenced inline.
+═══ PHASE D — RECONCILE THE GAP ═══
+D1. COMPUTE the ratio: TOP-DOWN TAM ÷ BOTTOM-UP TAM. These are heuristic thresholds, not industry standards:
+    - Ratio < 1.5×: methods broadly agree; minor assumption differences; proceed.
+    - Ratio 1.5×–5×: material gap; must be diagnosed and explained before the output is credible.
+    - Ratio > 5×: at least one method has a structural flaw; diagnose and rerun that method before proceeding.
+D2. DIAGNOSE the gap source — it is almost always one of the following:
+    - Double-counting in top-down: the macro pool included adjacent categories or non-customers.
+    - Under-counting in bottom-up: a buyer segment was missed, or ACV was set too low relative to actual deal sizes.
+    - Pricing mismatch: top-down used an industry blended average while bottom-up used your specific price point.
+    - Definition mismatch: the TAM boundary (geography, segment) was not identical between the two methods.
+    - Competitive displacement lag: bottom-up counted only available share; top-down counted the full market before competitive lock-in.
+D3. PRODUCE a reconciled TAM and reconciled SAM: state which number you adopt as the working estimate, and give the specific reason why (e.g., "bottom-up is binding because the macro pool double-counts indirect spend").
+D4. If the gap cannot be explained, present a RANGE — (low = bottom-up, high = top-down) — and state it explicitly as unresolved uncertainty. Do not pick a midpoint without a justification grounded in a specific assumption difference.
+═══ PHASE E — SENSITIVITY ANALYSIS ═══
+E1. IDENTIFY the dominant driver: the single assumption whose realistic shock (±25%) produces the largest absolute change in TAM or SAM. This is a tornado-chart ranking, not a mathematical ±1% elasticity — rank by impact magnitude at realistic shock magnitudes, not by mathematical slope.
+    Common dominant drivers: ACV/unit price, number of qualifying buyers, penetration rate, market growth rate.
+E2. IDENTIFY the second-order driver: the next-highest-impact assumption after the dominant one.
+E3. BUILD the sensitivity table — shock the dominant driver at −50%, −25%, base, +25%, +50%:
+    | Dominant Driver Value | −50% | −25% | Base | +25% | +50% |
+    |-----------------------|------|------|------|------|------|
+    | TAM ($)               |      |      |      |      |      |
+    | SAM ($)               |      |      |      |      |      |
+E4. STATE the "plausible bear case": dominant driver at −25% AND second-order driver at −25%.
+    STATE the "plausible bull case": dominant driver at +25% AND second-order driver at +25%.
+    Do not compound both at ±50% — that is a stress test, not a planning scenario.
+E5. The sensitivity table is the output. A point estimate without it is inadmissible. If the range between bear and bull cases is less than 2×, the model is likely over-constrained — check for circular assumptions.
+═══ PHASE F — OBTAINABLE SHARE (SOM) ═══
+F1. CHOOSE a time horizon: Year 1, Year 3, Year 5. Size SOM for each separately. Do not interpolate Year 3 from Year 5 — the binding constraint changes as the company scales.
+F2. GROUND the SOM estimate in a go-to-market constraint, not a share percentage of SAM:
+    - Sales capacity path: (# of reps or founders) × (quota per rep) × (close rate) = max deals/year → SOM ceiling via this path. [BENCH-n if close rate is from a comparable]
+    - Channel path: (channel partner reach) × (historical conversion rate for this channel type) = max customers → SOM ceiling via this path.
+    - Budget path: (marketing + sales budget) ÷ (blended CAC estimate) = max customers → SOM ceiling via this path.
+    Use the LOWEST output across all paths as the binding SOM ceiling. If only one path is available, state that.
+F3. BENCHMARK against comparable go-to-market motions ONLY IF you have a named, sourced comparable:
+    [BENCH-n: {Company}, {Year}, {GTM motion}, {ACV range}, {Y1/Y3 share achieved}, {source}]
+    If no comparable is available, state "no comparable cited" — do not invent one.
+    Flag as aggressive if Year 3 SOM/SAM > 10% in a market with two or more established competitors.
+F4. COMPUTE SOM: (binding constraint output) × (average deal size). Express as both a dollar figure and a percentage of SAM.
+F5. STATE the key risk that would compress SOM below the binding constraint — at minimum: competitive displacement rate (what is the switching cost for customers locked to an incumbent?), churn rate in Year 1, and any regulatory or procurement-cycle delay specific to this market.
+═══ PHASE G — DELIVER THE OUTPUT ═══
+G1. PRODUCE a one-page summary table:
+    | Metric               | Method            | Figure   | Assumption Tags  |
+    |----------------------|-------------------|----------|------------------|
+    | TAM (top-down)       | Macro filter      | $X B     | MACRO-1, SEG-2   |
+    | TAM (bottom-up)      | Unit build        | $X B     | PENET-1, PRICE-1 |
+    | TAM (reconciled)     | Working estimate  | $X B     | D3 rationale     |
+    | SAM                  | Serviceable slice | $X B     | SEG-n boundary   |
+    | SOM — Year 1         | GTM constraint    | $X M     | F2 binding path  |
+    | SOM — Year 3         | GTM constraint    | $X M     | F2 binding path  |
+    | Bear case TAM        | Sensitivity       | $X B     | E4               |
+    | Bull case TAM        | Sensitivity       | $X B     | E4               |
+G2. PRODUCE an ASSUMPTION LOG block listing every tagged assumption in full:
+    [TAG]: {assumption text} | {source or "ASSUMED"} | {used in: phase references}
+G3. LIST the top three assumptions you would revise with one additional data point, and state exactly what data point would change each one. This is the research priority list — be specific about what to look up, not generic about "more data."
+G4. Present the output as a structured estimate with explicit uncertainty bounds. Never present it as a forecast or a projection. The correct framing is: "Under these assumptions, the market is approximately $X B (bear: $Y B, bull: $Z B)."

package/skills/meta-analysis.md ADDED Viewed

@@ -0,0 +1,140 @@
+---
+name: meta-analysis
+description: Pool studies rigorously — common effect-size metric, fixed/random-effects choice, heterogeneity (I²/τ²), publication-bias checks, sensitivity, and GRADE certainty.
+allowed-tools: Read, Bash, WebSearch, WebFetch, Write
+---
+## PHASE 0 — SCOPE AND REGISTRATION
+1. Frame the PICO: Population, Intervention/Exposure, Comparator, Outcome. One primary outcome; pre-specify secondary outcomes and subgroups BEFORE searching.
+2. Write an eligibility table: inclusion criteria (design, population, comparator, minimum sample size, language, date range) and exclusion criteria. Lock it. Never revise post-hoc.
+3. Register the protocol (PROSPERO, OSF, or a timestamped commit) before screening begins. If post-hoc, state it prominently as a limitation.
+## PHASE 1 — SYSTEMATIC SEARCH + PRISMA FLOW
+4. Search at minimum: PubMed/MEDLINE, Embase, CENTRAL (Cochrane), and one domain-specific database. Add grey literature (ClinicalTrials.gov, WHO ICTRP, dissertations) to reduce publication bias.
+5. Build a reproducible query string per database (Boolean operators, MeSH/Emtree terms, free text). Log exact query + run date. Rerun if >12 months elapses.
+6. Export all records to a deduplication tool (Rayyan, Covidence, or a dedupe script). Remove duplicates before screening.
+7. Two reviewers screen titles/abstracts independently; resolve conflicts by consensus or third reviewer. Repeat for full-text.
+8. Produce a PRISMA 2020 flow diagram: records identified → duplicates removed → screened → excluded (with reasons) → included. Report counts at each node.
+## PHASE 2 — DATA EXTRACTION
+9. Use a pre-piloted extraction form: study ID, design, N, population descriptors, intervention/exposure, comparator, outcome definition, measurement instrument, time point, reported statistic, and all data needed for effect-size conversion.
+10. Extract in duplicate. Calculate inter-rater agreement (Cohen's κ ≥ 0.80 is acceptable; resolve all discrepancies before pooling).
+11. When reported statistics differ, convert to one common metric (see Phase 3). Never pool raw means with ORs.
+12. Extract variance (SE, SD, CI width, or p-value + N). If not reported: impute SD from similar studies only as a last resort and flag as sensitivity analysis.
+## PHASE 3 — EFFECT-SIZE CONVERSION (common metric required)
+13. Choose ONE metric for the primary analysis. Canonical options:
+    - Continuous outcomes: Hedges' g (SMD corrected for small-sample bias, preferred) or Cohen's d.
+    - Binary outcomes: log(OR) or log(RR) — log scale for pooling, exponentiate for reporting.
+    - Correlational: Fisher's z (r → z = 0.5·ln((1+r)/(1-r))) for pooling; back-transform for reporting.
+14. Conversion formulas (Bash-compute when numbers are present):
+    - t-statistic → d: d = 2t/√(df)
+    - F(1,df) → d: d = √(F/((N1·N2)/(N1+N2)))
+    - OR → d (log-OR/π×√3): d = log(OR)·(√3/π)
+    - r → d: d = 2r/√(1−r²)
+    - Hedges' g: g = d·J where J = 1 − 3/(4·df−1)
+    - Variance of g: v_g = (n1+n2)/(n1·n2) + g²/(2·(n1+n2))
+15. Compute in Bash when study data are supplied. Example:
+    ```bash
+    python3 - <<'EOF'
+    import math
+    studies = [
+        {"id":"S1","g":0.42,"v":0.05},
+        {"id":"S2","g":0.61,"v":0.03},
+        {"id":"S3","g":0.28,"v":0.08},
+    ]
+    # Inverse-variance weights
+    for s in studies:
+        s["w"] = 1/s["v"]
+    W = sum(s["w"] for s in studies)
+    pooled = sum(s["w"]*s["g"] for s in studies)/W
+    pooled_var = 1/W
+    pooled_se = math.sqrt(pooled_var)
+    z = pooled/pooled_se
+    import scipy.stats as st
+    ci_lo = pooled - 1.96*pooled_se
+    ci_hi = pooled + 1.96*pooled_se
+    # Cochran Q
+    Q = sum(s["w"]*(s["g"]-pooled)**2 for s in studies)
+    k = len(studies)
+    p_Q = 1 - st.chi2.cdf(Q, df=k-1)
+    I2 = max(0,(Q-(k-1))/Q)*100
+    # tau^2 (DerSimonian-Laird)
+    C = W - sum(s["w"]**2 for s in studies)/W
+    tau2 = max(0,(Q-(k-1))/C)
+    print(f"Pooled g = {pooled:.3f} [{ci_lo:.3f}, {ci_hi:.3f}]")
+    print(f"Q={Q:.2f} (p={p_Q:.3f}), I²={I2:.1f}%, τ²={tau2:.4f}")
+    EOF
+    ```
+16. HARD RULE: never pool studies with incommensurable outcomes (e.g., mortality vs symptom score) even if the same label is used. Verify instrument equivalence first.
+## PHASE 4 — FIXED VS RANDOM EFFECTS (justify the choice)
+17. Fixed-effects (common-effect) model: assumes all studies estimate the same true effect; appropriate only when studies are near-identical in population, intervention, and outcome AND k is small. Rare in practice.
+18. Random-effects (DerSimonian-Laird or REML): assume a distribution of true effects; appropriate when studies differ in design, population, dose, or setting — i.e., almost always.
+19. Decision rule: If Q p < 0.10 OR I² > 50% OR prior clinical knowledge predicts heterogeneity → random-effects. Document the rationale.
+20. For small k (< 5 studies), DL underestimates τ²; prefer REML or Paule-Mandel. Report which estimator was used.
+## PHASE 5 — POOLING AND FOREST PLOT
+21. Pool with inverse-variance weighting: w_i = 1/(v_i + τ²) for random-effects; w_i = 1/v_i for fixed.
+22. Forest plot requirements: one row per study (ID, N, effect size + 95% CI, weight %), diamond for pooled estimate, vertical null line, heterogeneity stats (Q, I², τ²), and the model type. Sort by year or effect size — be consistent.
+23. Report: pooled estimate, 95% CI, prediction interval (95% PI = pooled ± 1.96·√(τ²+pooled_var)) for random-effects. PI answers "where will the next study's effect likely fall" — often wider than CI.
+24. If k ≥ 10, also report 95% PI. If PI crosses the null, the evidence is weaker than the CI alone implies — say so.
+## PHASE 6 — HETEROGENEITY QUANTIFICATION AND EXPLORATION
+25. Report Q (Cochran), p(Q), I², and τ² (with 95% CI via Q-profile or bootstrap). I² benchmarks: 25% low, 50% moderate, 75% high — but they are context-dependent, not thresholds.
+26. Pre-specified subgroup analysis: split studies by a categorical moderator (e.g., RCT vs observational, age group, dose level). Test between-subgroup heterogeneity with Q_between. Do NOT use subgroups to explain away heterogeneity post-hoc.
+27. Meta-regression: for continuous moderators (year, mean age, dose). Use weighted least squares (WLS) with w_i as above. Report slope, 95% CI, R²_analog (proportion of τ² explained). Require k ≥ 10 per covariate; more is better.
+28. Limit exploratory moderators to ≤ k/10 to avoid over-fitting. Flag any post-hoc moderator clearly.
+## PHASE 7 — PUBLICATION BIAS
+29. Funnel plot: SE on y-axis (inverted), effect size on x-axis. Asymmetry suggests small-study effects / publication bias — but asymmetry has many causes (report all plausible ones).
+30. Egger's test (linear regression of effect/SE on 1/SE): significant intercept (p < 0.10) flags asymmetry. Requires k ≥ 10; underpowered below that — state the limitation.
+31. Trim-and-fill (Duval-Tweedie): impute missing studies to restore funnel symmetry; re-pool with imputed studies. Report the adjusted estimate. This is a sensitivity check, not a correction.
+32. Contour-enhanced funnel plot: overlay significance contours (p = 0.05, 0.01) to distinguish bias from heterogeneity.
+33. If k < 10, skip formal bias tests (insufficient power); note this explicitly. Use clinical judgment and grey-literature search as bias controls instead.
+## PHASE 8 — SENSITIVITY AND ROBUSTNESS
+34. Leave-one-out analysis: re-pool omitting each study in turn. If removing one study flips the conclusion, the result is fragile — report it.
+35. Influence diagnostics: Cook's distance, DFFITS, or Baujat plot to identify outlying / high-leverage studies.
+36. Compare fixed vs random-effects estimates. Large divergence signals heterogeneity driving the result.
+37. Restrict to high-quality studies (e.g., low risk of bias only). If the estimate changes substantially, quality is a moderator — discuss.
+38. If imputed SDs or converted effect sizes exist, run the analysis with and without those studies.
+## PHASE 9 — RISK OF BIAS WITHIN STUDIES
+39. Use the appropriate tool for each design: RoB 2 (RCTs), ROBINS-I (non-randomized), QUADAS-2 (diagnostic), NOS (observational). Apply in duplicate.
+40. Produce a risk-of-bias summary table and figure. Do not average or collapse domains into a single score — report domain-level judgments.
+41. Incorporate RoB into GRADE (see Phase 10): high RoB across studies downgrades certainty.
+## PHASE 10 — GRADE CERTAINTY OF EVIDENCE
+42. Assign a starting certainty level: RCTs = High; observational = Low.
+43. Downgrade for: risk of bias, inconsistency (high I²/unexplained heterogeneity), indirectness (population/intervention/outcome mismatch), imprecision (wide CI, small k or N), publication bias (likely present).
+44. Upgrade for: large magnitude (OR > 5), dose-response gradient, all plausible confounders would attenuate the effect.
+45. Final certainty: High / Moderate / Low / Very Low. One upgrade or downgrade per domain; document each decision.
+46. Summarize in a GRADE Summary of Findings table: outcome, N studies, N participants, pooled estimate + CI, and certainty rating.
+## PHASE 11 — REPORTING STANDARDS
+47. Follow PRISMA 2020 checklist (27 items). Confirm each item before submission.
+48. Report: protocol registration, full search strings, PRISMA flow, extraction agreement, all effect sizes and variances, model justification, heterogeneity statistics, bias assessment, sensitivity results, GRADE table, forest plot, funnel plot (if k ≥ 10), and prediction interval.
+49. Honest limitations section must address: k and total N, risk of bias distribution, language/database restrictions, inability to obtain individual participant data (IPD), and any post-hoc analyses.
+## HARD RULES (never violate)
+- NEVER pool incommensurable outcomes. Verify metric equivalence before computing line 1.
+- NEVER report only I² without τ² — I² is scale-dependent and misleading alone.
+- NEVER interpret a non-significant Egger's test as "no publication bias" — it is underpowered.
+- NEVER perform subgroup analyses not pre-specified without labeling them exploratory.
+- NEVER upgrade GRADE certainty without a documented, domain-specific rationale.
+- NEVER omit the prediction interval when k ≥ 5 and random-effects are used.
+- ALL Bash computations must print study IDs alongside estimates for auditability.

package/skills/migrate.md ADDED Viewed

@@ -0,0 +1,117 @@
+---
+name: migrate
+description: Migrate a codebase safely — inventory all call sites first, use expand-migrate-contract, change in small reversible batches keeping the build green, verify semantics preserved.
+allowed-tools: Read, Edit, MultiEdit, Bash, Grep, Glob
+---
+## PHASE 0 — DEFINE (non-negotiable before any edit)
+1. State the migration contract in one sentence: "FROM `old.doThing(x, y)` TO `newLib.thing({ x, y })` — same semantics, new import path + call shape." Write it down. If you cannot write it in one sentence, the scope is unclear — clarify first.
+2. State the SUCCESS CRITERION explicitly: what must be true when done? (e.g., "zero imports of `old-lib`, all tests green, no TypeScript errors, bundle size ≤ before".) This is the acceptance gate.
+3. Define what is OUT OF SCOPE. Migrations attract opportunistic cleanup. Any unrelated change goes into a separate commit or a follow-up ticket — never mixed in.
+## PHASE 1 — INVENTORY (know the blast radius before touching anything)
+4. Find every import/require of the old API:
+   ```
+   grep -rn "from ['\"]old-lib['\"]" --include="*.ts" --include="*.tsx" --include="*.js" .
+   grep -rn "require(['\"]old-lib" --include="*.ts" --include="*.js" .
+   ```
+5. Find every call site of the specific symbol(s) being replaced:
+   ```
+   grep -rn "oldFunction\|OldClass\|OLD_CONSTANT" --include="*.ts" --include="*.tsx" .
+   ```
+6. Check indirect references — re-exports, barrel files, dynamic access:
+   ```
+   grep -rn "export.*oldFunction\|oldFunction['\"]" --include="*.ts" .
+   grep -rn "\[.*'oldFunction'.*\]\|\['.*'\]" --include="*.ts" .
+   ```
+7. Check non-TS surfaces: config files, shell scripts, CI YAML, Dockerfiles, docs:
+   ```
+   grep -rn "old-lib\|oldFunction" --include="*.json" --include="*.yaml" --include="*.sh" --include="*.md" .
+   ```
+8. Check generated code and lockfiles that embed the old API (codegen output dirs, `*.generated.ts`).
+9. Record the full inventory. Format: `path/to/file.ts — N call sites — [mechanical|tricky]`. Label each file: **mechanical** (pure find-replace, no logic change) or **tricky** (conditional branching on the API, overloaded usage, dynamic dispatch, tests that mock internals).
+10. Count totals. If blast radius > 50 files, plan explicit batches now (group by layer: utils → services → routes → tests).
+## PHASE 2 — PREPARE
+11. Capture baseline: run the full build + tests, record pass/fail counts and any pre-existing failures. Do not let pre-existing failures mask new breakage.
+    ```
+    npx tsc --noEmit 2>&1 | tail -5
+    npm test 2>&1 | tail -20
+    ```
+12. Install / configure the new dependency if needed. Commit this step alone: `chore(deps): add new-lib@x.y.z`. Do not mix dependency changes with code changes.
+13. Write or identify tests that exercise the behavior being migrated — these are your semantic anchors. If tests are thin, write characterization tests now against the OLD API, commit them, and confirm they are green. They must stay green throughout (modulo expected import-path changes after the contract step).
+14. Set up the codemod for mechanical changes (jscodeshift / sed / ts-morph script). Dry-run it first — pipe to a temp file and diff:
+    ```
+    npx jscodeshift -t codemod.ts src/ --dry --print 2>&1 | head -60
+    ```
+    Never run a codemod on the whole tree in one shot without a dry-run diff review.
+## PHASE 3 — EXPAND-MIGRATE-CONTRACT
+The cardinal rule: **the build must stay green at every commit.**
+### 3a — EXPAND (add the new path alongside the old)
+15. Introduce the new API/adapter/shim without touching any callers. If it is a new library, add a thin compatibility wrapper that makes both old and new signatures coexist. Commit: `feat(migrate): introduce new-lib adapter alongside old-lib`.
+16. Typecheck. Tests green. If anything breaks here, fix before proceeding — you have not changed any caller yet.
+### 3b — MIGRATE (move callers batch by batch)
+17. Migrate mechanical files first using the codemod — one logical group (e.g., one directory or layer) per commit. Never the whole tree in one commit.
+18. After each batch:
+    a. `npx tsc --noEmit 2>&1 | head -30` — zero new errors.
+    b. `npm test -- --testPathPattern=<changed-dir> 2>&1 | tail -20` — green.
+    c. Commit: `refactor(migrate): migrate <dir> from old-lib to new-lib (N files)`.
+19. Migrate tricky files by hand. Read the old call site, understand the semantics, rewrite — do not blindly apply the codemod. Add an inline comment if the new call is non-obvious. One file or one cluster per commit.
+20. Maintain a checklist: after each batch, mark files DONE. Keep remaining files visible. Example tracking comment at top of your scratch list:
+    ```
+    [x] src/utils/http.ts
+    [x] src/services/auth.ts
+    [ ] src/routes/api.ts   ← tricky: mocks old internals
+    [ ] tests/integration/  ← update last
+    ```
+21. Migrate tests last — only after all production code is migrated and green. Updating tests simultaneously with source obscures which side introduced a failure.
+### 3c — CONTRACT (remove the old)
+22. Only after 100% of callers are migrated and all tests green: delete the old import, old shim, old adapter, old re-exports.
+23. Remove the old dependency from package.json if it has no other consumers.
+24. Typecheck with zero tolerance. Any remaining reference to the old API is a miss — go back to inventory.
+25. Commit: `chore(migrate): remove old-lib — migration complete`.
+## PHASE 4 — VERIFICATION
+26. Run the FULL test suite (not the targeted subset): `npm test 2>&1 | tail -30`. Pass count must be >= baseline; no new failures allowed.
+27. Full typecheck: `npx tsc --noEmit 2>&1` — zero errors.
+28. Bundle / build check if applicable: compare sizes, check for accidental dual-bundling of old + new lib.
+29. Semantic diff review — read the aggregate diff across the migration:
+    ```
+    git diff <pre-migration-sha>..HEAD -- src/
+    ```
+    For every changed call site ask: "Did any default value, error path, argument order, return shape, or side effect change?" Flag any divergence as a separate `fix:` commit — never silently inside the migration.
+30. Re-run the grep from Phase 1 to confirm zero remaining references to the old API:
+    ```
+    grep -rn "old-lib\|oldFunction" --include="*.ts" --include="*.tsx" --include="*.js" .
+    ```
+    Expected result: zero matches (or only the codemod/migration script itself, if kept).
+31. Check non-TS surfaces (configs, docs, CI) one more time — same grep as step 7.
+## PHASE 5 — ROLLBACK PLAN
+32. Every batch commit is independently revertible: `git revert <sha>`. The expand-migrate-contract pattern means you can revert the contract step and re-expose the old API with a single commit revert.
+33. If a tricky call site reveals a semantic mismatch mid-migration, stop the batch, revert that file, document the blocker, and continue with other callers. Do not paper over a semantic difference with a shim that silently changes behavior.
+34. If the migration must be abandoned mid-flight, the expand step ensures the old API is still present — callers already migrated are safe on the new API; un-migrated callers still use the old. The codebase is always in a buildable state.
+## HARD RULES
+- **Inventory before first edit.** Never touch a call site before you have the full list. Surprises mid-migration cause half-applied states.
+- **Build green at every commit.** A red commit is never acceptable, even as "WIP." Red history makes bisect useless.
+- **One concern per commit.** Dependency bump, adapter introduction, mechanical batch, tricky hand-edit, old API removal — each is a separate commit. Never mix migration with unrelated refactors or feature changes.
+- **Semantic anchor tests must stay green.** If a test breaks, that is a signal of a semantic divergence — treat it as a blocker, not a test to update.
+- **Mechanical vs tricky are different risk levels.** Run the codemod on mechanical files; hand-edit tricky ones. Misclassification is the #1 source of silent semantic drift.
+- **Never batch the entire tree in one commit.** No matter how mechanical the change, keep batches small enough to read the diff in one sitting (~10-20 files max per commit).
+- **Do not let the migration creep.** If you notice a related smell while migrating a file, note it, finish the migration, then address the smell in a follow-up commit.
+- **Document the migration in the PR.** State: what changed, what was automated vs hand-edited, how semantics were verified, what the rollback path is.

package/skills/optimize.md ADDED Viewed

@@ -0,0 +1,88 @@
+---
+name: optimize
+description: Performance optimization — measure and profile first, fix the real bottleneck (algorithmic before micro), re-measure after each change, keep results correct.
+allowed-tools: Read, Edit, Bash, Grep, Glob
+---
+## Phase 0 — Define the Target (non-negotiable before touching code)
+1. State the workload precisely: which code path, which inputs, what load (RPS, payload size, concurrency).
+2. Name one primary metric and a hard target: e.g. "p99 < 50 ms at 500 RPS" or "< 200 MB RSS under 10k items" or "> 10k inserts/s".
+3. Name secondary guard-rails: correctness (tests must stay green), p50, error rate, memory ceiling.
+4. Write the target down before running a single benchmark. Optimization without a target is refactoring.
+## Phase 1 — Establish a Reproducible Baseline
+5. Pin the environment: lock CPU frequency if possible, kill background noise, use the same machine/container for every run.
+6. Write a benchmark that drives the exact hot path — not a micro-benchmark of a helper, the real call stack.
+   - Node/TS: use `hyperfine`, `autocannon`, or `vitest bench`; for in-process: `perf_hooks.performance.now()` around a tight loop (≥ 1 000 iterations, discard first 10% as warm-up).
+   - Python: `pytest-benchmark`, `timeit`, or `locust` for HTTP.
+   - Shell timing: `hyperfine --warmup 5 --runs 20 '<cmd>'`.
+7. Record the baseline number with units, timestamp, and git SHA. Example:
+   ```
+   baseline: p50=142ms p99=380ms  @500RPS  sha=abc1234  2026-06-27
+   ```
+8. Run the full test suite now. All green = correctness baseline locked.
+## Phase 2 — Profile to Find the Real Hot Path (never guess)
+9. Attach a profiler to the benchmark workload — NOT to a toy script.
+   - Node: `node --prof server.js` → `node --prof-process isolate-*.log > profile.txt`; or `clinic flame -- node server.js`.
+   - Python: `python -m cProfile -o out.prof script.py && python -m pstats out.prof`.
+   - Native/Go: `perf record -g ./bin && perf report`.
+10. Read the flame graph top-down. The widest frame is the bottleneck. Ignore everything < 5% of wall time.
+11. Identify the bottleneck category before writing any fix:
+    - **Algorithmic** (O(n²), full-table scan, redundant traversal) — fix this first, always.
+    - **I/O bound** (serial awaits, N+1 queries, missing index) — batch or parallelize.
+    - **Memory/GC** (high allocation rate, large retained objects) — reduce allocations or pool.
+    - **CPU bound after algo fix** (tight loop, serialization) — then micro-optimize or parallelize.
+## Phase 3 — Attack the Biggest Bottleneck First
+12. **Algorithmic wins first.** O(n log n) sort beats O(n²) loop by 1000x at n=10k. No constant-factor trick competes.
+    - Replace nested loops over the same collection with a Map/Set lookup.
+    - Replace repeated `.filter().find()` chains with a single indexed Map built once.
+    - Replace synchronous sequential I/O with `Promise.all` or a connection pool.
+13. **Caching / memoization.** Cache the result of pure or slow functions keyed on inputs. Use `Map` for in-process; Redis for cross-process. Always set a TTL or max-size — unbounded caches become memory leaks.
+14. **Batching.** Replace N individual DB/network calls with one batched call. Use DataLoader pattern for per-request batching. Batch size sweet spot: profile it (often 50–500 rows).
+15. **N+1 elimination.** In ORMs/query layers: eager-load relations in one query; never query inside a loop.
+16. **Better data structures.** Use a `Set` for membership tests (O(1) vs O(n) array). Use a sorted array + binary search for range queries. Use a `Map` keyed by ID instead of `.find()` on an array.
+17. **Lazy / streaming.** Don't load 100k rows into memory to process 10. Use cursors, generators, or streaming transforms. In Node: `Readable.from()`, async generators, stream pipelines.
+18. **Parallelism.** CPU-bound work: worker_threads (Node) or multiprocessing (Python). I/O-bound: concurrency limit with `p-limit` or semaphore (avoid unlimited `Promise.all` — it causes thundering herd).
+19. **Reduce allocations.** In hot loops: reuse buffers, avoid string concatenation (use arrays + `.join`), avoid object spread in tight paths. Profile GC pause time separately (`--expose-gc` + `gc()` + `process.memoryUsage()`).
+20. **Serialization.** JSON.parse/stringify on large payloads is expensive. Consider `fast-json-stringify`, MessagePack, or schema-aware codecs. Profile before switching — overhead is payload-size dependent.
+## Phase 4 — Re-Measure After EVERY Change
+21. After each fix: re-run the exact same benchmark. Record the new number with the git SHA of the fix.
+22. Re-run the full test suite. A failing test means the optimization changed behavior — revert or fix correctness first.
+23. Compare against baseline AND the previous measurement to isolate the contribution of each change.
+24. If the metric did not improve by at least 10%, consider reverting — complexity has a cost.
+## Phase 5 — Watch for Regressions
+25. Check all guard-rail metrics (p50, memory, error rate) after each change — a p99 win that doubles memory is not a win.
+26. Run the benchmark at 2x the target load to check for cliff edges (sudden latency spikes, OOM).
+27. If optimization shifts the hot path to a new bottleneck, profile again (Phase 2) before touching anything else.
+## Phase 6 — Stop When the Target Is Hit
+28. When the primary metric clears the target defined in Phase 0, stop. Over-optimizing past the target trades maintainability for no user benefit.
+29. Document the win in a comment or commit message with before/after numbers, the bottleneck type, and the fix applied. Example:
+    ```
+    perf: replace O(n²) agent lookup with Map index
+    Before: p99=380ms  After: p99=31ms  (-92%)  sha range abc1234..def5678
+    Bottleneck: agents.find() inside per-event loop (N×M = 500k iterations)
+    Fix: build Map<id,agent> once before loop, O(1) lookup per event
+    ```
+30. Add the benchmark to CI as a regression guard with a threshold (e.g. fail if p99 > 60ms).
+## Hard Rules
+- NEVER optimize without a profile showing the line is hot. Gut-feel micro-opts waste time and add bugs.
+- ALWAYS keep tests green. Correctness is not negotiable.
+- ONE change per measurement cycle. Mixing two changes makes it impossible to know which helped.
+- Algorithmic complexity before constant-factor tricks, always.
+- Caches MUST have eviction. Unbounded = memory leak.
+- Parallelism MUST have a concurrency cap. Unlimited = thundering herd or OOM.
+- Document every win with numbers. "It felt faster" is not a result.