@vextlabs/theron-cli 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191)

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +61 -5
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/build_domains.d.ts +6 -0
package/dist/profiles/methodologies/build_domains.js +170 -0
package/dist/profiles/methodologies/build_domains.js.map +1 -0
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
package/dist/profiles/methodologies/regulated_domains.js +153 -0
package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
package/dist/profiles/methodologies/research_domains.d.ts +8 -0
package/dist/profiles/methodologies/research_domains.js +179 -0
package/dist/profiles/methodologies/research_domains.js.map +1 -0
package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
package/dist/profiles/methodologies/strategy_domains.js +193 -0
package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
package/dist/profiles/seeds.js +241 -95
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/calc_gate.d.ts +2 -0
package/dist/verifiers/calc_gate.js +112 -0
package/dist/verifiers/calc_gate.js.map +1 -0
package/dist/verifiers/citation_gate.d.ts +2 -0
package/dist/verifiers/citation_gate.js +130 -0
package/dist/verifiers/citation_gate.js.map +1 -0
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/evidence_gate.d.ts +2 -0
package/dist/verifiers/evidence_gate.js +108 -0
package/dist/verifiers/evidence_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +28 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.d.ts +2 -0
package/dist/verifiers/source_gate.js +125 -0
package/dist/verifiers/source_gate.js.map +1 -0
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/reproduce.md ADDED

@@ -0,0 +1,87 @@
+---
+name: reproduce
+description: Reproduce a bug, paper result, or benchmark — pin the exact claim and environment, get a minimal case, reproduce baseline first, bisect to the trigger, report repro status honestly.
+allowed-tools: Read, Bash, WebFetch, Grep, Glob, Write
+---
+## Phase 0 — Pin the exact claim (5 min, no skipping)
+1. State the claim in one sentence with a number or observable behavior: "Model X achieves 84.2% on benchmark Y (Table 3, row 5)" or "Calling `foo(None)` raises `AttributeError` at line 47 of `bar.py`."
+2. Record the **source**: paper DOI/arXiv ID + section + table/figure, or issue URL + commit SHA + stack trace.
+3. Define the **success criterion** now, before you start: what exact value or behavior counts as reproduced? What tolerance is acceptable (±0.5%, ±1σ, exact crash, any crash)?
+4. Mark the scope: full repro (same number), directional repro (same trend, different number), or behavioral repro (same failure mode, different input).
+## Phase 1 — Capture the environment
+5. Record OS, kernel/platform, language runtime version, GPU driver + CUDA version if relevant.
+   ```bash
+   uname -a; python --version; pip freeze | sort; nvidia-smi 2>/dev/null || true
+   ```
+6. Pin the **exact** commit or release of every moving part: framework, dataset version, random seeds. If the source gives seeds, use them verbatim.
+7. Fetch the original method or source code before any local copy drifts:
+   ```
+   WebFetch the canonical URL (paper PDF, GitHub permalink, issue page) and save verbatim to scratch.
+   ```
+8. Record the fetch date — environment drift is the #1 repro killer; note anything that has changed since the original (package deprecations, dataset updates, API changes).
+## Phase 2 — Get a minimal reproducible case
+9. For a **bug**: reduce the failing input to the smallest input/fewest lines that still triggers it. Binary-search inputs, not code — cut the data in half each step.
+10. For a **paper/benchmark**: identify the single experiment or table row that is the sharpest signal. Reproduce the smallest possible sub-experiment first (one split, one seed, one model size).
+11. Write a self-contained script: one file, hardcoded seed, downloads its own data if small, prints the metric at the end.
+12. Confirm the minimal case compiles and runs clean in a fresh environment before proceeding.
+## Phase 3 — Reproduce the baseline FIRST
+13. Before changing anything, run the original code/config exactly as documented and record the output.
+    - If the original code is unavailable, state that explicitly and note what substitute you are using.
+    - If you must fill in missing details (unreported hyperparams, missing data splits), log each assumption.
+14. Compare the baseline output to the claimed result. If it matches: baseline confirmed — proceed. If it doesn't match: you are already in "failed to reproduce" territory — document and investigate before going further.
+15. Save baseline output as `repro_baseline.txt` (timestamp, command, stdout/stderr, metric).
+## Phase 4a — Bug: bisect to the precise trigger
+16. Use `git bisect` if the bug is a regression:
+    ```bash
+    git bisect start
+    git bisect bad HEAD
+    git bisect good <last-known-good-sha>
+    # run your minimal repro script at each step; git bisect good/bad
+    ```
+17. If not a regression, binary-search the input: halve the payload/data/config each iteration until the minimal trigger is isolated.
+18. Identify the exact line/condition: add a targeted print or debugger breakpoint at the suspected site — do not guess.
+19. State the root cause as a falsifiable hypothesis before proposing a fix.
+## Phase 4b — Paper/benchmark: follow stated method exactly, log every forced deviation
+20. Implement the method step by step as written in the paper (algorithm box, appendix, supplemental). Do not optimize or simplify yet.
+21. At each step where the paper is ambiguous or under-specified, log the assumption you made and why.
+22. Run with the paper's stated seed(s). If multiple seeds are reported, run all of them — a single seed can be an outlier.
+23. Log every deviation you were forced to make (unavailable data, deprecated API, different hardware precision). These are the caveats for the final report.
+## Phase 5 — Record every attempt
+24. For each run, log: timestamp, command or config, key hyperparams/inputs, output metric, wall-clock time, any warnings.
+    ```
+    repro_log.md — one row per attempt, never delete rows, mark corrections inline.
+    ```
+25. Never overwrite a previous result — append only. Auditability requires the full history.
+26. If a run fails (crash, NaN, timeout), record the exact error before retrying — do not just re-run silently.
+## Phase 6 — Classify and report honestly
+27. **Reproduced**: output matches the success criterion defined in Phase 0 (within stated tolerance). State: metric, deviation from claim, seed(s) used.
+28. **Partially reproduced (with caveats)**: correct direction/trend but value outside tolerance, OR reproduced on a subset of conditions. State each caveat explicitly — what matched, what didn't.
+29. **Failed to reproduce**: cannot obtain the claimed result under reasonable interpretation of the original. State: what you got, what was claimed, the most likely explanation (environment drift, missing detail, error in original).
+30. **Never claim "reproduced" when you got a different number.** Never soften "failed to reproduce" with vague language like "largely consistent" unless the deviation is within your pre-stated tolerance.
+31. If it doesn't reproduce, report what differs: the delta, the most probable cause, and what additional information from the authors would resolve it.
+## Hard rules
+- R1: State the success criterion BEFORE running anything. Post-hoc criterion adjustment is p-hacking.
+- R2: Record the environment before the first run. A missing version number invalidates the log.
+- R3: Baseline first, always. Changing code before confirming the original behavior conflates bugs.
+- R4: One variable at a time. Never change environment and code simultaneously.
+- R5: Negative results are results. "Failed to reproduce" is a valid, publishable, useful outcome.
+- R6: Quote the original claim verbatim alongside your result. Do not paraphrase in a way that shifts the bar.
+- R7: Distinguish what the paper/issue claimed from what you inferred. Use "claimed:" vs "inferred:".
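
Reviewer note on `reproduce.md`: the Phase 0 success criterion and the Phase 6 labels can be made mechanical so the bar cannot move after the fact. The following is a minimal sketch, not part of the package. The names `Criterion`, `classify`, and `log_attempt` are hypothetical, and an absolute tolerance plus a caller-supplied "same direction" flag are assumed; the skill itself leaves both choices to the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Success criterion fixed in Phase 0, before any run (hypothetical type)."""
    claimed: float    # value quoted verbatim from the source
    tolerance: float  # absolute tolerance agreed up front


def classify(criterion: Criterion, observed: float, same_direction: bool) -> str:
    """Map one observed metric onto the three Phase 6 labels."""
    if abs(observed - criterion.claimed) <= criterion.tolerance:
        return "reproduced"
    if same_direction:
        # Trend matches but the value is outside the pre-stated tolerance.
        return "partially reproduced"
    return "failed to reproduce"


def log_attempt(path: str, row: str) -> None:
    """Append one row to the repro log; mode "a" never overwrites earlier rows."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(row.rstrip("\n") + "\n")
```

Because `Criterion` is frozen, the tolerance cannot be adjusted after the first run, which is the property rule R1 asks for.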

package/skills/runbook.md ADDED

@@ -0,0 +1,344 @@
+---
+name: runbook
+description: On-call incident runbook — triage severity/scope in <2 min, detect patterns (cascade vs. isolated), isolate blast radius, execute fixed-sequence recovery, document timeline, and verify the fix held.
+allowed-tools: Read, Write, Grep, Bash
+---
+## HARD RULES (NEVER VIOLATE — read before any action)
+- **NEVER restart all RunPod workers at once.** Scale down to zero, wait 10 s for drain, then set `min_workers=0 max_workers=1`, confirm clean, then restore targets. Simultaneous cold restarts spike queue depth and starve the load balancer.
+- **NEVER wait for perfect information.** Decisions at 80% confidence in 1 minute beat 100% in 10. You can verify and adjust.
+- **NEVER apply a permanent fix while on-call.** Rollback or shed load first. The fix is a PR tomorrow.
+- **NEVER merge a "fix" for an ongoing incident without re-testing it against the failure mode.** Fixes applied under stress break.
+- **NEVER skip the 10-minute monitoring window.** Declare victory too early and the incident re-spikes while you are paged down.
+- **NEVER blame the person who deployed.** Find the system gap. Fix the system.
+- **NEVER call AWS endpoints.** This stack runs on RunPod / Vercel / Cloudflare R2 / Neon — not EC2, EKS, RDS, or S3.
+- **NEVER skip the postmortem.** The next incident of this class is weeks away. This is your only chance to break the cycle.
+---
+## STANCE RULE (non-negotiable)
+You are the on-call engineer at 3 AM for a production outage. Minutes matter. Your job is NOT to understand the entire system or find permanent fixes tonight — it is to **restore service to users as fast as possible**, contain the blast radius, and document the timeline so morning can do a postmortem. Do not reason; execute the checklist.
+Production services and their health signals:
+| Service | Check command | Healthy signal |
+|---------|--------------|----------------|
+| Vercel marketing / API functions (`tryvext.com`, `itstheron.com`) | `curl -si -m 8 https://tryvext.com/api/health` | HTTP 200 |
+| RunPod Serverless LLM endpoints (`$CYBER_URL`, `$THERON_LLM_URL`, etc.) | See Phase 1 step 1 | `{"status":"healthy"}` or job accepted |
+| Neon Postgres | `psql "$DATABASE_URL" -c "SELECT 1;" 2>&1` | `(1 row)` |
+| Cloudflare R2 (model artifacts) | `curl -si -m 10 "$R2_ENDPOINT/vext-models/?list-type=2" -H "Authorization: AWS4-HMAC-SHA256 ..."` or use rclone | HTTP 200 + XML |
+| Upstash Redis | `redis-cli -u $REDIS_URL ping` | `PONG` |
+Credentials for all of the above live in `.secrets/CREDS.md` (gitignored, owner-authorized). Read that file before asking for any credential.
+---
+## PHASE 1 — INCIDENT TRIAGE (<2 MINUTES)
+1. **Establish ground truth for the user-facing surface.** Do not trust dashboards — execute the command yourself.
+   For Vercel functions (the primary API layer):
+   ```bash
+   curl -si -m 8 https://tryvext.com/api/health
+   # also: https://itstheron.com/api/health if the OS consumer site is impacted
+   ```
+   For a RunPod Serverless endpoint (the example uses `$CYBER_URL`; substitute whichever endpoint variable from `~/.theron/serverless_endpoints.env` you need to check):
+   ```bash
+   source ~/.theron/serverless_endpoints.env
+   curl -s -m 15 -X POST "$CYBER_URL/health" \
+     -H "Authorization: Bearer $RUNPOD_API_KEY" \
+     -H "Content-Type: application/json" \
+     -d '{"input":{"messages":[{"role":"user","content":"ping"}]}}' | jq .
+   ```
+   For Neon Postgres:
+   ```bash
+   psql "$DATABASE_URL" -c "SELECT now(), count(*) FROM agent_sessions WHERE created_at > now() - interval '5 minutes';" 2>&1
+   ```
+   Record: exact response code, latency, error body, timestamp.
+2. **Scope the blast:** Is it all users or a subset? All Vercel regions or one? All LLM specialists or one endpoint?
+   ```bash
+   # Scan Vercel function logs for the error pattern (requires Vercel CLI)
+   vercel logs --since 5m 2>&1 | grep -i "error\|5[0-9][0-9]" | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
+   # Or query Neon for recent error rows
+   psql "$DATABASE_URL" -c "SELECT status, count(*) FROM runs WHERE updated_at > now() - interval '5 minutes' GROUP BY status;"
+   ```
+   - Same error ramping uniformly over time → **cascade**.
+   - Single spike at a timestamp then plateau → **isolated event**.
+3. **Severity and page-out decision:**
+   - Primary user-facing API returning 5xx for >1 min → **SEV1** (all-hands page-out).
+   - One LLM specialist endpoint down; others healthy → **SEV2** (page on-call rotation; users degrade to fallback model).
+   - Background job / research run failing; chat unaffected → **SEV3** (ticket, morning work).
+   - Transient spike recovering on its own → **SEV3** (monitor 5 min; escalate if it persists).
+4. **Declare start time and severity** in PagerDuty / OpsGenie / shared incident doc. Include: affected service URL, severity, estimated user impact, blast scope.
+---
+## PHASE 2 — PATTERN DETECTION (1–2 MINUTES)
+5. **Classify the failure class** — your next action depends on this:
+   | Class | Signal | Go to |
+   |-------|--------|-------|
+   | **Cascade** | Error rate rising over 30 s+, latency climbing, not all-down | PHASE 3 |
+   | **Isolated** | Error spike at single timestamp, Vercel or RunPod logs show one bad request pattern | PHASE 4 |
+   | **Exhaustion** | Neon connection pool at limit, RunPod worker queue depth maxed, Vercel cold-start timeout | PHASE 5 |
+   | **Dependency** | Your Vercel function healthy; RunPod / Neon / R2 / Upstash unresponsive | PHASE 6 |
+   | **Configuration** | Errors started exactly when the last Vercel deploy or env-var change landed | PHASE 7 |
+6. **Notify your team leads asynchronously** (Slack, not calls unless SEV1): "Outage detected at [TIME], [SERVICE], [SCOPE]. Triaging now. Update in 5 min or I will call if blocked."
+---
+## PHASE 3 — LOAD SHEDDING (CASCADE FAILURE)
+7. **Never outrun the problem with more workers.** Adding RunPod `max_workers` when a cascade is in progress amplifies the failure by flooding the exhausted downstream (Neon, R2, or an external API).
+8. **Read the top error** at second granularity from Vercel logs or Neon:
+   ```bash
+   # Top failing Vercel API routes in last 5 minutes
+   vercel logs --since 5m 2>&1 | grep '"status":5' | jq -r '.path' | sort | uniq -c | sort -rn | head -5
+   # Or: most-active error in agent_actions (the task queue)
+   psql "$DATABASE_URL" -c "SELECT error_type, count(*) FROM agent_actions WHERE updated_at > now() - interval '5 minutes' AND status='failed' GROUP BY error_type ORDER BY count DESC LIMIT 5;"
+   ```
+9. **Shed the highest-volume failure mode first:**
+   - **One Vercel route failing:** Add a temporary `return Response.json({error:'maintenance'},{status:503})` at the top of that route file, deploy with `vercel --prod` to isolate it. Other routes keep serving.
+   - **One RunPod specialist endpoint looping (job queue filling):** Reduce `max_workers` to 0 via the RunPod API to drain the queue, then restore:
+     ```bash
+     curl -s -X PATCH "https://api.runpod.io/v2/$ENDPOINT_ID/config" \
+       -H "Authorization: Bearer $RUNPOD_API_KEY" \
+       -H "Content-Type: application/json" \
+       -d '{"maxWorkers": 0}' | jq .
+     ```
+   - **Background research run looping:** pause the job in Neon:
+     ```bash
+     psql "$DATABASE_URL" -c "UPDATE runs SET status='paused' WHERE status='running' AND updated_at < now() - interval '10 minutes';"
+     ```
+10. **Restart specialists in batches — never all at once.** Restart one endpoint, wait 30 s, observe error rate. If it improves, proceed to the next.
+11. **Record the action:** timestamp, what was shed, expected impact.
+---
+## PHASE 4 — ROOT-CAUSE HYPOTHESIS (ISOLATED FAILURE)
+12. **Form a 2-sentence hypothesis on "what changed":** Did a Vercel deploy land? Did an env var get rotated or deleted? Did the RunPod container image change? Did a Neon migration run?
+13. **Check recent Vercel deploys:**
+    ```bash
+    vercel ls --limit 5
+    # Note the deployment that went live closest to the incident start time
+    ```
+14. **Check recent migrations:**
+    ```bash
+    # List migration files sorted by timestamp
+    ls -lt marketing/api/_lib/migrations/*.sql | head -10
+    # The newest file is the most recent schema change
+    ```
+15. **Check recent config / env-var changes:**
+    ```bash
+    # If env vars are committed as a template, diff from the last good deploy
+    git log --oneline --since="1 hour ago" -- marketing/ packages/theron-cli/ packages/theron-agent-sdk/
+    ```
+16. **Correlate with the failure timestamp.** If a deploy or migration landed 2 min before the error spike, that is the hypothesis. Otherwise look for RunPod container image changes or external-dependency outages.
+17. **Check every upstream dependency health** directly:
+    ```bash
+    # Neon
+    psql "$DATABASE_URL" -c "SELECT version();" 2>&1
+    # RunPod API reachable?
+    curl -s "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
+      -H "Content-Type: application/json" \
+      -d '{"query":"{ myself { id } }"}' | jq .
+    # Cloudflare R2 reachable? (using rclone configured for R2)
+    rclone lsd r2:vext-models --max-depth 1 2>&1 | head -5
+    # Upstash Redis
+    redis-cli -u "$REDIS_URL" ping 2>&1
+    ```
+---
+## PHASE 5 — RESOURCE RECOVERY (EXHAUSTION)
+18. **Identify the exhausted resource:**
+    ```bash
+    # Neon connection pool
+    psql "$DATABASE_URL" -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
+    # If active+idle > ~80, the pool is at risk.
+    # RunPod endpoint queue depth (requests waiting for a worker)
+    curl -s "https://api.runpod.io/v2/$ENDPOINT_ID/metrics" \
+      -H "Authorization: Bearer $RUNPOD_API_KEY" | jq '{queue_depth: .requestsInQueue, workers: .workersRunning}'
+    ```
+18b. **Immediate recovery — do NOT wait for new workers to cold-start:**
+    - **Neon connection pool exhausted:** Terminate idle connections (do NOT kill active queries):
+      ```bash
+      psql "$DATABASE_URL" -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state='idle' AND query_start < now() - interval '5 minutes' AND pid <> pg_backend_pid();"
+      ```
+    - **RunPod queue depth maxed, workers stuck:** Purge the stale queue and scale to 0 then back:
+      ```bash
+      curl -s -X POST "https://api.runpod.io/v2/$ENDPOINT_ID/purge-queue" \
+        -H "Authorization: Bearer $RUNPOD_API_KEY" | jq .
+      ```
+    - **Vercel function cold-start timeout (>10 s):** The function is trying to load a large model or hit a slow dependency; add `?_vercel_no_cache=1` to force a fresh invocation and observe if a cold start is the problem.
+19. **Scale up RunPod workers if queue is real (not a stuck job):**
+    ```bash
+    curl -s -X PATCH "https://api.runpod.io/v2/$ENDPOINT_ID/config" \
+      -H "Authorization: Bearer $RUNPOD_API_KEY" \
+      -H "Content-Type: application/json" \
+      -d '{"maxWorkers": 5}' | jq .
+    ```
+20. **Monitor:** watch queue depth and worker count for 60 s. If error rate does NOT drop after workers are up, this is not an exhaustion problem — re-triage at PHASE 2.
+---
+## PHASE 6 — DEPENDENCY FAILOVER (EXTERNAL SERVICE DOWN)
+21. **Confirm the dependency is actually down** — distinguish timeout from authentication failure from network partition:
+    ```bash
+    # RunPod API
+    curl -sv -m 10 "https://api.runpod.io/graphql?api_key=$RUNPOD_API_KEY" \
+      -H "Content-Type: application/json" -d '{"query":"{ myself { id } }"}' 2>&1 | tail -20
+    # Neon (separate from the app pool — use the direct URL)
+    psql "$DIRECT_DATABASE_URL" -c "SELECT 1;" 2>&1
+    # OpenRouter (fallback LLM substrate)
+    curl -s -m 5 "https://openrouter.ai/api/v1/models" \
+      -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '.data[0].id'
+    ```
+22. **If RunPod LLM endpoint is down:** The fallback substrate is OpenRouter (already wired in `marketing/api/_lib/agent_runtime.ts`). Confirm `OPENROUTER_API_KEY` is set in Vercel env and that the `useOpenRouterFallback` flag is enabled (or enable it):
+    ```bash
+    vercel env add THERON_USE_OPENROUTER_FALLBACK production
+    # enter: true
+    vercel --prod
+    ```
+23. **If Neon Postgres is down:** There is no hot standby in the current stack. Mitigation: set a feature flag to disable write-heavy flows (research runs, agent sessions) while keeping read-only chat alive if a read replica exists. Otherwise: communicate the outage and wait for Neon restoration (check `status.neon.tech`).
+24. **If Cloudflare R2 is down:** Model weights cannot be pulled on cold start. RunPod workers will fail to initialize. Mitigation: check `status.cloudflarestatus.com`. Workers that are already warm will continue serving until they are replaced — do NOT restart them during an R2 outage.
+25. **Do NOT hammer a failing dependency.** If it returns 503, add exponential backoff in the caller. Do NOT retry in a tight loop (for example, every 1 s).
+---
+## PHASE 7 — ROLLBACK (CONFIGURATION OR CODE)
+26. **If a recent Vercel deploy or env-var change is the root cause, roll back immediately.** A 5-minute Vercel rollback beats a 30-minute "simple fix" every time.
+    ```bash
+    # List recent Vercel deployments to identify the last-good deployment URL/ID
+    vercel ls --limit 10
+    # Promote the last-known-good deployment back to production
+    vercel promote <deployment-id> --scope <team-or-account>
+    ```
+27. **If the cause is a bad migration:** There is no auto-rollback for Neon schema changes. Steps:
+    - Write a compensating migration (add back the dropped column, revert the altered constraint).
+    - Run it directly against the Neon URL: `psql "$DATABASE_URL" -f rollback_migration.sql`.
+    - Record the file in `marketing/api/_lib/migrations/` with a timestamp prefix.
+28. **If the cause is a RunPod container image update:** Pin the endpoint back to the last-good image SHA:
+    ```bash
+    # Update the RunPod endpoint to use a specific container image tag
+    curl -s -X PATCH "https://api.runpod.io/v2/$ENDPOINT_ID/config" \
+      -H "Authorization: Bearer $RUNPOD_API_KEY" \
+      -H "Content-Type: application/json" \
+      -d '{"imageName": "ghcr.io/vext-labs-inc/theron-serverless:<last-good-tag>"}' | jq .
+    ```
+29. **Monitor post-rollback:** error rate, latency, Neon row counts. If the error stops, declare the incident resolved pending postmortem. If the error persists, the rollback target was not clean — go further back in git history or look for an independent contributing factor.
+30. **Record:** timestamp of rollback, which Vercel deployment ID / commit / image tag, expected behavior restored.
+---
+## PHASE 8 — MONITORING & HOLDING (SUSTAINED RECOVERY)
+31. **Stay on incident for 10 minutes post-recovery.** One green metric is not enough. Watch all of:
+    - Error rate staying low on Vercel logs (no re-spike).
+    - RunPod endpoint returning valid job completions (not just `accepted`).
+    - Neon connection count back below 60.
+    - `agent_sessions` and `runs` tables showing new successful rows.
+    - Customer reports stopping (check support queue, any Discord/Slack support channels).
+32. **If error rate rises again:** you have a cascade or intermittent issue. Go back to PHASE 2 and re-triage — treat this as a new incident.
+33. **Escalate if stuck >10 minutes into recovery:** Page a higher-level on-call (architecture lead). Phrase it: "I have [isolated the pattern to X], ruled out Y, but cannot [specific blocker]. Need expertise on Z."
+---
+## PHASE 9 — INCIDENT DECLARATION & POSTMORTEM
+34. **Close the incident** once all of the following are true:
+    - Error rate and latency back to baseline for >5 minutes.
+    - No new user reports in the last 2 minutes.
+    - All-clear communicated to stakeholders.
+35. **Document the timeline in the incident ticket:**
+    ```
+    [TIME] Incident started — detection method, error rate, affected service URL
+    [TIME] Severity declared: SEV[N], scope: [users/endpoints]
+    [TIME] Pattern classified as: [PHASE 2 class]
+    [TIME] Action 1: [what was done, PHASE ref] — expected: X, actual: Y
+    [TIME] Action 2: [what was done, PHASE ref] — expected: X, actual: Y
+    ...
+    [TIME] Incident resolved. Service healthy. Rollback/shed: [artifact link or commit]
+    ```
+36. **Schedule a 30-min postmortem for morning.** Invite: engineers who responded, the on-call lead, one person outside the team (for fresh eyes). Postmortem is for learning, NOT blame.
+37. **At the postmortem, ask and answer:**
+    - What was the actual root cause (not the trigger)?
+    - What would have caught this before users saw it?
+    - Is there a missing health check on a specific RunPod endpoint or Neon query?
+    - Assign ONE owner per action item with a due date this week, not "later."
+    - Example: "Add a `/health` route to the `$CYBER_URL` specialist that verifies model load, by [DATE]."
+---
+## VERIFICATION CHECKLIST (POST-RECOVERY)
+Run each check; all must pass before paging down.
+- [ ] `curl -si https://tryvext.com/api/health` returns HTTP 200
+- [ ] `curl -si https://itstheron.com/api/health` returns HTTP 200
+- [ ] RunPod LLM endpoint accepts a test job and completes it (see Phase 1 step 1 command)
+- [ ] `psql "$DATABASE_URL" -c "SELECT count(*) FROM pg_stat_activity WHERE state='idle';"` — idle connection count < 60
+- [ ] No new 5xx entries in `vercel logs --since 5m` for the affected route
+- [ ] `runs` and `agent_sessions` tables show new successful rows in the last 2 minutes
+- [ ] Rollback / shed-load action recorded in incident ticket with timestamp and artifact reference
+---
+## ESCALATION CONTACTS
+- **SEV1 (user-facing API down):** Page incident commander + RunPod on-call (if endpoint issue) + Neon support (`support.neon.tech`) if DB. Page at the 10-minute mark if the incident is still unresolved.
+- **SEV2 (one specialist endpoint / partial outage):** Notify service owner. They know the specialist's failure modes.
+- **SEV3 (background job / research run / non-critical path):** Create ticket, note timeline, morning handoff.
+---
+KEY PRINCIPLE: **In an incident, speed + containment beats perfection.** Shed the highest-volume failure, isolate the blast radius on the actual stack (Vercel / RunPod Serverless / Neon / R2), hold for 10 minutes, and escalate if stuck. The permanent fix is a PR tomorrow, not a 3 AM debugging marathon.

package/skills/security-audit.md ADDED

@@ -0,0 +1,154 @@
+---
+name: security-audit
+description: Static security audit of a codebase — map the attack surface, review against OWASP/CWE, scan deps + secrets; every finding gets severity, file:line, exploit scenario, and fix.
+allowed-tools: Read, Grep, Glob, Bash
+---
+## STANCE RULE (non-negotiable)
+You are a senior appsec engineer hired to find every exploitable flaw before an attacker does. Every "looks fine" is a failure of imagination. Assume the codebase is broken until you prove otherwise. No hand-waving — every finding points at a specific file and line. Distinguish **confirmed** (code path proven exploitable) from **suspected** (pattern present, exploitability requires runtime confirmation). Prioritize the exploitable over the theoretical.
+---
+## PHASE 1 — SCOPE + ATTACK SURFACE MAP
+1. List every external entry point: HTTP routes, CLI args, env vars, file uploads, message queues, webhooks, WebSockets, RPC endpoints, cron jobs. Use Grep for route registration patterns (`router\.`, `app\.get`, `@app.route`, `addRoute`, `express()`, `Hono`, `FastAPI`, `gin.`, `http.HandleFunc`).
+2. Draw the trust boundary: what callers are unauthenticated? What data arrives from the internet vs. internal services vs. the filesystem?
+3. List every authn/authz checkpoint: middleware names, guard decorators, JWT/session validation sites. Grep for `authenticate`, `authorize`, `requireAuth`, `isAdmin`, `middleware`, `guard`, `@Permission`.
+4. Map data flows: trace each entry point to its storage sink (DB write, file write, cache set) and output sink (HTTP response, email, external API call). Note every place user-controlled data crosses a trust boundary.
+5. Identify the technology stack: languages, frameworks, ORM/query builders, template engines, serialization libs. Each has a class of known vulns — note them now.
+---
+## PHASE 2 — INJECTION VULNERABILITIES (CWE-89/78/94/74)
+6. **SQL injection:** Grep for raw string interpolation into queries: `f"SELECT`, `"SELECT * FROM ${`, `query(` + `+`, `format(sql`, `.execute(f"`, `db.query(\``. For each hit, trace whether the value derives from user input. Parameterized queries are the only safe fix — `?` placeholders or ORM `.where({id})`.
+7. **Command injection (CWE-78):** Grep for `exec(`, `shell=True`, `subprocess.run`, `child_process.exec`, `os.system`, `eval(`, `Function(`. Trace every argument. Shell metacharacters in user input = RCE. Fix: `subprocess.run([...], shell=False)` with explicit arg arrays; never interpolate user input into shell strings.
+8. **Template injection (CWE-94):** Grep for `render(`, `template.render(`, `jinja2.Template(`, `Handlebars.compile(`, `nunjucks.renderString(` with user-supplied template strings. Server-side template injection is often RCE. Fix: never render user-supplied template strings; use static templates with variable substitution only.
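+9. **Path traversal (CWE-22):** Grep for `fs.readFile`, `open(`, `sendFile`, `path.join` where any segment comes from request params. Test: does `../../etc/passwd` reach the filesystem? Fix: `path.resolve` the joined path and assert the result equals the allowed base dir or starts with the base dir plus a path separator (a bare prefix check lets `/srv/files-evil` pass for base `/srv/files`); or use an allowlist of permitted paths.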
+10. **Log injection (CWE-117):** Grep for `console.log(`, `logger.info(`, `print(` with unsanitized request fields. Newlines in log values can forge log entries. Fix: structured logging with field-level encoding.
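The containment check from step 9 can be sketched as follows. This is a minimal illustration, assuming Python 3.9+ (for `Path.is_relative_to`); the function name `resolve_inside` is hypothetical and not part of any audited codebase.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir; raise ValueError if it escapes."""
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses ".." segments and follows symlinks.
    # An absolute user_path replaces the base entirely and is caught below.
    candidate = (base / user_path).resolve()
    # is_relative_to compares whole path components, so "/srv/files-evil"
    # is not treated as being inside "/srv/files".
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Comparing path components rather than raw string prefixes is the design choice that matters here; a plain `startswith` on the string form would accept sibling directories that share a prefix with the base.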
+---
+## PHASE 3 — CROSS-SITE SCRIPTING (CWE-79)
+11. **Reflected XSS:** Grep for response writes that echo request input without encoding: `res.send(req.query.`, `innerHTML =`, `document.write(`, `dangerouslySetInnerHTML`. Every value from the request that lands in HTML output must be HTML-entity-encoded at the output site, not at the input site.
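+12. **Stored XSS:** Grep for DB reads that flow into HTML templates without encoding. Check every template variable for auto-escaping: Django templates auto-escape by default; Jinja2 does not unless `autoescape` is enabled (Flask enables it for `.html` templates); React JSX escapes interpolated values (except via `dangerouslySetInnerHTML`); string concatenation into HTML is never safe.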
+13. **DOM XSS:** Grep for `location.hash`, `location.search`, `document.referrer` used with `eval(`, `innerHTML`, `src =`. These bypass server-side controls entirely.
+14. **Content-Security-Policy:** Check HTTP response headers or meta tags for CSP. Absence = no mitigation layer. `unsafe-inline` or `unsafe-eval` in CSP = CSP defeated. Note but do not treat as a confirmed finding without an injection point.
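Encoding at the output site, as steps 11 and 12 require, might look like the sketch below. It assumes a hand-built HTML fragment; `render_comment` is a hypothetical helper used only for illustration.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, encoding untrusted values where they are emitted."""
    # html.escape converts &, <, > and (by default) both quote characters,
    # so the values are safe in element content and in quoted attributes.
    return (
        f'<div class="comment" data-author="{html.escape(author)}">'
        f"{html.escape(body)}</div>"
    )
```

Stored values stay raw in the database; only the rendering step encodes them, so the same data can be emitted safely into other contexts (JSON, plain-text email) with their own encoders.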
+---
+## PHASE 4 — BROKEN ACCESS CONTROL + IDOR (OWASP A01, CWE-639)
+15. For every data-fetch endpoint, check: does the query filter by the authenticated user's ID, or only by the resource ID from the request? Pattern: `SELECT * FROM orders WHERE id = $1` with `$1` from URL params and no `AND user_id = $current_user` = IDOR. Grep for ORM calls without ownership filters.
+16. Check admin/privileged routes: is the auth check at the route level, or only inside the handler body where it can be bypassed by an early return? Trace the full middleware chain for every privileged route.
+17. Check for horizontal privilege escalation: can user A modify user B's resource by substituting B's ID into a request? Test by tracing the `update` / `delete` path — does it re-validate ownership before writing?
+18. Check for mass assignment (CWE-915): does any endpoint accept a user-supplied object and pass it directly to `.create()`, `.update()`, or `.save()`? An attacker can set `is_admin=true` or `user_id=<victim>`. Fix: explicit field allowlists (`pick(body, ['name','email'])`), never spread untrusted objects into DB writes.
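The ownership filter from step 15 and the field allowlist from step 18 can be sketched together. This is an illustration only, using an in-memory SQLite table; the `orders` schema, `fetch_order`, `pick_allowed`, and `ALLOWED_PROFILE_FIELDS` are assumptions, not names from any real codebase.

```python
import sqlite3

ALLOWED_PROFILE_FIELDS = ("name", "email")  # hypothetical allowlist


def fetch_order(conn: sqlite3.Connection, order_id: int, current_user_id: int):
    """Return the order row only if it belongs to the authenticated user."""
    return conn.execute(
        "SELECT id, user_id, total FROM orders WHERE id = ? AND user_id = ?",
        (order_id, current_user_id),
    ).fetchone()


def pick_allowed(body: dict) -> dict:
    """Copy only allowlisted keys, dropping fields such as is_admin."""
    return {k: body[k] for k in ALLOWED_PROFILE_FIELDS if k in body}


# Demo data: order 1 belongs to user 100, order 2 to user 200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 100, 9.5)")
conn.execute("INSERT INTO orders VALUES (2, 200, 20.0)")
```

With this data, user 100 asking for order 2 gets no row back, which is the behavior an IDOR-free endpoint should show.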
+---
+## PHASE 5 — AUTHENTICATION + SESSION (OWASP A07, CWE-287/384)
+19. **JWT / token validation:** Grep for JWT decode. Confirm: (a) the signature is verified: `jwt.verify(token, secret)`, not `jwt.decode(token)` without verification; (b) the accepted algorithm is explicitly pinned (e.g. `jwt.verify(token, secret, { algorithms: ['HS256'] })`), which blocks `alg: none` and algorithm-confusion attacks; (c) expiry (`exp`) is checked. Any missing check = auth bypass.
+20. **Session fixation (CWE-384):** Is the session ID rotated after login? Grep for session creation in login handlers. A session created before authentication and reused after = session fixation.
+21. **Password storage (CWE-256/916):** Grep for password hashing. Only bcrypt/argon2/scrypt/pbkdf2 with an adequate cost factor are acceptable. Fast general-purpose hashes (MD5/SHA1/SHA256) are not: unsalted they fall to rainbow tables, and even salted they fall to GPU brute force. Plaintext storage = immediate critical.
+22. **Brute force / rate limiting:** Is there rate limiting on login, password reset, and OTP endpoints? Grep for rate-limit middleware on those routes. Absence = credential stuffing and OTP brute force are trivially possible.
+23. **Password reset (CWE-640):** Trace the reset token: is it cryptographically random (`crypto.randomBytes` / `secrets.token_urlsafe`)? Is it single-use? Does it expire? Does the response leak which emails exist in the system (user enumeration)?
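The three reset-token properties from step 23 (cryptographically random, single-use, expiring) can be sketched with the standard library. `ResetTokenStore` is a hypothetical in-memory class; a real service would persist the digests, and the 15-minute default TTL is an assumption.

```python
import hashlib
import secrets
import time
from typing import Optional


class ResetTokenStore:
    """In-memory sketch: random, hashed-at-rest, expiring, single-use tokens."""

    def __init__(self, ttl_seconds: int = 900):
        self.ttl_seconds = ttl_seconds
        self._tokens = {}  # sha256(token) -> (user_id, expires_at)

    def issue(self, user_id: str, now: Optional[float] = None) -> str:
        token = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
        # Only a digest is stored, so a leaked store does not reveal usable
        # tokens. A fast hash is fine here because the input is high-entropy.
        digest = hashlib.sha256(token.encode()).hexdigest()
        issued = time.time() if now is None else now
        self._tokens[digest] = (user_id, issued + self.ttl_seconds)
        return token

    def redeem(self, token: str, now: Optional[float] = None) -> Optional[str]:
        digest = hashlib.sha256(token.encode()).hexdigest()
        # pop() removes the entry on first use, making the token single-use.
        entry = self._tokens.pop(digest, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        current = time.time() if now is None else now
        return user_id if current < expires_at else None
```

The `now` parameter exists only so expiry can be exercised deterministically; callers would normally omit it.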
+---
+## PHASE 6 — SSRF (CWE-918)
+24. Grep for outbound HTTP calls that incorporate user input: `fetch(url`, `axios.get(url`, `requests.get(url`, `http.get(url` where `url` derives from request body or params. An attacker who controls the URL can reach internal services, cloud metadata endpoints (`169.254.169.254`), and localhost.
+25. For each hit: is there an allowlist of permitted destination hosts/schemes? Does the code follow redirects to a different host? Fix: resolve the hostname before connecting, assert it against an allowlist, block loopback, private RFC-1918 ranges, and link-local addresses; disable redirect following or re-validate after each redirect.
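The allowlist-plus-address check described in step 25 can be sketched as below. `ALLOWED_HOSTS`, `resolve_host`, and `is_allowed_url` are hypothetical names, and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist


def resolve_host(host: str) -> list:
    """Return every IP address the hostname currently resolves to."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_allowed_url(url: str, resolver=resolve_host) -> bool:
    parts = urlsplit(url)
    # Only plain web schemes; this rejects file://, gopher://, and similar.
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    # Every resolved address must be public: an allowlisted name that points
    # at an internal address is still an SSRF vector.
    for addr in resolver(parts.hostname):
        ip = ipaddress.ip_address(addr)
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

A check like this only helps if the subsequent request connects to the address that was validated; resolving again at connect time reopens the door to DNS rebinding.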
+---
+## PHASE 7 — INSECURE DESERIALIZATION (CWE-502)
+26. Grep for `pickle.loads`, `yaml.load(` (not `safe_load`), `marshal.loads`, `unserialize(`, `ObjectInputStream`, and `JSON.parse` of signed payloads that are parsed before the signature is verified. User-controlled deserialization of binary/YAML formats = RCE. Fix: use `yaml.safe_load`, avoid pickle for untrusted data, validate schema before deserializing.
+27. Check cookie values that contain serialized objects (base64-encoded blobs). If the server deserializes cookie content without HMAC verification, it is exploitable.
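The HMAC check from step 27 can be sketched with the standard library. This is an illustration under stated assumptions: the cookie format (`base64(json).hex_tag`) and the helper names `sign_cookie` and `load_cookie` are invented here, and JSON is used instead of a binary serializer so that even a forged payload cannot execute code.

```python
import base64
import hashlib
import hmac
import json


def sign_cookie(payload: dict, key: bytes) -> str:
    """Serialize payload as JSON and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"


def load_cookie(cookie: str, key: bytes):
    """Verify the tag first; deserialize only if it matches."""
    body, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding a timing side channel.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

The order of operations is the point: the bytes are authenticated before any parser touches them.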
+---
+## PHASE 8 — SECRETS IN CODE + CONFIG (CWE-798/312)
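+28. Run: `grep -rlE "(api_key|secret|password|token|private_key|AWS_SECRET|DATABASE_URL)['\"]?\s*[:=]\s*['\"][^'\"]{8,}" . --include="*.ts" --include="*.js" --include="*.py" --include="*.go" --include="*.env" --include="*.yaml" --include="*.json"` to list matching files, then read the hits. The `[:=]` and optional closing quote after the key let the pattern match YAML and JSON assignments as well as `key = "value"` code.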
+29. Check `.env.example`, `config/`, `docker-compose.yml`, `helm/values.yaml` for hardcoded non-placeholder secrets. Any non-example value = confirmed finding.
+30. Check git history for secrets that were committed and removed: `git log --all --oneline -S "password" -- '*.env'` (if git is available; quote the pathspec so the shell does not expand it). Removed secrets are still in history and must be rotated + history-purged.
+31. Verify secrets are loaded from environment variables only at runtime. No secrets in source files, build artifacts, or client-side bundles. Grep the compiled/bundled output if present.
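For reading hits line by line, a small scanner in the spirit of the step 28 grep might look like this. It is a sketch, not a replacement for a dedicated secret scanner: the pattern is a variant that also accepts `:` assignments, and the `PLACEHOLDER` filter and the name `find_hardcoded_secrets` are assumptions added for illustration.

```python
import re

SECRET_ASSIGNMENT = re.compile(
    r"(api_key|secret|password|token|private_key|AWS_SECRET|DATABASE_URL)"
    r"""['"]?\s*[:=]\s*['"][^'"]{8,}"""
)
# Hypothetical filter for obvious example values; tune per repository.
PLACEHOLDER = re.compile(r"changeme|example|placeholder|your[-_]", re.IGNORECASE)


def find_hardcoded_secrets(text: str) -> list:
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SECRET_ASSIGNMENT.search(line) and not PLACEHOLDER.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

Every hit still needs a human read: the pattern flags the shape of an assignment, not whether the value is live.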
+---
+## PHASE 9 — CRYPTOGRAPHY MISUSE (CWE-327/330/338)
+32. Grep for weak algorithms: `MD5`, `SHA1`, `DES`, `RC4`, `ECB` mode. MD5/SHA1 are broken for collision resistance; DES/RC4 are broken for confidentiality. Note use for checksums (acceptable) vs. security (not acceptable).
+33. Grep for `Math.random()`, `random.random()`, `rand()` used for security-sensitive values (tokens, nonces, salts, session IDs). These are not cryptographically secure. Fix: `crypto.getRandomValues` / `crypto.randomBytes` / `secrets` module.
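+34. Check IV/nonce reuse in AES-GCM or AES-CTR: a hardcoded nonce, or a counter nonce that can repeat (e.g. one that resets on restart), with the same key = keystream reuse = plaintext recovery. Fix: a random 96-bit nonce per encryption operation, or a counter that is guaranteed never to repeat for a given key.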
+35. Check TLS: are there `verify=False`, `InsecureSkipVerify`, `rejectUnauthorized: false` flags? These disable certificate validation = trivial MITM.
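Why nonce reuse (step 34) leaks plaintext can be shown with a toy stream cipher. The cipher below is an illustration built on SHAKE-256, not AES and not suitable for real use; it only shares the property that the keystream is a function of key and nonce.

```python
import hashlib
import secrets


def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT AES, NOT for real use."""
    keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = secrets.token_bytes(32)
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallo"

# Same key and same nonce: the keystream cancels out of c1 XOR c2,
# so an eavesdropper learns p1 XOR p2 without knowing the key.
fixed_nonce = b"\x00" * 12
c1 = toy_stream_encrypt(key, fixed_nonce, p1)
c2 = toy_stream_encrypt(key, fixed_nonce, p2)

# Fresh random 96-bit nonce per message: the keystreams differ.
d1 = toy_stream_encrypt(key, secrets.token_bytes(12), p1)
d2 = toy_stream_encrypt(key, secrets.token_bytes(12), p2)
```

With the fixed nonce, `xor(c1, c2)` equals `xor(p1, p2)`; with per-message nonces it does not, which is the whole case for never repeating a nonce under one key.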
+---
+## PHASE 10 — DEPENDENCY + SUPPLY-CHAIN RISK (OWASP A06)
+36. Run: `npm audit --json 2>/dev/null | jq '.vulnerabilities | to_entries[] | select(.value.severity=="critical" or .value.severity=="high") | {pkg:.key, sev:.value.severity, via:.value.via[0]}'` — or `pip-audit`, `cargo audit`, `go list -m -json all | nancy sleuth`. List critical/high CVEs with the package name, version, CVE ID, and affected path.
+37. Check for lockfile presence: `package-lock.json`, `yarn.lock`, `poetry.lock`, `Cargo.lock`, `go.sum`. Missing lockfile = non-reproducible builds = supply-chain risk.
+38. Check for dependency confusion candidates: internal package names in `package.json` / `requirements.txt` that could be registered on the public registry. Grep for scoped packages with `@company/` prefix and verify they are published on the private registry.
+39. Check for typosquats: packages with names one character off from popular packages. Grep for unusual spellings of lodash, express, react, requests, boto3, etc.
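+40. Check install-time scripts (`preinstall`, `install`, `postinstall`) in the `package.json` of installed dependencies: `cat node_modules/*/package.json node_modules/@*/*/package.json 2>/dev/null | jq 'select(.scripts.postinstall) | {name:.name, postinstall:.scripts.postinstall}'` (the second glob covers scoped packages; repeat for `preinstall` and `install`). These run arbitrary code at install time.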
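The "one character off" test from step 39 can be made mechanical with an edit-distance check. This is a sketch: the `POPULAR` list reuses the package names mentioned in step 39, and `edit_distance` and `typosquat_candidates` are hypothetical helpers. Plain Levenshtein distance counts a swapped pair of letters as two edits, so transposition typos need a wider threshold.

```python
POPULAR = ("lodash", "express", "react", "requests", "boto3")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def typosquat_candidates(deps, popular=POPULAR, max_distance=1):
    """Return (dependency, lookalike) pairs within max_distance edits."""
    hits = []
    for dep in deps:
        if dep in popular:
            continue  # an exact match is the real package
        for name in popular:
            if edit_distance(dep, name) <= max_distance:
                hits.append((dep, name))
    return hits
```

A hit is only a lead: confirm against the registry metadata (publisher, age, download count) before reporting it as a finding.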
+---
+## PHASE 11 — INPUT VALIDATION + OUTPUT ENCODING (CWE-20)
+41. At every entry point, check: is input validated for type, length, format, and range **before** it is used? Validation at the output site is too late — by then it may already have been stored, logged, or processed.
+42. Check numeric inputs for overflow: does the application accept arbitrarily large integers that could overflow when cast to int32 or used in arithmetic? Fix: explicit range checks.
+43. Check file upload handlers: are MIME type and extension validated server-side (not just client-side)? Is the upload stored outside the web root? Is the filename sanitized before use? Can a `.php` / `.py` / `.js` file be uploaded and executed?
+44. Check redirect targets: `res.redirect(req.query.next)` without validating `next` is a same-origin path = open redirect (CWE-601). Fix: assert the redirect target starts with `/` and does not contain `//`, a backslash (browsers treat `/\` like `//`), or a protocol.
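The redirect-target rule from step 44 can be sketched as a small validator. `is_safe_redirect` is a hypothetical helper; the backslash and control-character checks are additions assumed here because browsers normalize those characters.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target: str) -> bool:
    """Accept only same-origin absolute paths such as "/dashboard"."""
    # Must be a path, and must not contain "//" (protocol-relative URLs
    # like "//evil.com" are the main bypass).
    if not target.startswith("/") or "//" in target:
        return False
    # Browsers treat "\" like "/", and control characters can be stripped
    # during URL parsing, so reject both outright (assumed hardening).
    if "\\" in target or any(ord(ch) < 0x20 for ch in target):
        return False
    parts = urlsplit(target)
    return parts.scheme == "" and parts.netloc == ""
```

Rejecting every `//` is deliberately conservative: it also refuses harmless paths such as `/a//b`, which is usually an acceptable trade for a login redirect.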
+---
+## PHASE 12 — LEAST PRIVILEGE + CONFIGURATION (CWE-250/732)
+45. Check service accounts and DB connections: are they scoped to the minimum permissions needed? A read-only API should not connect with a DB user that has `DROP TABLE` rights.
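+46. Check file permissions on sensitive files: `find . \( -name "*.pem" -o -name "*.key" -o -name ".env" \) -exec ls -la {} +` (using `-exec` instead of `xargs` handles filenames with spaces and does nothing when there are no matches). World-readable private keys = immediate finding.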
+47. Check for debug mode in production indicators: `DEBUG=True`, `app.debug = True`, `NODE_ENV=development` in prod config. Debug mode often exposes stack traces, internal state, and disables security features.
+48. Check CORS configuration: `Access-Control-Allow-Origin: *` lets any origin read responses to non-credentialed requests, so flag it on endpoints that return sensitive data. Browsers reject the wildcard when combined with `Access-Control-Allow-Credentials: true`, so the exploitable pattern is a server that reflects the request `Origin` header back while allowing credentials.
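The world-readable check from step 46 can also be done programmatically. This sketch assumes a POSIX filesystem (permission bits are not meaningful on Windows); `world_readable` and `find_exposed_keys` are hypothetical names.

```python
import os
import stat


def world_readable(path: str) -> bool:
    """True if the 'other' read bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def find_exposed_keys(root: str) -> list:
    """List *.pem, *.key and .env files under root that anyone can read."""
    exposed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".pem", ".key")) or name == ".env":
                full = os.path.join(dirpath, name)
                if world_readable(full):
                    exposed.append(full)
    return sorted(exposed)
```

Each returned path is a candidate finding; confirm the file actually holds key material before rating it.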
+---
+## PHASE 13 — FINDINGS TABLE
+For every confirmed or suspected finding, produce one row:
+| ID | Phase | CWE | Severity | Likelihood | Priority | Finding | Location | Exploit Scenario | Fix |
+|----|-------|-----|----------|------------|----------|---------|----------|-----------------|-----|
+| F1 | 2 | CWE-89 | Critical | Probable | P0 | SQL injection via unsanitized `user_id` | `api/users.ts:47` | Attacker sends `id=1 OR 1=1--` to dump all rows | Parameterized query: `db.query('SELECT ... WHERE id=$1',[id])` |
+| F2 | 4 | CWE-639 | High | Certain | P1 | IDOR on `/api/orders/:id` — no ownership check | `routes/orders.ts:83` | Authenticated user substitutes another user's order ID, reads full order | Add `AND user_id = $currentUser.id` to query |
+Severity scale — Critical: direct compromise (RCE/auth bypass/mass data exposure); High: significant data/auth impact; Medium: limited impact or requires chained conditions; Low: defense-in-depth gap, minimal direct impact. Priority = P0 (Critical×Certain/Probable), P1 (Critical×Possible or High×Certain/Probable), P2 (High×Possible or Medium×Certain), P3 (everything else).
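The priority mapping stated above can be written as a lookup, which makes it easy to check that table rows follow the rule. The function name `priority` is hypothetical; the severity and likelihood labels are the ones used in the scale.

```python
def priority(severity: str, likelihood: str) -> str:
    """Map severity x likelihood to P0..P3 per the scale above."""
    s, l = severity.lower(), likelihood.lower()
    if s == "critical" and l in ("certain", "probable"):
        return "P0"
    if (s == "critical" and l == "possible") or (
        s == "high" and l in ("certain", "probable")
    ):
        return "P1"
    if (s == "high" and l == "possible") or (s == "medium" and l == "certain"):
        return "P2"
    return "P3"  # everything else
```

For example, a Critical finding with Probable likelihood maps to P0, and a High finding with Certain likelihood maps to P1.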
+---
+## PHASE 14 — VERDICT + TOP RISKS
+After the table:
+- **SHIP-BLOCKED** — any P0 finding; or two or more unmitigated P1 findings.
+- **CONDITIONAL** — P1 findings present, each has a specific committed fix or scheduled remediation with an owner and date.
+- **CLEAR** — no P0/P1 findings; P2/P3 documented with owner and timeline.
+State the **top 3 risks** in priority order regardless of verdict. The engineering owner must acknowledge each before merge.
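The verdict rules can be sketched as a function over the findings list. `Finding`, its `mitigated` flag, and `verdict` are hypothetical names; `mitigated` is assumed to mean "has a committed fix or a scheduled remediation with an owner and date". The rules above do not say what happens with exactly one unmitigated P1, so the sketch conservatively blocks in that case and labels it as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    id: str
    priority: str            # "P0", "P1", "P2" or "P3"
    mitigated: bool = False  # assumed: committed fix or owned, dated remediation


def verdict(findings) -> str:
    if any(f.priority == "P0" for f in findings):
        return "SHIP-BLOCKED"
    p1 = [f for f in findings if f.priority == "P1"]
    unmitigated = [f for f in p1 if not f.mitigated]
    if len(unmitigated) >= 2:
        return "SHIP-BLOCKED"
    if len(unmitigated) == 1:
        # ASSUMPTION: not covered by the stated rules. It cannot be
        # CONDITIONAL (not every P1 has a fix), so block conservatively.
        return "SHIP-BLOCKED"
    return "CONDITIONAL" if p1 else "CLEAR"
```

Teams that prefer to let a single unmitigated P1 through with sign-off would change only the flagged branch.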
+---
+## HARD RULES (never violate)
+- Never rate a finding lower because the fix is inconvenient. Rate on likelihood × impact only.
+- Never skip Phase 10 (deps) because the application "doesn't use many packages." One transitive CVE can be the whole finding.
+- Never accept "this is internal only" as a reason to skip an injection check. Insider threat and SSRF make internal services reachable.
+- Every finding must name an exact file and line, or be marked **suspected** with a grep command the reader can run to confirm.
+- Never end with "no findings" unless you ran every grep in Phases 2–12 and explicitly recorded what each returned. Show your work.
+- If you cannot read a file (binary, minified, compiled), say so and raise likelihood estimates for that surface by one tier.
+- "Consider adding validation" is not a fix. Every fix must be a concrete code change: the pattern to replace and the replacement.