npm - @vextlabs/theron-cli - Versions diffs - 0.2.1 → 0.4.0 - Mend

@vextlabs/theron-cli 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191)

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +61 -5
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/build_domains.d.ts +6 -0
package/dist/profiles/methodologies/build_domains.js +170 -0
package/dist/profiles/methodologies/build_domains.js.map +1 -0
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
package/dist/profiles/methodologies/regulated_domains.js +153 -0
package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
package/dist/profiles/methodologies/research_domains.d.ts +8 -0
package/dist/profiles/methodologies/research_domains.js +179 -0
package/dist/profiles/methodologies/research_domains.js.map +1 -0
package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
package/dist/profiles/methodologies/strategy_domains.js +193 -0
package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
package/dist/profiles/seeds.js +241 -95
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/calc_gate.d.ts +2 -0
package/dist/verifiers/calc_gate.js +112 -0
package/dist/verifiers/calc_gate.js.map +1 -0
package/dist/verifiers/citation_gate.d.ts +2 -0
package/dist/verifiers/citation_gate.js +130 -0
package/dist/verifiers/citation_gate.js.map +1 -0
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/evidence_gate.d.ts +2 -0
package/dist/verifiers/evidence_gate.js +108 -0
package/dist/verifiers/evidence_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +28 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.d.ts +2 -0
package/dist/verifiers/source_gate.js +125 -0
package/dist/verifiers/source_gate.js.map +1 -0
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/prd.md ADDED

@@ -0,0 +1,105 @@
+---
+name: prd
+description: Write a complete Product Requirements Document — extract the real problem, define Theron users + evidence, set measurable goals tied to agentic/receipt/specialist value, scope must/should/could requirements, map key flows and edge cases through the STOA/CIP surface, and lock an explicit out-of-scope.
+allowed-tools: Read, Write
+---
+═══ HARD RULES ═══
+- Every requirement MUST trace to a named user problem. If you cannot write that trace, cut the requirement.
+- Never state a metric without specifying baseline, target, and measurement method.
+- Success metrics are OUTCOMES (receipt-verify rate, agentic task completion, specialist accuracy on domain evals) — never outputs (features shipped, endpoints added, LoRAs trained).
+- Non-goals are binding constraints, not polite suggestions. Write them as "We will NOT …" to force clarity.
+- "Should" and "could" items are explicitly deferred — do not allow them to creep into the build scope without a deliberate decision.
+- Distinguish DISCOVERY evidence (what users said) from BEHAVIORAL evidence (what users did). Behavioral evidence outweighs stated preferences.
+- Never fabricate user quotes, analytics figures, eval scores, or benchmark numbers. Mark any unvalidated assumption as [ASSUMPTION — validate before build].
+- A PRD is not a spec. It answers WHY and WHAT, not HOW. Resist writing implementation details; flag them as [IMPL NOTE — move to tech spec].
+- Never conflate shipped capability with roadmap capability. CIP weight-rewriting and on-support consolidation are different things — label each correctly. Never describe a capability as live if it routes through OpenRouter/DeepSeek rather than own-weights.
+- Every edge case must name a resolution policy, not just acknowledge it exists.
+- The document is done when a new engineer who has never spoken to the team can read it and know exactly what to build, why, and how to know they succeeded.
+═══ THERON PRODUCT SURFACE — READ BEFORE WRITING ═══
+Theron has four distinct capability layers that shape every PRD written for this product. Identify which layers your feature touches before drafting requirements:
+- **Agentic loop** — visible loop activity, tool execution, trajectory monitoring, HITL checkpoints. User-facing: the loop is transparent, not a black box.
+- **STOA receipt / verifier** — every agent action produces a cryptographically signed receipt verifiable by the user without trusting Vext Labs. WebCrypto in-browser, ES256. The receipt IS the differentiator — do not ship a feature that bypasses it.
+- **Specialist council** — 31 specialists composed from 15 knowledge-cluster LoRAs on Theron-Base. A feature scoped to "Theron-Cyber" is different from one scoped to "all specialists." Know which specialists are affected and whether their LoRA is served or still gated.
+- **CIP / memory** — the workspace that forms around the user (persona, imported identity, memory, tools). Be explicit about whether a feature requires CIP (roadmap) vs. the session/persona layer (shipped).
+Theron's three primary user personas:
+- **Developer/CLI user** — reaches Theron via theron-cli or the API; cares about tool contracts, receipt auditability, and scripting agent tasks; high technical fluency; does not want UI friction.
+- **Pro consumer** — reaches Theron via itstheron.com; wants a personalized specialist that remembers context and stays on the record; moderate technical fluency; values the receipt as trust signal, not raw crypto.
+- **Operator/team lead** — deploys Theron for a team via /console; cares about audit trails, billing, specialist routing, and data residency; high accountability sensitivity.
+═══ PHASE A — ANCHOR ON THE REAL PROBLEM ═══
+1. Write a one-sentence problem statement in the form: "[User type] cannot [do X] because [root cause], which causes [measurable pain]." If you cannot fill all four slots, you do not understand the problem yet — stop and investigate.
+2. Distinguish symptom from root cause. Ask "why" five times or until the answer requires a business/product decision rather than another why. Document the chain.
+3. State the user explicitly: which Theron persona (Developer/CLI, Pro consumer, Operator), their context, their current workaround, and how frequently they hit the problem. Never write "users" as a monolith.
+4. List the evidence for the problem: user interviews (n=?), support tickets (volume, recurrence), session replay / CLI telemetry, churn exit surveys, eval regressions. Label each source as DISCOVERY or BEHAVIORAL. CLI telemetry and eval data are BEHAVIORAL and outweigh stated preferences.
+5. State what you are NOT solving: adjacent pains that are real but out of scope for this document. This is the first slice of the non-goals list.
+6. Confirm the problem lives in a shipped layer (agentic loop, receipt, CLI surface, persona) rather than a CIP/LoRA/own-weights capability that is not yet live. If it touches a non-live layer, label the requirement as gated and state the gate condition.
+═══ PHASE B — DEFINE GOALS AND NON-GOALS ═══
+7. Write 2–4 outcome goals. Format: "Achieve [metric] from [baseline] to [target] by [date/milestone], measured by [method]." Theron-relevant metric examples: receipt verification rate (% of sessions where a user opens the verifier), specialist routing accuracy on domain evals, CLI task completion rate without tool errors, day-7 Pro user retention, agentic loop step success rate before HITL interruption. Do not use generic SaaS retention numbers as placeholders — state the actual baseline or mark it [ASSUMPTION].
+8. Identify the PRIMARY goal (the one that, if achieved, validates the entire build). All other goals are secondary.
+9. Write explicit non-goals as a numbered list of "We will NOT …" statements. Include: functionality excluded, user segments excluded, specialists excluded, platforms excluded, and quality bars deliberately not targeted in this iteration.
+10. For each non-goal, write one sentence explaining WHY it is deferred. Un-annotated non-goals drift back in.
+11. Record open questions that must be answered before build starts. Assign each an owner and a resolution date.
+═══ PHASE C — SUCCESS METRICS AND INSTRUMENTATION PLAN ═══
+12. For each goal in Phase B, define the PRIMARY metric and one GUARDRAIL metric that must not regress. Theron-specific guardrail examples: receipt signing must not add more than 200 ms to agent action latency; specialist routing must not drop below baseline domain eval score; CLI output must remain parseable (exit codes stable).
+13. State measurement method precisely: event name, CLI telemetry key or API log field, the SQL or analytics query against the relevant table (`agent_actions`, `tool_executions`, `artifact_access_log`, `client_memory`), the dashboard or report, and sampling strategy.
+14. Write the BASELINE: current value of each metric, measured when, on which cohort. If baseline is unknown, create a pre-build measurement task and block launch on it.
+15. Define the launch readiness threshold: the minimum metric value required to proceed from beta to GA. Do not ship without a threshold.
+16. State counter-metrics you will watch. For agentic features: watch error escalation rate and HITL interruption frequency as leading indicators that the happy path is not working. For receipt features: watch receipt open rate — if it is near zero, the feature is invisible to users and the trust signal is not landing.
+═══ PHASE D — REQUIREMENTS (MUST / SHOULD / COULD) ═══
+17. Write requirements as user-facing behaviors, not system behaviors. Format: "A [Theron persona] can [action] so that [outcome]." Bad: "The system shall sign tool outputs." Good: "A Developer/CLI user can run `theron verify <receipt-id>` against any agent action from the current session and receive a pass/fail verdict in under 2 seconds, so that they can confirm the action record was not altered before sharing it with a third party."
+18. Tag each requirement: MUST (blocking — build is wrong without it), SHOULD (strong preference — defer only with explicit tradeoff), COULD (nice-to-have — cut if time-constrained without re-negotiation).
+19. For every MUST requirement, write the trace: which user problem from Phase A does this solve, and which metric from Phase C does it move?
+20. Write MUST requirements first, in priority order within that tier. Stop when you have exhausted the problem statement; do not add requirements to fill space.
+21. For SHOULD and COULD items, write the condition under which they get promoted to MUST: "Promote to MUST if post-beta data shows [condition]."
+22. Include explicit MUST requirements for: error states (what the CLI or UI shows when an agent tool call fails), empty states (what the user sees before any agent session exists), the undo/recovery path for any destructive action, and receipt integrity on partial or interrupted sessions.
+23. If a requirement depends on a non-live capability (own-weights serving, CIP memory, a specific LoRA not yet deployed), mark it [GATED: <condition>] and do not include it in the MUST tier without a clear gate-removal plan.
+═══ PHASE E — KEY FLOWS AND EDGE CASES ═══
+24. Map the HAPPY PATH as a numbered step sequence from the user's perspective. Each step: user action → system response → user's next decision point. Keep it to ≤ 12 steps; if longer, split into sub-flows. For CLI flows, show the exact command and expected output shape (not implementation, but the contract visible to the user).
+25. For each branch point in the happy path, enumerate the edge cases:
+    - What happens if the input is malformed, empty, or exceeds the context window of the target specialist?
+    - What happens if the RunPod Serverless endpoint is cold-starting or times out?
+    - What happens if the STOA signing key is unavailable or the receipt cannot be written?
+    - What happens if the user interrupts an agentic loop mid-execution?
+    - What happens if the user replays the same command (idempotency of receipts and tool outputs)?
+26. For each edge case, write the resolution policy: fallback behavior, exact error message or paraphrase, retry behavior, and whether the user loses the partial session.
+27. Identify the CRITICAL PATH: the single sequence of steps where failure causes total loss of user value. For most Theron features this includes: agent action execution → tool output → STOA signing → receipt stored. Mark it explicitly. Every MUST requirement that touches the critical path gets a fallback documented.
+28. Map the OFFBOARDING flow: what happens when a Pro user cancels or an Operator removes a team member? Which `client_memory` entries are retained, exported, or deleted? State data retention policy explicitly — do not assume it is handled elsewhere.
+═══ PHASE F — RISKS AND DEPENDENCIES ═══
+29. List external dependencies: RunPod Serverless endpoint availability for the relevant specialist(s), Cloudflare R2 artifact write path, Neon DB schema state, Stripe billing state if the feature is behind a paywall, any third-party MCP server. For each: owner, current status, and impact if it slips 2 weeks.
+30. List technical risks: cold-start latency on specialist endpoints (min_workers=0 by default except Cyber=1), R2 latency under load (egress is free, but latency is still a risk), Neon connection pooling under concurrent sessions, WebCrypto receipt verification in non-HTTPS environments. Rate each: LIKELY / POSSIBLE / UNLIKELY × HIGH / MEDIUM / LOW IMPACT. Address LIKELY × HIGH in requirements or explicitly defer.
+31. List product risks as falsifiable hypotheses: "We believe [user type] will [do X] because [evidence]. We will know this is wrong if [counter-signal] within [timeframe]." At least one hypothesis must address whether users actually open the receipt verifier — if they do not, the trust signal the feature is built on is not landing.
+32. List regulatory / privacy / security risks that require review before launch. For features touching `client_memory` or `artifact_access_log`, state what PII enters those tables and who reviews the data retention policy.
+═══ PHASE G — EXPLICIT OUT-OF-SCOPE ═══
+33. Write a flat list of features, integrations, user segments, specialists, platforms, and quality bars that are NOT in scope for this release. Be specific enough that a reader cannot argue "but you didn't say X was out of scope." For Theron features, explicitly call out which specialists are excluded, whether the CLI surface is in scope vs. the web UI, and whether the receipt layer is in scope vs. just the agent action.
+34. For each out-of-scope item likely to be requested during build, write one sentence explaining the deferral rationale.
+35. State the CONDITIONS under which out-of-scope items would be re-evaluated: "Revisit multi-specialist routing if beta shows users consistently switching specialists mid-session."
+═══ PHASE H — FINAL COHERENCE CHECK ═══
+36. Read every MUST requirement. Confirm it traces to a problem in Phase A and a metric in Phase C. Delete any that do not.
+37. Read the non-goals list. Confirm each has an annotation explaining WHY. Add annotations for any that lack them.
+38. Read the success metrics. Confirm each has a baseline, a target, a measurement method, and a launch readiness threshold. Add any missing slots.
+39. Read the out-of-scope section. Confirm it covers the top 5 things a stakeholder will ask "but what about…?" for. Add any missing items.
+40. Confirm no requirement claims a capability is live that is not: verify own-weights serving status, CIP memory status, and the specific LoRA deployment status for any specialist named in the document. Mislabeled capability status is the #1 Theron PRD failure mode.
+41. Ask: can a new engineer read this document alone and know (a) what to build, (b) why, (c) what done looks like, (d) what NOT to build, and (e) which Theron capability layer each requirement touches? If any answer is no, fix before publishing.
+KEY PRINCIPLE: A PRD is a decision record — every line either narrows the solution space or explains why the space was left open. In Theron's context, the receipt and the agentic loop are non-negotiable anchors: any feature that cannot articulate how it fits the "AI is the hero, the receipt is the proof" narrative should be scrutinized before it enters the MUST tier.
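
Editorial sketch (not part of the package): the Phase H checks in steps 36 and 38 of `prd.md` are mechanical enough to express as code. The TypeScript below is a minimal illustration under stated assumptions. Every type, field, and function name (`Requirement`, `Metric`, `coherenceIssues`) is hypothetical, the bracketed placeholders stand in for values the skill requires the author to supply, and the `theron verify <receipt-id>` wording is quoted from the skill's own example rather than from a confirmed CLI command.

```typescript
// Hypothetical data model for a PRD draft; none of these names come from the package.
type Tier = "MUST" | "SHOULD" | "COULD";

interface Requirement {
  id: string;
  tier: Tier;
  text: string;
  problemId?: string; // trace to a Phase A problem statement
  metricId?: string;  // trace to a Phase C metric
}

interface Metric {
  id: string;
  name: string;
  baseline?: string;
  target?: string;
  method?: string;
  launchThreshold?: string;
}

// Step 36: every MUST traces to a problem and a metric.
// Step 38: every metric has baseline, target, method, and launch threshold.
function coherenceIssues(
  problemIds: string[],
  metrics: Metric[],
  requirements: Requirement[],
): string[] {
  const issues: string[] = [];
  const problems = new Set(problemIds);
  const metricIds = new Set(metrics.map((m) => m.id));

  for (const r of requirements) {
    if (r.tier !== "MUST") continue; // SHOULD/COULD items are deferred, not traced here
    if (!r.problemId || !problems.has(r.problemId)) {
      issues.push(`${r.id}: no trace to a Phase A problem`);
    }
    if (!r.metricId || !metricIds.has(r.metricId)) {
      issues.push(`${r.id}: no trace to a Phase C metric`);
    }
  }

  const slots = ["baseline", "target", "method", "launchThreshold"] as const;
  for (const m of metrics) {
    for (const slot of slots) {
      if (!m[slot]) issues.push(`${m.id}: missing ${slot}`);
    }
  }
  return issues;
}

// Illustrative draft: placeholder values only, no real baselines or targets.
const draftMetrics: Metric[] = [
  {
    id: "M1",
    name: "receipt verification rate",
    baseline: "[ASSUMPTION]",
    target: "[ASSUMPTION]",
    method: "[ASSUMPTION]",
    // launchThreshold deliberately omitted to show the check firing
  },
];

const draftRequirements: Requirement[] = [
  {
    id: "R1",
    tier: "MUST",
    text: "A Developer/CLI user can run `theron verify <receipt-id>` and receive a pass/fail verdict.",
    problemId: "P1",
    metricId: "M1",
  },
  { id: "R2", tier: "MUST", text: "Untraced requirement." },
  { id: "R3", tier: "SHOULD", text: "Deferred item; not checked for traces." },
];

const draftIssues = coherenceIssues(["P1"], draftMetrics, draftRequirements);
console.log(draftIssues.join("\n"));
```

In this sketch an untraced MUST is reported rather than silently deleted, which leaves the "cut the requirement" decision with the author, as step 36 asks.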

package/skills/prioritize.md ADDED

@@ -0,0 +1,162 @@
+---
+name: prioritize
+description: Score and sequence a backlog of features, bugs, research, or capability-injection work using RICE/WSJF — show every formula step, expose hard dependencies, name deferrals and why, and output a waterlined cycle plan.
+allowed-tools: Read, Write
+---
+# PRIORITIZE — Backlog Sequencing Playbook
+**When to invoke:** You have N candidate work items (features, bugs, tech debt, research experiments, capability-injection steps, infra changes) and must decide what enters the next cycle, what is deferred, and — critically — why each deferred item is blocked rather than buried.
+---
+## ═══ HARD RULES ═══
+1. **Score ALL items, not just finalists.** A missing denominator makes any ranking worthless. If you have 12 items, produce 12 scores, including the ones you expect to defer.
+2. **Show every multiplication step.** Write `Reach=4, Impact=2, Confidence=0.8, Effort=1.5 person-months → RICE = (4×2×0.8)/1.5 = 4.27`. The number without the path is not auditable.
+3. **Dependency is a hard constraint, not a tiebreaker.** If item X is a prerequisite for items Y and Z, X goes first regardless of its score. Flag it explicitly: "Upstream of: Y, Z — sequenced ahead of score." Never bury this in prose.
+4. **Name every deferral and why.** For each item below the waterline write exactly one sentence: "Deferred: [score too low / blocked on X / effort exceeds capacity / confidence gap — run experiment E first]." Silence is not an explanation.
+5. **Distinguish "can't" from "shouldn't."** A hard blocker (missing external approval, prerequisite not shipped, no expert available) is different from a value judgment (low ROI, wrong quarter). Label which applies.
+6. **No fabricated inputs.** If Reach or Impact is unknown, flag it as a datum gap and state what you would do to fill it (A/B test, usage query, estimation session). Do not invent a number to close the table.
+7. **RICE and WSJF are not interchangeable per item.** Use RICE for user-facing features with estimable reach. Use WSJF for infra, research, capability-injection, and dependency-only work. Pick one per item; do not average them or switch mid-item.
+---
+## ═══ SCORING FRAMEWORK ═══
+### RICE — for user-facing features and growth initiatives
+| Factor | What it means | Scale |
+|--------|--------------|-------|
+| Reach | Users/customers affected in next 3 months | 1=tens, 2=hundreds, 3=thousands, 4=10K+ |
+| Impact | Per-user value magnitude | 0.25=minimal, 0.5=small, 1=medium, 2=large, 3=massive |
+| Confidence | Certainty in Reach and Impact estimates | 0.5=low, 0.8=medium, 1.0=high |
+| Effort | Person-months including testing, rollout, and debt repayment | decimal months |
+**Formula:** `RICE = (Reach × Impact × Confidence) / Effort`
+Thresholds: > 2.0 = strong; > 4.0 = exceptional; < 0.5 = defer unless regulatory or blocking.
+### WSJF — for infra, research, capability-injection, and dependency work
+| Factor | What it means | Scale |
+|--------|--------------|-------|
+| Value | Business value + user-pain reduction | 1–10 |
+| Time criticality | Cost of delay: what breaks or compounds if we wait a month? | 1–10 |
+| Risk/enablement | Kills a blocker or unblocks multiple downstream items | 1–10 |
+| Effort | Person-weeks | decimal weeks |
+**Formula:** `WSJF = (Value + Time criticality + Risk) / Effort`
+Thresholds: > 4.0/week = strong; > 8.0 = exceptional; < 1.0 = defer unless forced.
+**AI lab / capability-injection override:** Any capability step that gates a benchmarked capability gap (e.g., SGDF E1, DRCT G1, a new LoRA cluster) is treated as WSJF with Time criticality boosted to 9 if no experimental result exists yet — because the cost of not knowing compounds every sprint planning cycle that follows. Flag it explicitly.
+---
+## ═══ PHASE A: INTAKE & ESTIMATION ═══
+**Step 1 — List every candidate item** with: title, one-line description, submitter, date submitted, and category.
+Categories:
+- `feature` — new user-facing capability
+- `bug` — regression or correctness failure
+- `debt` — structural refactor, migration, upgrade
+- `research` — experiment with a binary go/no-go verdict (e.g., "does SGDF enlarge beyond-base?")
+- `capability-injection` — adding a new skill/adapter/LoRA to a running model or harness (follow the 5-step CIP protocol; treat as WSJF)
+- `ops` — deployment, scaling, monitoring
+- `security` — always MUST-DO; score +∞ and label "Gremlin-risk" or "regulatory"
+**Step 2 — Identify all hard dependencies before scoring.**
+For each item: "Requires X to be merged/deployed/green first? Y/N." If Y, list the upstream item. If the upstream item is not already in the backlog, add it.
+Mark items that are upstream of 2 or more others as "blocking wall" — they jump to the front of the sequence regardless of score.
+**Step 3 — Flag confidence gaps.**
+For each item where Reach or Impact is genuinely unknown: write "Datum gap: [what you don't know] — fill via [specific method: usage query, 30-min user interview, A/B test, benchmark run]." If the item's score would swing across the threshold when the gap is filled, run the filler action BEFORE finalizing the waterline.
+For research items: if the experiment has not run, set Confidence = 0.5 until you have a result. Do not assume the hypothesis is true.
+---
+## ═══ PHASE B: SCORING & SEQUENCING ═══
+**Step 4 — Score every item.** Show all factor values and intermediate arithmetic inline. Do not produce a table with only final scores.
+**Step 5 — Sort by score descending.**
+**Step 6 — Re-order for dependencies.** Walk the sorted list top to bottom. For each item with an upstream dependency, check: is the upstream item already above it in the list? If not, move the upstream item immediately before it and annotate: "Promoted: upstream of [item N] and [item M]."
+Exception: if the upstream item scores below 0.5 RICE (or below 1.0 WSJF) and the downstream items score above 2.0, evaluate whether a workaround path exists. If a workaround ships in < 25% of the blocked items' combined effort, take the workaround and defer the upstream item. Document this explicitly.
+**Step 7 — Draw the waterline.** Capacity = N person-weeks (or compute budget). Sum effort from the top of the sorted list until you hit capacity. The waterline falls between the last in-cycle item and the first deferred item. Do not split a single item across cycles unless it is explicitly designed as a phased release.
+---
+## ═══ PHASE C: CYCLE PLAN OUTPUT ═══
+**For each item IN the next cycle, write:**
+- Score and formula
+- Effort and who owns it
+- What it unblocks (if anything)
+- One measurable success criterion: not "improve performance" but "p95 latency < 400ms on the auth endpoint under 200 concurrent users"
+**For each item DEFERRED, write:**
+- Score and formula
+- One-sentence deferral reason (from: score too low / capacity / blocked on X / confidence gap)
+- If score is high but effort is massive: "Revisit [cycle] if we split into phase 1 (scoped to [X]) and phase 2 (scoped to [Y])."
+- If score is uncertain due to confidence gap: "Re-score after [specific action]. If Impact doubles, RICE exceeds threshold and it enters the next cycle."
+**For research items specifically:**
+- State the pre-registered verdict criterion: "This item passes if [measurable outcome]. It fails if [measurable counter-outcome]. Ambiguous results do not count as pass."
+- If the item is an experiment that gates downstream work, name the downstream work: "Unblocks: [list]."
+---
+## ═══ PHASE D: COMMON FAILURE MODES ═══
+**"We don't have numbers for Reach."**
+Check usage analytics first. If unavailable, use a proxy: "If shipped, we expect N% adoption of M active users = K." State the proxy explicitly. If M is also unknown, you have a data infrastructure gap — that is itself a backlog item (WSJF, ops, high risk-reduction score because every future decision is flying blind).
+**"Everything is urgent."**
+Apply the Cost of Delay test: "If we defer this item by 30 days, what specifically breaks or compounds?" If the honest answer is "nothing material," the item is not urgent. Set a hard cap: at most 2 items per cycle can carry a Time criticality score of 9 or 10; if you need more, you are mis-calibrating urgency and your scores will be worthless.
+**"The highest-scoring item is externally blocked."**
+Pull it. Slide the waterline up. The next item in sequence takes its slot. If removing it reveals a new blocking wall (an item that now unblocks the next 3 highest-scoring items), re-run Step 6 for the revised list.
+**"A research item failed — does that move the waterline?"**
+Yes. A failed experiment (binary go/no-go) changes the dependency graph. Re-run Phase B for all items that listed the failed experiment as an upstream dependency. Some can no longer be unblocked and should be deferred or killed.
+**"A capability-injection step keeps getting deferred."**
+This is a signal that the 5-step CIP protocol is not being followed and scope is being added without going through the injection sequence. Flag it: "Blocked on CIP step [1/2/3/4/5] — not a scoring problem, a process gap." The fix is to complete the protocol step, not to rescore.
+---
+## ═══ EXAMPLE: SCORING A 6-ITEM AI LAB BACKLOG ═══
+**Scenario:** 2 engineer-weeks capacity, one pending capability experiment (SGDF E1), one production bug.
+| # | Item | Type | Framework | Score (shown) | In/Deferred | Note |
+|---|------|------|-----------|---------------|-------------|------|
+| 1 | SGDF E1: run base-fail experiment | research | WSJF | (9+9+8)/0.5=52.0 | **SHIP** | Gates items 3 and 4; datum gap on both until result. |
+| 2 | Memory leak: auth endpoint crashes 4% of sessions | bug | WSJF | (8+10+5)/0.5=46.0 | **SHIP** | High CoD: revenue impact, crashes users mid-session. |
+| 3 | LoRA cluster: finance domain injection | capability | WSJF | (7+4+6)/2=8.5 | **DEFER** | Blocked on SGDF E1 (item 1). Re-sequence after result. |
+| 4 | Serve v10 on SGLang FP8 | infra | WSJF | (8+7+9)/3=8.0 | **DEFER** | High value; effort=3wks exceeds capacity. P1 for next cycle. |
+| 5 | Dark mode | feature | RICE | (3×0.5×0.5)/2=0.375 | **DEFER** | RICE below threshold. Revisit Q3 if 10+ user votes. |
+| 6 | API docs for tool-contract | debt | WSJF | (4+3+6)/0.5=26.0 | **SHIP** | Unblocks 3 external integrators; 0.5-week effort. |
+**Waterline:** SGDF E1 (0.5w) + Bug fix (0.5w) + API docs (0.5w) = 1.5 weeks. 0.5 weeks held for tactical support or E1 follow-up analysis.
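The arithmetic in the table and the waterline line can be reproduced with a short script. This is an illustrative sketch, not part of the packaged skill file. It assumes the three WSJF numerators are summed and divided by effort in weeks, as the Score column shows, and that the RICE factors appear in the usual Reach, Impact, Confidence order. The shortened item labels are only for display.

```python
def wsjf(components, effort_weeks):
    """Sum of the three scored components divided by effort in weeks."""
    return sum(components) / effort_weeks


def rice(reach, impact, confidence, effort_weeks):
    """Reach x Impact x Confidence divided by effort (factor order is assumed)."""
    return reach * impact * confidence / effort_weeks


CAPACITY_WEEKS = 2.0

backlog = [
    # (item, score, effort in weeks, decision taken from the table)
    ("SGDF E1", wsjf((9, 9, 8), 0.5), 0.5, "SHIP"),
    ("Memory leak", wsjf((8, 10, 5), 0.5), 0.5, "SHIP"),
    ("LoRA cluster", wsjf((7, 4, 6), 2), 2.0, "DEFER"),
    ("Serve v10", wsjf((8, 7, 9), 3), 3.0, "DEFER"),
    ("Dark mode", rice(3, 0.5, 0.5, 2), 2.0, "DEFER"),
    ("API docs", wsjf((4, 3, 6), 0.5), 0.5, "SHIP"),
]

committed = sum(effort for _, _, effort, decision in backlog if decision == "SHIP")
slack = CAPACITY_WEEKS - committed

for name, score, effort, decision in backlog:
    print(f"{name:<13} score={score:<6} effort={effort}w {decision}")
print(f"committed={committed}w slack={slack}w")  # committed=1.5w slack=0.5w
```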
+---
+## KEY PRINCIPLE
+Prioritization is math with a transparency obligation. The score is a hypothesis about value-per-effort, not a verdict. Defend the inputs — Reach, Impact, Effort — not the ranking. When data changes (experiment results, a blocker lifted, a competitor move), re-score openly and reshuffle the waterline without apology. Every item in the backlog deserves a score and a sentence. Deferred items are not rejected; they are sequenced. The waterline is the current answer to a continuous question.

package/skills/proof.md ADDED

@@ -0,0 +1,91 @@
+---
+name: proof
+description: Construct a rigorous proof — restate precisely, choose the strategy, justify every step by name, hunt the failure modes (circularity, edge cases, quantifier slips), then verify it.
+allowed-tools: Read, Write
+---
+## PHASE 0 — RECEIVE & RESTATE (non-negotiable first step)
+1. Write the claim with ALL quantifiers made explicit: ∀, ∃, "for all n ≥ 1", "there exists a unique", etc.
+2. List every hypothesis separately, labeled H1, H2, …
+3. Write every definition you will use: what exactly is "prime"? "connected"? "continuous at x₀"? — no assumed shared vocabulary.
+4. State what the conclusion C must be in the same formal language.
+5. Note the domain: ℕ, ℤ, ℝ, a topological space, a group — it changes what tools are available.
+## PHASE 1 — BUILD INTUITION (before touching the proof)
+6. Try n = 0, 1, 2, 3 (or the smallest cases in the domain). Compute. Write the numbers down.
+7. Ask: why should this be true? Sketch an informal argument in one sentence.
+8. Identify the "load-bearing" hypothesis — which H_i, if dropped, would kill the claim? Try to construct a counterexample with each H_i removed in turn.
+9. If a counterexample appears in step 8, STOP — the claim is false. Report the counterexample and what it reveals.
+10. Draw a picture or diagram if the domain is geometric, combinatorial, or graph-theoretic.
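Steps 6 and 9 can be mechanized as a search for the first failing case. This is a minimal sketch, not part of the packaged skill file, and the two example claims are chosen here for illustration. The first claim survives every case tested, which is evidence but not a proof. The second is a false claim that only fails at n = 40.

```python
def is_prime(m):
    """Trial division; adequate for the small values tested here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def first_counterexample(predicate, candidates):
    """Return the first n for which predicate(n) is false, or None."""
    for n in candidates:
        if not predicate(n):
            return n
    return None


def odd_sum_claim(n):
    """P(n): the sum of the first n odd numbers equals n squared."""
    return sum(2 * i - 1 for i in range(1, n + 1)) == n * n


def euler_claim(n):
    """P(n): n^2 + n + 41 is prime (false for some n)."""
    return is_prime(n * n + n + 41)


print(first_counterexample(odd_sum_claim, range(100)))  # None
print(first_counterexample(euler_claim, range(100)))    # 40
```

The second result illustrates why trying only n = 0, 1, 2, 3 is a starting point and not a verification: 40² + 40 + 41 = 1681 = 41².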
+## PHASE 2 — PICK A STRATEGY
+11. **Direct**: assume H1 ∧ H2 ∧ … and derive C by a chain of implications. Use when the hypothesis hands you the object you need.
+12. **Contrapositive**: prove ¬C → ¬(H1 ∧ … ∧ Hn). Use when C is a negation or when ¬C gives you something to work with.
+13. **Contradiction**: assume H1 ∧ … ∧ ¬C and derive ⊥. Use when C is an existence claim or when ¬C is a strong structural assumption.
+14. **Weak induction**: prove P(base) and P(k) → P(k+1). Use for statements about all n ≥ n₀ with a single-step recursive structure.
+15. **Strong induction**: prove P(base) and (∀j with base ≤ j ≤ k, P(j)) → P(k+1). Use when P(k+1) depends on earlier values other than just P(k).
+16. **Well-ordering / extremal**: take the minimal (or maximal) element with a property and derive a contradiction from its existence or from a constructed smaller element. Powerful in ℕ and in finite sets.
+17. **Construction / existence**: exhibit the object explicitly, then verify it satisfies the required properties.
+18. **Pigeonhole**: if n+1 objects go into n boxes, some box gets ≥ 2. Recast the problem in this language when a collision or overlap is the goal.
+19. **Double-counting / bijection**: count the same finite set two ways, or build an explicit bijection between two sets to prove equal cardinality.
+20. **Probabilistic**: define a probability space; show E[X] > 0 (or P(event) > 0) to prove existence. Use when a direct construction is intractable.
+21. **Compactness / limit**: use sequential compactness, Bolzano–Weierstrass, or Heine–Borel when the domain is metric or topological.
+22. Commit to ONE strategy before writing Step 1 of the proof. If it fails at Phase 3, return here and pick the next candidate.
+## PHASE 3 — WRITE THE PROOF (step-by-step, every step justified)
+23. Each line has the form: **[Label]** Statement. **Reason**: [exact theorem / axiom / definition / previous label].
+24. The reason must name the result: not "by algebra" but "by distributivity of multiplication over addition (ring axioms)"; not "clearly" — never write "clearly", "obviously", or "trivially".
+25. Every variable must be introduced before use: "Let x ∈ S be arbitrary", "Fix ε > 0", "Choose N = ⌈1/ε⌉".
+26. For induction: state P(n) explicitly before writing the base case. Verify the base case fully — do not wave at it.
+27. For the inductive step: state the inductive hypothesis (IH) as its own labeled line. Invoke (IH) explicitly in the step that uses it.
+28. For contradiction: open with "Suppose for contradiction that ¬C." Close with "This contradicts [label]. Therefore C."
+29. For existence proofs: after constructing the object, verify EVERY property in the definition — do not stop after construction.
+30. For uniqueness: assume two objects x, y both satisfy the property, then prove x = y.
+31. Keep the proof linear: each step follows only from earlier steps or from the hypotheses. No forward references.
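As an illustration of the line format in steps 23 to 27, here is a short worked example, not part of the packaged skill file. The theorem is chosen here only because its inductive step is brief.

```latex
\textbf{Theorem.} $\forall n \in \mathbb{N},\ n \ge 1:\ \sum_{i=1}^{n} (2i-1) = n^{2}$.

\textbf{Proof} (weak induction on $n$).
Let $P(n)$ denote the statement $\sum_{i=1}^{n} (2i-1) = n^{2}$.
\begin{description}
  \item[{[B]}] $P(1)$ holds: $\sum_{i=1}^{1} (2i-1) = 2\cdot 1 - 1 = 1 = 1^{2}$.
    \textbf{Reason}: definition of a one-term sum; evaluation in $\mathbb{Z}$.
  \item[{[IH]}] Fix $k \in \mathbb{N}$ with $k \ge 1$ and assume $P(k)$:
    $\sum_{i=1}^{k} (2i-1) = k^{2}$.
    \textbf{Reason}: inductive hypothesis.
  \item[{[S1]}] $\sum_{i=1}^{k+1} (2i-1) = \sum_{i=1}^{k} (2i-1) + \bigl(2(k+1)-1\bigr)$.
    \textbf{Reason}: recursive definition of the finite sum.
  \item[{[S2]}] $\sum_{i=1}^{k+1} (2i-1) = k^{2} + \bigl(2(k+1)-1\bigr)$.
    \textbf{Reason}: substitution of [IH] into [S1].
  \item[{[S3]}] $2(k+1)-1 = 2k+1$.
    \textbf{Reason}: distributivity of multiplication over addition in $\mathbb{Z}$.
  \item[{[S4]}] $k^{2} + 2k + 1 = (k+1)^{2}$.
    \textbf{Reason}: expansion of $(k+1)(k+1)$ by distributivity and
    commutativity of multiplication in $\mathbb{Z}$.
  \item[{[S5]}] $\sum_{i=1}^{k+1} (2i-1) = (k+1)^{2}$, that is, $P(k+1)$.
    \textbf{Reason}: [S2], [S3], [S4], and transitivity of equality.
  \item[{[C]}] $P(n)$ holds for all $n \ge 1$.
    \textbf{Reason}: principle of mathematical induction applied to [B] and
    the implication $P(k) \Rightarrow P(k+1)$ established in [IH]--[S5]. $\blacksquare$
\end{description}
```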
+## PHASE 4 — FAILURE MODE PATROL (run before declaring QED)
+32. **Circular reasoning**: does any step implicitly assume C or a statement equivalent to C? Trace every chain back to the hypotheses.
+33. **Division by zero / invalid operation**: every division, logarithm, inverse, or square root must be preceded by a proof that the denominator/argument is nonzero/positive/invertible.
+34. **Quantifier swap**: ∀x ∃y ≠ ∃y ∀x. Check every place you "pick y depending on x" — is y allowed to depend on x in this context?
+35. **Edge cases**: explicitly test n = 0, n = 1, the empty set, a set of size 1, f = 0, x = boundary of the domain. Does the proof still hold?
+36. **Off-by-one in induction**: does the base case match the first step the inductive step produces? If P(k) → P(k+1) and base is P(0), you get P(0), P(1), P(2), … — is that the full claim?
+37. **Hidden use of choice / continuity / compactness**: if you say "pick an element of S", is S guaranteed nonempty? If you pass a limit inside a function, is the function continuous?
+38. **Strict vs. non-strict inequalities**: a < b and a ≤ b are not interchangeable. Check every inequality sign at every step.
+39. **Hypothesis audit**: list H1, H2, … and ask: where is each one actually used? If H_i never appears, either (a) the proof is incomplete because it should use H_i, or (b) the claim holds without H_i and you have a stronger result — note which.
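The quantifier-swap check in step 34 can be tested by brute force on a small finite domain. This is a sketch, not part of the packaged skill file. The relation "y is the successor of x modulo 5" is chosen here as an example where each x has its own witness y but no single y works for every x.

```python
def forall_exists(xs, ys, rel):
    """Evaluate: for all x in xs, there exists y in ys with rel(x, y)."""
    return all(any(rel(x, y) for y in ys) for x in xs)


def exists_forall(xs, ys, rel):
    """Evaluate: there exists y in ys such that rel(x, y) for all x in xs."""
    return any(all(rel(x, y) for x in xs) for y in ys)


domain = range(5)


def successor(x, y):
    # y depends on x: each x has exactly one matching y.
    return y == (x + 1) % 5


print(forall_exists(domain, domain, successor))  # True
print(exists_forall(domain, domain, successor))  # False
```

A finite check like this cannot establish a claim over an infinite domain, but a `True`/`False` split is a quick warning that a step silently swapped the quantifier order.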
+## PHASE 5 — VERIFICATION
+40. Read the proof backwards: starting from QED, verify that each step is justified only by what precedes it.
+41. Ask: if hypothesis H_i were false, which step would break? If no step breaks for some H_i, the proof does not actually use H_i — return to step 39.
+42. Ask: does the proof generalize beyond what was claimed? If so, state the stronger result as a remark.
+43. Ask: is there a shorter or more transparent proof? If the proof exceeds ~20 non-trivial steps, look for a lemma to extract.
+## PHASE 6 — CLEAN STATEMENT
+44. Write a self-contained final proof block:
+    - **Theorem**: [exact formal statement with all quantifiers]
+    - **Proof**: [numbered steps, each with reason]
+    - **QED** (or ∎)
+45. The clean proof must be readable without referring back to scratch work.
+## IF THE PROOF FAILS
+46. State clearly: "I cannot prove this claim."
+47. Report WHICH step failed and WHY: what would you need that you don't have?
+48. Attempt a counterexample: choose specific values designed to violate C while satisfying all H_i. If found, verify it satisfies every H_i and violates C explicitly.
+49. If no counterexample is found, report the obstruction: "The proof fails at Step [N] because [missing lemma / unknown fact / possibly false]."
+50. Do not produce a proof with an unverified gap and label it complete.
+## HARD RULES (never violate)
+- Never write "clearly", "obviously", "trivially", or "it is easy to see".
+- Never skip the base case.
+- Never introduce a variable without specifying its domain.
+- Never use a theorem without naming it.
+- Never claim a proof is done until Phase 4 and Phase 5 are complete.
+- If the claim is false, say so immediately with a counterexample. Do not try to prove a false statement.

package/skills/property-underwrite.md ADDED

@@ -0,0 +1,159 @@
+---
+name: property-underwrite
+description: Full-cycle real estate underwriting playbook — collect deal inputs, state all assumptions explicitly, compute NOI/cap rate/cash-on-cash/DSCR/IRR step by step, run a two-variable sensitivity table, and deliver a conservative go/no-go verdict with annotated kill conditions; invoke whenever a user provides a property address, asking price, rent roll, or deal memo and wants investment analysis.
+allowed-tools: Read, Bash, Write
+---
+EDUCATIONAL DISCLAIMER: This playbook produces analytical models for educational and informational purposes only. It is not licensed investment, tax, or legal advice. All outputs are projections based on stated assumptions and carry no guarantee of accuracy. Consult a licensed real estate broker, CPA, and attorney before making investment decisions.
+═══ HARD RULES ═══
+R1  Never state a market rent, cap rate, or expense ratio as fact without citing the source (user-supplied, broker OM, county assessor, CoStar, etc.). If the source is unknown, label it ASSUMED.
+R2  Every number that feeds a decision metric must be traceable to a prior line in the same analysis — no magic deltas.
+R3  Use only the user's actual financing terms; never fabricate a rate or LTV. If terms are unknown, bracket two scenarios (conservative / aggressive) and flag as unconfirmed.
+R4  Do not round intermediate calculations; round only final outputs to two decimal places for presentation.
+R5  A "go" verdict is always conditional — list the two conditions that would flip it to "no-go."
+R6  Sensitivity table must vary the two highest-impact inputs (typically exit cap rate and rent growth); never substitute correlated inputs as if they are independent.
+R7  IRR must be computed from actual annual cash flows, not annualized from a single-year proxy.
+R8  Flag any assumption that conflicts with the user's local market context (e.g., a 5% vacancy assumption in a 2% vacancy submarket is aggressive — say so explicitly).
+═══ PHASE A — DEAL INTAKE & ASSUMPTION DECLARATION ═══
+A1  Collect all required inputs from the user or source document:
+      • Property type (SFR / small multifamily / commercial / mixed-use)
+      • Purchase price (PP)
+      • Gross scheduled rent (GSR) — unit-by-unit if multifamily; annualized
+      • Existing lease terms, expiry dates, and any below-market tenants
+      • Operating expense detail or owner-provided expense ratio
+      • Debt terms: loan amount (or LTV), interest rate, amortization period, I/O period if any
+      • Proposed hold period (years) and target exit strategy
+A2  Declare every assumption you cannot verify from the inputs, in a table with four columns:
+      | Assumption | Value Used | Basis | Risk Direction |
+      Examples of required rows:
+      - Rent growth rate (annual %)
+      - Stabilized vacancy + credit loss rate (%)
+      - Operating expense ratio (% of EGR) OR line-item budget
+      - CapEx reserve ($/unit/yr or % of GSR)
+      - Exit cap rate (basis points above / below entry cap)
+      - Closing costs (% of PP) and disposition costs (% of exit price)
+      - Loan origination fee (% of loan)
+A3  State the single most market-sensitive assumption and explain why — this becomes a sensitivity axis in Phase E.
+═══ PHASE B — INCOME & EXPENSE RECONSTRUCTION ═══
+B1  Gross Scheduled Rent (GSR)
+      = sum of all contracted rents annualized to 12 months
+      Note any units that are vacant, owner-occupied, or on month-to-month leases.
+B2  Effective Gross Revenue (EGR)
+      EGR = GSR × (1 − vacancy rate − credit loss rate)
+      Vacancy rate = ASSUMED [value]%; credit loss rate = ASSUMED [value]% (both as a share of GSR)
+B3  Operating Expenses (OpEx)
+      Preferred: line-item reconstruction —
+        Property taxes | Insurance | Property management (% of EGR) |
+        Repairs & maintenance | CapEx reserve | Utilities (landlord-paid) |
+        Landscaping / snow / other recurring | Administrative
+      If only an expense ratio is available: OpEx = EGR × expense_ratio
+      Always exclude: debt service, depreciation, income tax, owner's time.
+B4  Net Operating Income (NOI)
+      NOI = EGR − OpEx
+      Present as a clean three-line summary: EGR → minus OpEx → equals NOI.
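The Phase B reconstruction reduces to two small functions. This is a minimal sketch, not part of the packaged skill file. All dollar amounts and rates are hypothetical placeholders, and credit loss is assumed to be a percentage of GSR.

```python
def effective_gross_revenue(gsr, vacancy_rate, credit_loss_rate):
    """EGR: GSR less vacancy loss and credit loss, both as shares of GSR."""
    vacancy_loss = gsr * vacancy_rate
    credit_loss = gsr * credit_loss_rate
    return gsr - vacancy_loss - credit_loss


def net_operating_income(egr, opex_lines=None, expense_ratio=None):
    """Return (NOI, OpEx). Prefers a line-item budget over an expense ratio."""
    if opex_lines is not None:
        opex = sum(opex_lines.values())
    elif expense_ratio is not None:
        opex = egr * expense_ratio
    else:
        raise ValueError("provide opex_lines or expense_ratio")
    return egr - opex, opex


# Hypothetical inputs, labeled ASSUMED in a real analysis.
egr = effective_gross_revenue(gsr=120_000, vacancy_rate=0.05, credit_loss_rate=0.01)
noi, opex = net_operating_income(egr, expense_ratio=0.40)

print(f"EGR  {egr:>12,.2f}")
print(f"OpEx {opex:>12,.2f}")
print(f"NOI  {noi:>12,.2f}")
```

Debt service, depreciation, and income tax are deliberately absent from `opex_lines`, in line with the exclusion list in B3.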
+═══ PHASE C — ENTRY METRICS ═══
+C1  Going-in Cap Rate
+      Cap Rate = NOI ÷ Purchase Price
+      Compare to local market cap rate range (cite source or label ASSUMED).
+      A going-in cap below market = you are paying a premium; justify or flag.
+C2  Gross Rent Multiplier (sanity check only)
+      GRM = PP ÷ GSR
+      Use only as a quick plausibility filter vs. comparable sales — never as a decision metric.
+C3  Debt Service
+      Annual Debt Service (ADS) = monthly payment × 12
+      Monthly payment = loan × [r(1+r)^n / ((1+r)^n − 1)] where r = monthly rate, n = amortization months
+      Show the formula with numbers substituted.
+C4  Debt Service Coverage Ratio (DSCR)
+      DSCR = NOI ÷ ADS
+      Lender floor is typically 1.20–1.25x for agency product; state the applicable threshold.
+      If DSCR < 1.00, the property does not cover its debt from operations — hard flag.
+C5  Cash-on-Cash Return (Year 1)
+      Equity invested = PP − loan amount + closing costs + immediate CapEx funded at close
+      Annual pre-tax cash flow = NOI − ADS
+      CoC = pre-tax cash flow ÷ equity invested
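+      Flag if CoC < going-in cap rate (negative leverage): the loan constant (ADS ÷ loan amount) exceeds the cap rate. If CoC < 0, operations do not cover debt service (equivalent to DSCR < 1.00).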
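The entry metrics in C1 and C3 to C5 can be computed together. This is a sketch with hypothetical deal terms, not part of the packaged skill file. It assumes a fully amortizing loan with no interest-only period, and the loan-constant comparison is included here as one way to test whether the debt cost exceeds the cap rate.

```python
def monthly_payment(loan, annual_rate, amort_years):
    """Level payment: loan * r(1+r)^n / ((1+r)^n - 1), with r the monthly rate."""
    r = annual_rate / 12
    n = amort_years * 12
    if r == 0:
        return loan / n
    growth = (1 + r) ** n
    return loan * r * growth / (growth - 1)


def entry_metrics(noi, price, loan, annual_rate, amort_years,
                  closing_costs, capex_at_close):
    ads = 12 * monthly_payment(loan, annual_rate, amort_years)
    equity = price - loan + closing_costs + capex_at_close
    return {
        "cap_rate": noi / price,
        "ads": ads,
        "dscr": noi / ads,
        "coc": (noi - ads) / equity,
        "loan_constant": ads / loan,
    }


# Hypothetical terms for illustration only.
m = entry_metrics(noi=67_680, price=1_000_000, loan=750_000, annual_rate=0.065,
                  amort_years=30, closing_costs=20_000, capex_at_close=10_000)

flags = {
    "dscr_below_1": m["dscr"] < 1.0,
    "cash_deficit": m["coc"] < 0,
    "loan_constant_above_cap": m["loan_constant"] > m["cap_rate"],
}

for name, value in m.items():
    print(f"{name:<14} {value:,.4f}")
print(flags)
```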
+═══ PHASE D — HOLD PERIOD PROJECTION & IRR ═══
+D1  Build a year-by-year cash flow table for the full hold period (columns: Year | GSR | Vacancy | EGR | OpEx | NOI | ADS | Pre-Tax CF | Cumulative CF).
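+      Apply rent growth rate compounding annually: GSR_t = GSR_0 × (1 + g)^t, where GSR_0 is the in-place GSR from B1 and t is the hold year. If Year 1 is underwritten at in-place rent instead, use exponent t − 1 and state which convention you used.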
+      Apply expense growth independently (typically 2–3% or CPI-linked — state which).
+D2  Exit Value
+      Exit NOI = NOI in the year of sale (use stabilized trailing-12 at exit, not forward-12 unless exit-cap convention differs — state which you used)
+      Exit price = Exit NOI ÷ exit cap rate
+      Net sale proceeds = Exit price − disposition costs − outstanding loan balance
+D3  IRR Computation
+      Cash flow stream: [−equity_invested, CF_yr1, CF_yr2, …, CF_yr(n−1), CF_yr(n) + net_sale_proceeds]
+      Solve: 0 = Σ [CF_t ÷ (1 + IRR)^t] for t = 0 to n
+      Show the cash flow stream explicitly before solving. Use Bash to compute if numeric precision matters.
+      Target IRR benchmark: state the user's hurdle rate or, if unstated, flag that no hurdle was provided.
+D4  Equity Multiple
+      EM = (sum of all distributions + net sale proceeds) ÷ equity invested
+      Present alongside IRR. They are not interchangeable: IRR is sensitive to the timing of cash flows, while EM measures total magnitude only.
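The IRR equation in D3 has no closed form for more than a few periods, so it is solved numerically. This is a minimal sketch using bisection, not part of the packaged skill file. The bracket `[-0.99, 1.0]` and the example cash flow streams are assumptions, and the method presumes a single sign change in NPV inside the bracket.

```python
def npv(rate, cash_flows):
    """Net present value with cash_flows[t] received at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-9):
    """Solve npv(rate) = 0 by bisection on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def equity_multiple(cash_flows):
    """Total inflows over equity; assumes cash_flows[0] is the negative
    equity outlay and every later entry is a non-negative distribution."""
    return sum(cash_flows[1:]) / -cash_flows[0]


# Hypothetical stream: [-equity, CF_yr1, ..., CF_yr5 + net sale proceeds].
stream = [-280_000, 11_000, 12_500, 14_000, 15_500, 17_000 + 400_000]
print(f"IRR {irr(stream):.2%}  EM {equity_multiple(stream):.2f}x")
```

Printing the stream before solving, as D3 requires, makes it easy to confirm that the final-year entry already includes net sale proceeds.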
+═══ PHASE E — SENSITIVITY ANALYSIS ═══
+E1  Identify the two dominant value drivers (default: exit cap rate and rent growth rate; substitute only if a different pair demonstrably has higher variance for this deal type — explain why).
+E2  Build a 5×5 sensitivity table for IRR:
+      Exit Cap Rate →   [−100bps] [−50bps] [Base] [+50bps] [+100bps]
+      Rent Growth ↓
+        Base − 1.5%
+        Base − 0.75%
+        Base
+        Base + 0.75%
+        Base + 1.5%
+      Compute each cell's IRR from the same model. Use Bash arithmetic if needed.
+E3  Identify the "loss zone" — cells where IRR falls below the hurdle rate or pre-tax CF turns negative. Highlight in output.
+E4  State the breakeven exit cap rate (the exit cap rate at which net sale proceeds from D2 equal zero, so the sale only just covers disposition costs and loan payoff) to give a margin-of-safety figure.
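The 5×5 grid in E2 and the breakeven figure in E4 can be generated from one compact model. This is a sketch, not part of the packaged skill file. Every deal input is a hypothetical placeholder, vacancy and credit loss are combined into one rate, exit NOI is taken as the NOI of the sale year, and disposition costs are assumed to be a percentage of the exit price.

```python
def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-9):
    """Bisection on NPV; assumes one sign change inside [lo, hi]."""
    def npv(rate):
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
    f_lo = npv(lo)
    if f_lo * npv(hi) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


# Hypothetical deal inputs, for illustration only.
DEAL = {
    "equity": 280_000, "loan": 750_000, "rate": 0.065, "amort_years": 30,
    "hold_years": 5, "gsr0": 120_000, "vacancy_and_credit": 0.06,
    "opex0": 45_000, "expense_growth": 0.025, "disposition_cost": 0.02,
}
BASE_EXIT_CAP, BASE_RENT_GROWTH = 0.07, 0.03
CAP_SHIFTS = (-0.01, -0.005, 0.0, 0.005, 0.01)
GROWTH_SHIFTS = (-0.015, -0.0075, 0.0, 0.0075, 0.015)


def debt(deal):
    """Annual debt service and loan balance at the end of the hold."""
    r, n = deal["rate"] / 12, deal["amort_years"] * 12
    pmt = deal["loan"] * r * (1 + r) ** n / ((1 + r) ** n - 1)
    k = deal["hold_years"] * 12
    balance = deal["loan"] * (1 + r) ** k - pmt * ((1 + r) ** k - 1) / r
    return 12 * pmt, balance


def noi_in_year(deal, rent_growth, t):
    gsr = deal["gsr0"] * (1 + rent_growth) ** t
    egr = gsr * (1 - deal["vacancy_and_credit"])
    opex = deal["opex0"] * (1 + deal["expense_growth"]) ** t
    return egr - opex


def net_sale_proceeds(deal, exit_cap, rent_growth):
    _, balance = debt(deal)
    price = noi_in_year(deal, rent_growth, deal["hold_years"]) / exit_cap
    return price * (1 - deal["disposition_cost"]) - balance


def deal_irr(deal, exit_cap, rent_growth):
    ads, _ = debt(deal)
    flows = [-deal["equity"]]
    flows += [noi_in_year(deal, rent_growth, t) - ads
              for t in range(1, deal["hold_years"] + 1)]
    flows[-1] += net_sale_proceeds(deal, exit_cap, rent_growth)
    return irr(flows)


def breakeven_exit_cap(deal, rent_growth):
    """Exit cap rate at which net sale proceeds equal zero."""
    _, balance = debt(deal)
    noi = noi_in_year(deal, rent_growth, deal["hold_years"])
    return noi * (1 - deal["disposition_cost"]) / balance


# Rows vary rent growth, columns vary exit cap rate, as in E2.
grid = [[deal_irr(DEAL, BASE_EXIT_CAP + dc, BASE_RENT_GROWTH + dg)
         for dc in CAP_SHIFTS] for dg in GROWTH_SHIFTS]

for dg, row in zip(GROWTH_SHIFTS, grid):
    cells = " ".join(f"{x:8.2%}" for x in row)
    print(f"growth {BASE_RENT_GROWTH + dg:6.2%} | {cells}")
print(f"breakeven exit cap {breakeven_exit_cap(DEAL, BASE_RENT_GROWTH):.2%}")
```

Because every cell calls the same `deal_irr`, the grid satisfies the requirement that each IRR comes from the same model. Marking the loss zone from E3 would only need a comparison of each cell against the user's hurdle rate.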
+═══ PHASE F — GO / NO-GO VERDICT ═══
+F1  Score the deal against five criteria (pass / conditional / fail):
+      1. DSCR ≥ 1.20 (or lender minimum)
+      2. Cash-on-cash ≥ [user hurdle] or ≥ 5% if unstated
+      3. Going-in cap rate within 50bps of market (not a premium without identifiable upside catalyst)
+      4. IRR ≥ hurdle rate across base and at least one adverse sensitivity scenario
+      5. Vacancy assumption is conservative relative to submarket trailing-12 data
+F2  Verdict: GO / CONDITIONAL GO / NO-GO
+      GO: all five criteria pass.
+      CONDITIONAL GO: three or four pass and no more than one fails. State the specific condition that must be satisfied before proceeding (e.g., "renegotiate purchase price to $X to achieve DSCR 1.22").
+      NO-GO: two or more fail — state which and why price renegotiation or deal restructure cannot save it.
+F3  Two conditions that flip verdict:
+      Even on a GO, state: "This becomes NO-GO if [condition 1] or [condition 2]."
+      Example conditions: exit cap expands >75bps beyond base; vacancy stabilizes above X%; interest rate rises >50bps at refi.
+F4  Recommended next steps (due diligence items that could change a material assumption):
+      — Physical inspection / PCA to confirm CapEx reserve adequacy
+      — Rent roll audit and tenant estoppels
+      — Local comp rent survey (past 6 months, within 0.5 mi radius, same asset class)
+      — Title search for easements / liens not reflected in asking price
+      — Confirmation of actual property tax basis post-sale (many jurisdictions reassess at transfer)
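The verdict rule in F2 can be written as a small function. This is a sketch, not part of the packaged skill file. Giving NO-GO precedence when two or more criteria fail, and returning `"UNRESOLVED"` for combinations the rule does not cover, are both assumptions made here.

```python
VALID = {"pass", "conditional", "fail"}


def verdict(scores):
    """scores: the five F1 results, each 'pass', 'conditional', or 'fail'."""
    if len(scores) != 5 or any(s not in VALID for s in scores):
        raise ValueError("expected five scores drawn from pass/conditional/fail")
    passes = scores.count("pass")
    fails = scores.count("fail")
    if passes == 5:
        return "GO"
    if fails >= 2:
        return "NO-GO"  # assumed to take precedence over CONDITIONAL GO
    if passes >= 3:
        return "CONDITIONAL GO"
    return "UNRESOLVED"  # not covered by the stated rule; needs a judgment call


print(verdict(["pass"] * 5))                                         # GO
print(verdict(["pass", "pass", "pass", "pass", "conditional"]))      # CONDITIONAL GO
print(verdict(["pass", "pass", "pass", "fail", "fail"]))             # NO-GO
```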
+KEY PRINCIPLE: Underwriting is a structured argument about which risks you are being paid to take — the numbers are only credible if every assumption is named, sourced, and stress-tested.