npm - @vextlabs/theron-cli - Versions diffs - 0.2.1 → 0.4.0 - Mend

@vextlabs/theron-cli 0.2.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191) hide show

package/dist/api.d.ts +8 -0
package/dist/api.js +3 -0
package/dist/api.js.map +1 -1
package/dist/auth.js +51 -1
package/dist/auth.js.map +1 -1
package/dist/banner.js +3 -2
package/dist/banner.js.map +1 -1
package/dist/checkpoints.d.ts +32 -0
package/dist/checkpoints.js +61 -0
package/dist/checkpoints.js.map +1 -0
package/dist/index.js +61 -5
package/dist/index.js.map +1 -1
package/dist/input.d.ts +61 -0
package/dist/input.js +574 -0
package/dist/input.js.map +1 -0
package/dist/profiles/index.js +5 -0
package/dist/profiles/index.js.map +1 -1
package/dist/profiles/methodologies/build_domains.d.ts +6 -0
package/dist/profiles/methodologies/build_domains.js +170 -0
package/dist/profiles/methodologies/build_domains.js.map +1 -0
package/dist/profiles/methodologies/operate_domains.d.ts +8 -0
package/dist/profiles/methodologies/operate_domains.js +1239 -0
package/dist/profiles/methodologies/operate_domains.js.map +1 -0
package/dist/profiles/methodologies/regulated_domains.d.ts +6 -0
package/dist/profiles/methodologies/regulated_domains.js +153 -0
package/dist/profiles/methodologies/regulated_domains.js.map +1 -0
package/dist/profiles/methodologies/research_domains.d.ts +8 -0
package/dist/profiles/methodologies/research_domains.js +179 -0
package/dist/profiles/methodologies/research_domains.js.map +1 -0
package/dist/profiles/methodologies/strategy_domains.d.ts +15 -0
package/dist/profiles/methodologies/strategy_domains.js +193 -0
package/dist/profiles/methodologies/strategy_domains.js.map +1 -0
package/dist/profiles/seeds.js +241 -95
package/dist/profiles/seeds.js.map +1 -1
package/dist/receipt.d.ts +17 -0
package/dist/receipt.js +46 -0
package/dist/receipt.js.map +1 -0
package/dist/render.d.ts +4 -1
package/dist/render.js +95 -28
package/dist/render.js.map +1 -1
package/dist/repl.d.ts +8 -1
package/dist/repl.js +420 -62
package/dist/repl.js.map +1 -1
package/dist/sessions.d.ts +14 -0
package/dist/sessions.js +100 -0
package/dist/sessions.js.map +1 -1
package/dist/ship.d.ts +2 -0
package/dist/ship.js +62 -0
package/dist/ship.js.map +1 -0
package/dist/skills/catalog.d.ts +13 -0
package/dist/skills/catalog.js +86 -0
package/dist/skills/catalog.js.map +1 -0
package/dist/tools/bash.js +81 -14
package/dist/tools/bash.js.map +1 -1
package/dist/tools/edit.js +21 -1
package/dist/tools/edit.js.map +1 -1
package/dist/tools/glob.js +4 -1
package/dist/tools/glob.js.map +1 -1
package/dist/tools/grep.d.ts +5 -0
package/dist/tools/grep.js +101 -2
package/dist/tools/grep.js.map +1 -1
package/dist/tools/index.d.ts +22 -0
package/dist/tools/index.js +177 -41
package/dist/tools/index.js.map +1 -1
package/dist/tools/ls.d.ts +3 -0
package/dist/tools/ls.js +23 -12
package/dist/tools/ls.js.map +1 -1
package/dist/tools/multiedit.d.ts +12 -0
package/dist/tools/multiedit.js +79 -0
package/dist/tools/multiedit.js.map +1 -0
package/dist/tools/stoa.d.ts +1 -1
package/dist/tools/stoa.js +7 -3
package/dist/tools/stoa.js.map +1 -1
package/dist/tools/task.d.ts +9 -0
package/dist/tools/task.js +166 -0
package/dist/tools/task.js.map +1 -0
package/dist/tools/todowrite.d.ts +12 -0
package/dist/tools/todowrite.js +38 -0
package/dist/tools/todowrite.js.map +1 -0
package/dist/tools/webfetch.d.ts +6 -0
package/dist/tools/webfetch.js +98 -0
package/dist/tools/webfetch.js.map +1 -0
package/dist/tools/websearch.d.ts +7 -0
package/dist/tools/websearch.js +83 -0
package/dist/tools/websearch.js.map +1 -0
package/dist/tools/write.js +17 -1
package/dist/tools/write.js.map +1 -1
package/dist/verifiers/calc_gate.d.ts +2 -0
package/dist/verifiers/calc_gate.js +112 -0
package/dist/verifiers/calc_gate.js.map +1 -0
package/dist/verifiers/citation_gate.d.ts +2 -0
package/dist/verifiers/citation_gate.js +130 -0
package/dist/verifiers/citation_gate.js.map +1 -0
package/dist/verifiers/confidence_marked.d.ts +2 -0
package/dist/verifiers/confidence_marked.js +49 -0
package/dist/verifiers/confidence_marked.js.map +1 -0
package/dist/verifiers/disclaimer_gate.d.ts +2 -0
package/dist/verifiers/disclaimer_gate.js +57 -0
package/dist/verifiers/disclaimer_gate.js.map +1 -0
package/dist/verifiers/evidence_gate.d.ts +2 -0
package/dist/verifiers/evidence_gate.js +108 -0
package/dist/verifiers/evidence_gate.js.map +1 -0
package/dist/verifiers/index.d.ts +5 -0
package/dist/verifiers/index.js +28 -7
package/dist/verifiers/index.js.map +1 -1
package/dist/verifiers/lint.js +4 -3
package/dist/verifiers/lint.js.map +1 -1
package/dist/verifiers/promoted_kernels.d.ts +8 -0
package/dist/verifiers/promoted_kernels.js +190 -0
package/dist/verifiers/promoted_kernels.js.map +1 -0
package/dist/verifiers/source_gate.d.ts +2 -0
package/dist/verifiers/source_gate.js +125 -0
package/dist/verifiers/source_gate.js.map +1 -0
package/dist/verifiers/test_smoke.js +30 -0
package/dist/verifiers/test_smoke.js.map +1 -1
package/dist/verifiers/types.d.ts +3 -0
package/package.json +4 -2
package/skills/README.md +123 -0
package/skills/ab-test.md +89 -0
package/skills/api-design.md +175 -0
package/skills/architecture-design.md +185 -0
package/skills/business-case.md +77 -0
package/skills/causal-inference.md +77 -0
package/skills/clinical-guideline.md +98 -0
package/skills/code-review.md +98 -0
package/skills/cold-outreach.md +268 -0
package/skills/competitive-teardown.md +223 -0
package/skills/component-spec.md +121 -0
package/skills/content-calendar.md +280 -0
package/skills/contract-review.md +155 -0
package/skills/data-analysis.md +187 -0
package/skills/debug.md +91 -0
package/skills/design-audit.md +121 -0
package/skills/differential-diagnosis.md +79 -0
package/skills/discovery-call.md +206 -0
package/skills/edit-pass.md +80 -0
package/skills/engineering-calc.md +101 -0
package/skills/estimate.md +70 -0
package/skills/experiment-design.md +105 -0
package/skills/fact-check.md +82 -0
package/skills/financial-model.md +104 -0
package/skills/grant-proposal.md +93 -0
package/skills/harmony-analysis.md +93 -0
package/skills/hypothesis-generation.md +99 -0
package/skills/incident-response.md +134 -0
package/skills/interview-loop.md +62 -0
package/skills/job-scorecard.md +92 -0
package/skills/kb-article.md +174 -0
package/skills/launch-plan.md +85 -0
package/skills/lease-review.md +93 -0
package/skills/lesson-plan.md +198 -0
package/skills/literature-review.md +69 -0
package/skills/market-entry.md +137 -0
package/skills/market-sizing.md +159 -0
package/skills/meta-analysis.md +140 -0
package/skills/migrate.md +117 -0
package/skills/optimize.md +88 -0
package/skills/options-strategy.md +166 -0
package/skills/peer-review.md +96 -0
package/skills/pentest-plan.md +193 -0
package/skills/pitch-review.md +132 -0
package/skills/plan.md +88 -0
package/skills/policy-brief.md +124 -0
package/skills/positioning.md +192 -0
package/skills/postmortem.md +168 -0
package/skills/prd.md +105 -0
package/skills/prioritize.md +162 -0
package/skills/proof.md +91 -0
package/skills/property-underwrite.md +159 -0
package/skills/recipe-develop.md +109 -0
package/skills/red-team.md +142 -0
package/skills/refactor.md +58 -0
package/skills/reflection-session.md +115 -0
package/skills/regulatory-compliance.md +136 -0
package/skills/reproduce.md +87 -0
package/skills/runbook.md +344 -0
package/skills/security-audit.md +154 -0
package/skills/seo-brief.md +201 -0
package/skills/sql-query.md +161 -0
package/skills/story-craft.md +163 -0
package/skills/tdd.md +59 -0
package/skills/term-sheet.md +298 -0
package/skills/theory-of-change.md +88 -0
package/skills/threat-model.md +104 -0
package/skills/ticket-triage.md +200 -0
package/skills/tolerance-analysis.md +149 -0
package/skills/training-program.md +151 -0
package/skills/translate.md +64 -0
package/skills/unit-economics.md +238 -0
package/skills/valuation.md +112 -0
package/skills/write-tests.md +77 -0

package/skills/discovery-call.md ADDED Viewed

@@ -0,0 +1,206 @@
+---
+name: discovery-call
+description: Design and run an enterprise B2B discovery call end-to-end — intent-setting open, current-state diagnosis with quantified impact, MEDDPICC qualification, trap and tell-me-more follow-ups, and a mutual next step — diagnosing before pitching and never prescribing before the pain is owned by the buyer.
+allowed-tools: Read, Write
+---
+═══ HARD RULES ═══
+1. NEVER pitch a capability before the buyer has articulated the pain in their own words and quantified (or estimated) its business impact — prescribing before diagnosing is the single most common AE failure mode.
+2. NEVER accept a surface answer as final — every first response to a pain question is a placeholder; the real answer lives one "tell me more" or "what does that cost you?" underneath.
+3. NEVER fabricate numbers the prospect has not stated — if they won't quantify impact, estimate collaboratively ("is it closer to $50K or $500K a quarter?") and label the result "your words, my math — correct me."
+4. NEVER conflate INTEREST (they asked questions, seemed engaged) with QUALIFICATION (Metrics, Economic Buyer, Decision criteria, Decision process, Paper process, Identified pain, Champion are all confirmed) — update MEDDPICC explicitly at the end of every call.
+5. NEVER close a discovery call without a specific, date-anchored mutual next step that the champion agrees to own — "let's stay in touch" is a loss disguised as a maybe.
+6. NEVER take notes in real time while the prospect is talking — capture cues (exact phrases, numbers, named stakeholders, emotion words), not summaries; summaries lose the language you will mirror back.
+7. If the call runs short of a full MEDDPICC read, flag which letters are BLANK and name the specific question that would fill each before the next interaction.
+═══ PHASE A — PRE-CALL RESEARCH (async, before the meeting) ═══
+A1. Map the account:
+    - Company: stage (public / PE-backed / VC / bootstrapped), headcount band, revenue band (public filing or estimate), industry vertical, geographies.
+    - Buyer org: the named prospect's title and reporting line; who owns the budget for this category; who owns the process it touches.
+    - Recent signals: press releases, earnings calls, job postings (job posts = roadmap intent; headcount changes = budget pressure or growth mode), LinkedIn activity, G2 reviews mentioning competitors or pain.
+    - Existing relationship: prior deals, expansions, losses, support tickets, NPS score if available.
+A2. Form a hypothesis — do NOT treat this as the answer; treat it as the question you are testing:
+    - Hypothesis pain: [the specific operational or financial problem you believe they have, based on research]
+    - Hypothesis impact: [the estimated cost of that problem — revenue risk, headcount cost, compliance exposure, speed loss]
+    - Hypothesis trigger: [why they are looking now — what changed in the last 3–12 months]
+    Write these three lines explicitly before the call. You will test all three in Phase C.
+A3. Identify the three stakeholders most likely to be ABSENT from this first call but who will influence or kill the deal:
+    - Economic Buyer (EB): who signs the check; what their financial priorities are this quarter.
+    - Technical Blocker: who will raise the security, integration, or architecture objection.
+    - Champion risk: does the person you are meeting have enough internal credibility and motivation to carry this internally?
+═══ PHASE B — CALL OPEN: INTENT-SETTING (first 3–5 minutes) ═══
+B1. State the agenda explicitly and ask permission to adjust it:
+    "I've blocked [time]. My plan is to spend most of it understanding your world — where you're feeling the most friction around [topic], what's driving the urgency now, and what a good outcome looks like for you. Then if it makes sense, I can share how we've approached similar problems for teams like yours. Does that order work, or is there something you want to front-load?"
+    — This move signals you are not there to pitch; it also surfaces hidden agendas immediately.
+B2. Calibrate their prior experience with you or the category:
+    "Before we get in — how familiar are you with what we do? I want to make sure I don't over-explain something you already know, or skip something you'd want context on."
+    — Three response types and their meanings:
+      "I know you well" → they did research; skip vendor education, go straight to differentiation.
+      "I've heard the name" → 60-second positioning anchor only; return to discovery.
+      "I honestly don't know much" → 90-second category framing; test if they are the right call-level.
+B3. Time-box the agenda honestly:
+    "I want to make sure we save [10–15] minutes at the end to talk about what makes sense as a next step, so I'll keep an eye on the clock. Stop me anytime."
+═══ PHASE C — CURRENT STATE DIAGNOSIS (the core; 20–30 minutes) ═══
+C1. Open the diagnosis with a situational anchor, not a product question:
+    "Walk me through how your team handles [the process this product touches] today — from the moment [trigger event] happens to when [outcome] is complete."
+    — You are mapping the workflow, not probing for pain yet. Listen for handoff points, named tools, named teams, time durations, and frustration language ("it's a nightmare," "we have to chase," "we just accepted that").
+C2. Identify the gap between current state and required state — use their language:
+    "You mentioned [exact phrase they used]. What does it look like when that goes wrong?"
+    "How often does that happen?"
+    "When it does — who feels it first, and who has to fix it?"
+C3. Quantify the impact — this is non-negotiable; incomplete MEDDPICC blocks the deal:
+    Pain-to-number sequence (use exactly this order):
+      Step 1 — Scope: "How many people / deals / customers / hours does this touch per [week/month/quarter]?"
+      Step 2 — Cost per unit: "When this breaks down, what does it take to fix it? Time, headcount, rework?"
+      Step 3 — Frequency: "How many times does that happen in a typical month?"
+      Step 4 — Reflect the math back: "So roughly [N] times × [hours] × [cost/hour] — am I in the right ballpark, or am I off?"
+      Step 5 — Broaden if narrow: "Is that the whole picture, or are there downstream effects — lost deals, churn risk, compliance exposure?"
+    You need a number the buyer owns, not one you invented. If they resist: "I'm not going to hold you to it — I just want to make sure when we build a business case together, it's anchored to something real for your team."
+C4. Test urgency — determine whether the pain is burning or background:
+    "If nothing changes in the next six months, what does that look like for your team / your number / your customers?"
+    "Has there been a specific event — a miss, an audit finding, a customer escalation — that moved this from background to priority?"
+    — Prospect that cannot answer this with a specific event does not have an active buying initiative; adjust qualification accordingly.
+C5. Surface the implicit cost of inaction explicitly:
+    "It sounds like the current approach is costing you [X] in [metric]. Is that a number leadership sees, or does it stay inside the team?"
+    — If leadership doesn't see the number, the champion cannot build an internal case. This is a champion-strength signal.
+═══ PHASE D — MEDDPICC QUALIFICATION ═══
+Fill one row per letter. BLANK means you do not yet know — do not guess.
+D1. METRICS — What does success look like in a number the buyer's leadership will recognize?
+    Ask: "If we solve this perfectly, what does your team's [KPI — cycle time, cost, error rate, revenue] look like in 90 days? What would your manager call a win?"
+    Required: at least one hard metric the EB cares about (not a feature metric the champion likes).
+D2. ECONOMIC BUYER — Who owns the budget and signs the contract?
+    Ask: "When your team has bought software in this range before, who was the final signature?"
+    Trap: "Is that something you'd shepherd through, or does it land on someone else's desk?"
+    Warning sign: champion says "I handle it" but their title is below director for a five-figure deal.
+D3. DECISION CRITERIA — What does "the right solution" look like to every stakeholder who will evaluate this?
+    Ask: "Beyond solving [pain], what are the boxes a solution needs to check to get through your process? Security review? Integration requirements? Specific ROI threshold?"
+    Ask: "Are there criteria that would make us a non-starter — things we'd need to solve before you'd move forward?"
+D4. DECISION PROCESS — How does a purchase like this actually move from "we want it" to "we signed"?
+    Ask: "Walk me through how your team made the last [comparable] decision — who was in the room, what milestones did it go through, how long did it take?"
+    Ask: "Is there a formal vendor evaluation or RFP process, or is this more of a champion-led recommendation?"
+    Map: Evaluation → Business case → EB approval → Legal/security → Procurement → Signature. Which steps exist? Who owns each?
+D5. PAPER PROCESS — What is the mechanics of the contract?
+    Ask: "Once a decision is made, what's the typical cycle to get a contract signed? Is there a standard MSA, or does your legal team start from scratch?"
+    Ask: "Are there purchasing thresholds that change the approval chain — for example, does anything above $X go to a different committee?"
+D6. IDENTIFIED PAIN — Has the champion articulated the pain in terms that map to a metric, a stakeholder, and a timeline?
+    Check: Does the champion own this pain personally (their number is affected) or are they a proxy for someone else's pain?
+    Check: Is the pain specific enough to build a business case around, or is it a vague "efficiency problem"?
+D7. CHAMPION — Who will sell this internally when you are not in the room?
+    Ask: "When you bring something like this to your leadership team, what's your typical playbook? Do you usually build a business case, or is it more of a demo + recommendation?"
+    Test: "If I sent you the numbers we just worked out in a one-pager, would that be useful to share internally — and who would you share it with?"
+    Champion quality markers: knows the EB personally, has won internal budget before, has a personal stake in the outcome, will send the one-pager without being asked twice.
+D8. COMPETITION — Who else are they evaluating, and what is driving their evaluation criteria?
+    Ask: "Are you looking at this in isolation, or are there other approaches on the table — including doing nothing or building internally?"
+    Ask: "What prompted you to look at us specifically — did something come up, or are we part of a broader evaluation?"
+    Trap: "If you had to rank the alternatives right now, where would we land — and what would move us up or down?"
+═══ PHASE E — TRAP QUESTIONS AND TELL-ME-MORE FOLLOWS ═══
+E1. The Iceberg Trap — surfaces the real problem under the stated problem:
+    Use when the stated pain seems surface-level or rehearsed:
+    "That's the symptom — what do you think is actually causing it? If you could wave a wand and fix the root issue, what would you fix?"
+E2. The Ownership Trap — reveals whether the champion has skin in the game:
+    "Is this primarily a problem that lands on your plate, or are you solving it for someone else on the team?"
+    If for someone else: "And that person — are they bought in on solving it, or would they need to be convinced there's a problem first?"
+E3. The Timeline Trap — distinguishes a funded initiative from an exploration:
+    "Is there a point in the year where this either gets solved or gets pushed to next year? What's driving that date?"
+    If no date: "What would need to happen internally for this to become a Q[X] priority?"
+E4. The Tell-Me-More Follow — never accept the first answer to a pain question:
+    Pattern: [Surface answer] → "Tell me more about that." → [Elaboration] → "And when that happens, what does it cost you?" → [Number or range] → "Is that the full picture?"
+    Use exactly three times per call on the pain you will ultimately build the business case around.
+E5. The Reversal — reframes the cost of inaction as the cost of delay:
+    "You mentioned you've been dealing with this for [duration]. If you'd solved this [duration] ago, what would that have been worth to the business?"
+    — Forces the prospect to compute the cost of delay without you asserting it.
+E6. The Stakeholder Reveal — surfaces hidden influencers:
+    "When this comes up in your leadership meetings, who usually has the strongest opinion about how to handle it — either for or against making a change?"
+    — Names that emerge here are MEDDPICC stakeholders you need a plan for.
+═══ PHASE F — THE PIVOT TO SOLUTION (only after C and D are complete) ═══
+F1. Earn the right to share before sharing:
+    "Based on what you've described — [echo back their pain in their words, with their number] — I want to share how we've approached this for teams in a similar spot. Is that useful, or do you have more you want to cover first?"
+    — Never pivot without explicit permission. Jumping to pitch without permission signals you stopped listening.
+F2. Prove relevance before claiming capability:
+    Lead with a customer story in the format: [Similar company profile] had [their pain mirrored back] and [specific outcome with a number], not with a feature list.
+    Only then: "The way we do that is [mechanism] — which specifically addresses [their stated root cause]."
+F3. Test fit explicitly:
+    "Does that match the shape of what you're dealing with, or is there a meaningful difference I should know about?"
+    — Listen for the difference. That difference is usually the real objection.
+F4. Handle the capability gap honestly:
+    If they name a requirement you do not meet: "That's a fair ask. We [don't do that today / do it differently / have it on the roadmap]. Here's how teams in your situation have thought about that tradeoff: [specific framing]. Does that work for you, or is it a blocker?"
+    Never minimize a gap; never promise unshipped capability; never say "we're working on it" without a date.
+═══ PHASE G — MUTUAL NEXT STEP ═══
+G1. Summarize what was heard before proposing a next step:
+    "Before we close — let me reflect back what I heard, and you tell me if I got it right."
+    Recap in this order: their pain (their words) → quantified impact (their number) → their timeline → who else is involved → what they said success looks like.
+    Ask: "Is that an accurate picture, or did I miss something?"
+G2. Propose the specific next step — it must be:
+    - Named (not "a follow-up call" — what kind? demo? business case review? stakeholder meeting?)
+    - Dated (specific calendar date, not "next week sometime")
+    - Owned by the champion (they agree to do something before that meeting — internal conversation, pulling a number, looping in the EB)
+    Template: "Based on what you shared, I'd suggest [specific meeting type] on [date] with [attendees]. I'll [what you will prepare]. Before then, would it be possible for you to [champion action]? Does that make sense?"
+G3. Test commitment quality:
+    "On a scale of 1–10, how likely is it that we'll actually meet on [date], and what would push that toward a 10?"
+    — A 7 or below requires a direct conversation: "What's in the way? I'd rather know now than find out the day before."
+G4. Confirm the path to the Economic Buyer:
+    "If this meeting goes well, what's the logical next step from there — would it make sense to loop in [EB name / role] at some point, or is that premature?"
+    — If the answer is "never" or "I'll handle it," flag as a champion-risk signal.
+═══ PHASE H — POST-CALL DEBRIEF ═══
+H1. Update MEDDPICC within 30 minutes while the call is fresh — use this exact format:
+    M — [number they stated / BLANK]
+    E — [name and title of EB / BLANK]
+    D (criteria) — [list of stated must-haves / BLANK]
+    D (process) — [steps and owners mapped / BLANK]
+    P — [paper process and approval chain / BLANK]
+    I — [pain statement + metric + owner / BLANK]
+    C — [champion name, strength rating: STRONG / DEVELOPING / WEAK]
+    C (competition) — [named alternatives / BLANK]
+H2. Score the deal against three gates before forecasting:
+    GATE 1 — PAIN OWNED: Did the buyer articulate the pain in their own words with a metric? Y / N
+    GATE 2 — CHAMPION STRENGTH: Does the champion have personal stake, EB access, and a willingness to carry the deal internally? Y / N / PARTIAL
+    GATE 3 — ACTIVE INITIATIVE: Is there a funded timeline with a decision owner, or is this exploratory? Y / N
+    All three gates must be Y to move this to a qualified pipeline stage.
+H3. Define the ONE thing you must learn before the next meeting — the question left unanswered that, if answered wrong, kills the deal. Write it as a literal question sentence.
+KEY PRINCIPLE: Diagnosis is not a phase before the pitch — it is the pitch; the buyer who teaches you their problem has already started selling themselves.

package/skills/edit-pass.md ADDED Viewed

@@ -0,0 +1,80 @@
+---
+name: edit-pass
+description: Run a senior seven-phase editorial pass on prose or docs — macro structure first, then clarity, concision, grammar, rhythm, and a signed change log — killing AI-isms/hedging/em-dash abuse while preserving the author's voice and never altering meaning.
+allowed-tools: Read, Edit, Write
+---
+═══ HARD RULES ═══
+1. Voice is sacred. Your job is to make the author sound like a sharper version of themselves, not like you. If a sentence is already strong, leave it alone.
+2. Never alter meaning. When in doubt between two readings, choose the one that matches the author's evident intent. If intent is ambiguous, flag it with [AMBIGUOUS] and leave the original.
+3. Fix in order — structure first, then clarity, then concision, then grammar, then rhythm. Do not tighten a sentence that lives in the wrong section; move it first.
+4. One logical change per Edit call. Do not batch a move + a rewrite + a word swap into a single call — the diff becomes unreadable and compounding errors compound silently.
+5. Track every substantive change. Produce a CHANGE LOG (Phase G) grouped by phase. Minor punctuation fixes do not need entries; any change affecting meaning, emphasis, or structure does.
+6. Do not add content. You are an editor, not a co-author. No new paragraphs, no inserted examples, no elaboration unless the author has left a visible [TODO] or [EXPAND] marker.
+7. Do not remove content unless it is a true duplicate, a pure filler phrase, or a confirmed AI-ism. Deletions that cut a genuine idea require a [DELETED — CONFIRM] flag so the author can review before accepting.
+═══ PHASE A — READ AND MAP (do this before touching a single word) ═══
+8. Read the entire document from top to bottom with Read. Do not edit yet.
+9. Identify the document type and apply the genre's editorial standard accordingly:
+   - **Essay/pitch**: the argument must move — each paragraph earns its place by advancing the claim, not restating it; the lede is non-negotiable.
+   - **Technical doc/reference**: precision outranks elegance; every term must be defined on first use and used consistently thereafter; passive voice is acceptable when the agent is irrelevant.
+   - **Narrative**: sentence rhythm and scene transition carry meaning — do not "fix" a fragment or a one-word paragraph that is doing intentional work.
+   - **Mixed**: identify which sections belong to which genre and apply the right standard to each section, not a single averaged standard to the whole.
+10. Map the argument or information skeleton: what is the ONE thing this piece is trying to accomplish? Write it in one sentence internally. If you cannot, that is the primary structural problem — flag it before editing anything.
+11. Mark every section, heading, or paragraph with one of: CORE (load-bearing), SUPPORT (illustrates or qualifies CORE), REDUNDANT (repeats something already said), MISPLACED (right content, wrong location), WEAK (vague or filler). This markup is internal only — do not write it into the file.
+12. Note the author's natural rhythm from the first few paragraphs they wrote without apparent effort. Sentence length, punctuation density, whether they favor fragments or compound sentences — this is the voice signature you must preserve throughout.
+═══ PHASE B — STRUCTURE (macro surgery before micro polish) ═══
+13. Reorder sections that are MISPLACED. Move the section; do not rewrite it in its new location. A well-written paragraph in the wrong place is still wrong.
+14. Promote any buried lede. If the most important claim appears in paragraph 4 or later, move it to paragraph 1 or 2 unless the piece is deliberately building tension toward a reveal.
+15. Remove REDUNDANT blocks. Before deleting, confirm the same information is not serving a different rhetorical purpose (setup, callback, emphasis by repetition). If deletion removes the only instance of an idea, use [DELETED — CONFIRM].
+16. Check that every section heading accurately labels what follows. If the body drifts from the heading, fix the heading — or, if the drift is intentional development, remove the heading and fold the content into the preceding section.
+17. Verify the ending earns its position. A conclusion that merely summarizes without adding resolution or forward motion should be cut to its essential sentence or replaced with a direct close.
+═══ PHASE C — CLARITY (sentence-level meaning must be unambiguous) ═══
+18. Resolve every pronoun whose antecedent is more than one sentence back or is ambiguous. Replace with the noun.
+19. Rewrite passive constructions where the agent matters. "The report was filed" is fine when the agent is irrelevant. "Mistakes were made" is not — name who made them.
+20. Break any sentence that requires two re-reads to parse. Split at the natural clause boundary. Do not merge the resulting sentences if they read as clipped — that may be exactly right for the voice.
+21. Replace every nominalization (a verb turned into a noun phrase) with the verb it came from. "Make a determination" → "determine." "Provide an explanation" → "explain." "Have a discussion" → "discuss." "Conduct an analysis" → "analyze."
+22. Flatten nested qualifications. "It is possible that in some cases there may be a tendency for X" → "X sometimes happens." Keep qualifications only when they carry real epistemic weight — replace vague hedges with precise ones ("in 3 of 4 cases," "under load above 10k req/s").
+23. Confirm that every technical term is used consistently across the document. If the author uses two names for the same concept, pick the one used most or the more precise one and apply it globally — note the change in the CHANGE LOG.
+═══ PHASE D — CONCISION (cut without losing signal) ═══
+24. Kill filler openers on sight: "It is important to note that," "As we can see," "In order to," "Due to the fact that," "It should be mentioned that," "First and foremost." Delete the opener; the sentence starts at the real subject.
+25. Kill hedge stacks: "somewhat," "rather," "quite," "a bit," "in a sense," "sort of," "to some extent" — remove unless the author is deliberately signaling calibrated uncertainty, in which case replace with a precise qualifier.
+26. Kill em-dashes used as sentence extenders. An em-dash that could be a period or a colon should become one. An em-dash that creates a true parenthetical may stay — but audit every instance; most documents have three times as many em-dashes as they need.
+27. Kill AI-isms on sight: "delve into," "dive deep," "unpack," "nuanced" (when it means "complicated"), "it's worth noting," "let's explore," "robust," "leverage" (when "use" works), "harness," "seamless," "transformative," "game-changer," "cutting-edge," "at the end of the day," "in today's world," "it goes without saying," "I cannot stress enough," "pivotal," "landscape" (as a metaphor for a field). Replace each with a plain, direct phrase or delete.
+28. Cut trailing restatements. If the last sentence of a paragraph rephrases the first without adding new information, delete it.
+29. Compress multi-word prepositions: "in the event that" → "if," "with the exception of" → "except," "for the purpose of" → "to," "in the process of" → "while," "on the basis of" → "because."
+═══ PHASE E — GRAMMAR AND MECHANICS ═══
+30. Fix subject-verb agreement. Collective nouns ("the team," "the data") take singular verbs in standard American English unless the author is writing in British style — detect and apply consistently throughout.
+31. Fix comma splices. Join with a semicolon, split into two sentences, or add a coordinating conjunction — whichever preserves the rhythm the author intended.
+32. Fix dangling modifiers. The subject of the opening modifier must be the subject of the main clause. "Running the tests, the server crashed" → "Running the tests, we crashed the server" or restructure entirely.
+33. Fix apostrophe errors: possessives vs. plurals, it's vs. its.
+34. Enforce parallel structure in lists and paired constructions. Every item in a bullet list must open with the same grammatical form (all nouns, all imperatives, all gerunds — not a mix). A list that starts with "Running tests," then has "Error handling," then has "To deploy" is broken.
+35. Standardize number style: spell out one through nine, use numerals for 10 and above, unless the document is already following a different consistent convention — detect the convention first, then enforce it uniformly.
+36. Check hyphenation of compound modifiers before a noun ("high-quality output," "well-known author") vs. after a linking verb (no hyphen: "the output is high quality").
+═══ PHASE F — RHYTHM AND VOICE RESTORATION ═══
+37. Simulate reading the edited document aloud sentence by sentence. Flag any sequence of four or more sentences at the same approximate length (all short, all long) — vary one. The most common pattern to catch: after heavy editing, a passage turns into a string of 12-15 word declarative sentences, losing all urgency.
+38. Restore any author-signature constructions that your edits accidentally smoothed away. If the author favors short punchy closes, reinstate them where you replaced them with something longer. If they favor a colon-driven build ("One rule: state it first"), do not replace colons with commas.
+39. Check that the first and last sentence of each section are the strongest in that section. Readers remember beginnings and endings; bury the weak connective tissue in the middle.
+40. Verify register consistency throughout. Register-mixing is the most common editorial mistake after comma splices. Signals to look for: a formal argument section that suddenly uses a contraction or an exclamation mark; a conversational piece that shifts into passive bureaucratic voice for one paragraph; a technical doc that drops in a metaphor in a place where it creates ambiguity about whether the term is literal or figurative. Every sentence you touched must be in the same register as the untouched sentences around it.
+═══ PHASE G — CHANGE LOG ═══
+41. After all edits are applied, append a CHANGE LOG section at the end of the file (or write it to `<filename>.edit-log.md` if the document is published and cannot carry an appendix).
+42. Group entries by phase: STRUCTURE, CLARITY, CONCISION, GRAMMAR, RHYTHM.
+43. Format each entry: `[PHASE] Location (paragraph/heading/line) | What changed | Why.`
+44. Flag any [AMBIGUOUS] or [DELETED — CONFIRM] items prominently at the top of the log so the author can review them before accepting anything else.
+45. State the net word count delta (e.g., "−214 words, −18%"). This is the single clearest signal of whether the pass was editorial or transformative. A pass that removes more than 25% of words without a structural reorder is probably deleting ideas, not filler.
+KEY PRINCIPLE: A great edit is invisible — the author reads the result and thinks "yes, that's exactly what I meant," not "someone rewrote my work."

package/skills/engineering-calc.md ADDED Viewed

@@ -0,0 +1,101 @@
+---
+name: engineering-calc
+description: Perform a rigorous engineering calculation (structural load/stress/buckling, member sizing, thermal analysis, pressure vessel, circuit power/impedance, fluid flow) — states givens with SI/imperial units, cites governing equation + its derivation assumptions and validity limits, solves step-by-step with dimensional tracking, applies code-required safety factor, bounds-checks against first-principles intuition, and flags where a licensed PE stamp is legally required before use.
+allowed-tools: Read, Bash, Write
+---
+═══ HARD RULES ═══
+1. UNITS ARE NON-NEGOTIABLE. Every intermediate result carries its unit. Never drop units mid-derivation. Catch dimensional mismatch before substituting numbers.
+2. STATE ALL ASSUMPTIONS BEFORE SOLVING. An unstated assumption is a hidden error. Each assumption must be falsifiable (e.g., "assume plane stress — valid only when t < 0.1×min(a,b)").
+3. NEVER FABRICATE CODE CLAUSES OR MATERIAL PROPERTIES. If you do not have a verified value, write "LOOKUP REQUIRED: [source, table, edition]" and halt until the owner supplies it.
+4. SAFETY FACTORS COME FROM THE GOVERNING CODE, NOT FROM MEMORY. Cite the exact code edition, section, and table (e.g., "AISC 360-22 §B3.3, LRFD φ = 0.90").
+5. FLAG STAMP REQUIREMENT EXPLICITLY. Any result used in construction documents, a permit set, pressure vessel MAWP determination, or life-safety system MUST carry the statement: "⚠️ LICENSED PE REVIEW AND STAMP REQUIRED BEFORE FIELD USE." Note that stamp requirements vary by jurisdiction and project type — confirm with the authority having jurisdiction (AHJ).
+6. NEVER PRESENT A SINGLE NUMBER AS THE ANSWER. Always bound it: show the nominal result, the code-reduced design value, and a sanity-check comparison.
+7. STAY IN THE PHYSICS LANE. Reject scope creep — if the problem requires a test, a simulation convergence study, or on-site measurement, say so explicitly rather than substituting calculation.
+═══ PHASE A — PROBLEM INTAKE AND SCOPING ═══
+A1. Identify the calculation type precisely: (structural: axial/bending/shear/buckling/connection) | (thermal: steady-state/transient/fin/heat-exchanger) | (pressure: thin-wall/thick-wall/ASME VIII) | (fluid: head loss/pump sizing/orifice) | (electrical: DC/AC power, RC/RL/LC impedance, fault current).
+A2. List EVERY given quantity with value, unit, and provenance (measured / assumed / code-prescribed / datasheet).
+     Example format: P = 45 kN [applied factored load, owner-specified], L_span = 3.2 m [span, as-built drawing], Fy = 345 MPa [A992 steel, ASTM A992-22 Table 1].
+     Note: use distinct symbols for each physical quantity to avoid collision (e.g., L_span for beam span, L_wall for wall thickness in thermal problems).
+A3. Identify the UNKNOWNS (one primary, any secondary checks).
+A4. State the applicable design standard(s) and edition: e.g., AISC 360-22 LRFD; ASCE 7-22 load combos; ASME B31.3-2022; NEC 2023 Art. 210; IEC 60287.
+A5. Confirm scope: Is this a code-check of an existing member, a new sizing/selection, or a capacity verification? Each has a different workflow branch.
+═══ PHASE B — GOVERNING EQUATION AND VALIDITY CHECK ═══
+B1. Write the governing equation in symbolic form with every symbol defined.
+     Example: σ_b = M·c / I  [flexural stress, Euler-Bernoulli beam theory]
+B2. State the derivation assumptions and the conditions under which the equation is valid:
+     — Euler-Bernoulli: valid when L_span/d > 10; shear deformation negligible; linear elastic material.
+     — Thin-wall pressure: valid when t/r < 0.1 (hoop stress σ_h = P·r/t).
+     — Pipe flow regime: Re < 2300 fully laminar; Re > 4000 fully turbulent (use Moody chart or Colebrook equation); 2300 ≤ Re ≤ 4000 is the INDETERMINATE transition zone — do not apply either friction-factor model; flag for physical test or conservative bound.
+B3. Check whether the current problem lies INSIDE the validity range. If it does not, switch to the applicable equation (e.g., Timoshenko beam for deep beams where L_span/d ≤ 10; Lamé equations for thick-wall vessels where t/r ≥ 0.1).
+B4. Identify which load combination from the governing code applies (e.g., ASCE 7-22 §2.3.1: 1.2D + 1.6L + 0.5Lr for LRFD; or ASD: D + L).
+═══ PHASE C — STEP-BY-STEP DIMENSIONAL SOLUTION ═══
+C1. Substitute numerical values WITH units at each step. Show the unit algebra explicitly.
+     Example: M = (1.2×20 kN/m + 1.6×15 kN/m) × (6 m)² / 8
+                  = (24 + 24) kN/m × 36 m² / 8
+                  = 48 kN/m × 4.5 m²
+                  = 216 kN·m   ← units: (kN/m)(m²) = kN·m ✓
+C2. For iterative sizing (beam selection, wire gauge, pipe schedule): state the trial, compute demand vs. capacity, adjust, and iterate until φ·Rn ≥ Ru (LRFD) or Rn/Ω ≥ Ra (ASD).
+C3. For thermal problems: compute thermal resistance network using distinct symbols (R_cond = t_wall/(k·A) where t_wall is wall thickness [m], k is thermal conductivity [W/(m·K)], A is cross-sectional area [m²]; R_conv = 1/(h·A) where h is convective film coefficient [W/(m²·K)]); solve for ΔT; verify against material T_max limits from the applicable material spec.
+C4. For electrical: track impedance in complex form (Z = R + jωL − j/(ωC)); compute magnitude and phase; check against voltage drop limits (NEC 2023 Art. 210: ≤3% branch, ≤5% total feeders+branch) or fault-current ratings per SCCR nameplate.
+C5. Record every intermediate value as a labeled line — no silent accumulation of rounding errors.
+═══ PHASE D — CODE SAFETY FACTOR AND DESIGN CAPACITY ═══
+D1. Look up (or request) the code-prescribed resistance/safety factor for the failure mode being checked. Document source exactly.
+     Structural LRFD examples: φ_c = 0.90 (compression), φ_b = 0.90 (flexure), φ_v = 1.00 (shear AISC 360-22 §G2.1(a)).
+     Pressure vessel: ASME VIII Div.1 UG-23: joint efficiency E; allowable stress S from Table 1A.
+     Electrical: NEC derating factors (ambient temp per Table 310.15(B)(1), conduit fill per Table C.1, continuous loads at 80% per Art. 210.20(A)).
+D2. Compute the code-reduced design capacity:
+     φ·Rn = [calculated] — compare to demand Ru. State PASS / FAIL clearly.
+D3. Compute the utilization ratio: UR = Ru / (φ·Rn). A UR > 0.95 is a rule-of-thumb flag for high-utilization warranting re-evaluation or upsizing — it is NOT a code limit; the governing code pass/fail threshold is UR ≤ 1.00.
+═══ PHASE E — BOUNDS CHECK AND INTUITION GATE ═══
+E1. Order-of-magnitude check: estimate the answer by a different independent method using known physical benchmarks:
+     — Steel beam depth rule-of-thumb: L_span/20 to L_span/24 for floor beams under typical office loading.
+     — Water pressure: 9.81 kPa per metre of head (≈10 kPa/m); equivalently 0.433 psi/ft. Do NOT use 1 kPa/m — that is a 10× error.
+     — Wire ampacity: approximate starting point ~100 A for 1 AWG Cu (75°C column, NEC Table 310.12); always verify derating per D1.
+     — Heat flux in forced-air cooling: 50–200 W/m² is typical; >1000 W/m² requires liquid cooling in most configurations.
+E2. Compare to the calculated result. If they disagree by more than a factor of 2, STOP — diagnose the discrepancy before proceeding.
+E3. Physical-limit check: stress < Fy (elastic), T_surface < T_melting, power ≤ rated capacity, pressure ≤ MAWP. Explicitly confirm each.
+E4. Sign and direction check: deflection downward (positive per convention?), heat flows hot→cold, current flows source→load. State direction explicitly.
+E5. Sensitivity note: identify which input variable, if wrong by ±10%, would flip the PASS/FAIL outcome. Flag it for field verification.
+═══ PHASE F — DELIVERABLE AND STAMP FLAG ═══
+F1. Produce a clean calculation summary block:
+     ┌────────────────────────────────────────────────────┐
+     │ Calculation:       [type and element ID]           │
+     │ Governing code:    [code, edition, sections]       │
+     │ Demand (Ru/Su):    [value + unit]                  │
+     │ Capacity (φRn):    [value + unit]                  │
+     │ Utilization ratio: [UR = ##.#%]                    │
+     │ Result:            PASS / FAIL                     │
+     │ Key assumptions:   [numbered list]                 │
+     │ Lookups still needed: [or "None"]                  │
+     └────────────────────────────────────────────────────┘
+F2. List every quantity that was ASSUMED (not sourced): these are open items requiring owner/engineer confirmation.
+F3. Stamp flag — apply this block whenever the output will touch construction documents, permit drawings, pressure system MAWP, or life-safety:
+     ⚠️ LICENSED PE REVIEW AND STAMP REQUIRED BEFORE FIELD USE.
+     This calculation is a computational aid. It does not substitute for engineering judgment, site-specific conditions, or the legal responsibility of the engineer of record. PE stamp requirements vary by jurisdiction (state board rules) and project type — confirm with the AHJ and the engineer of record.
+F4. If the calculation reveals an overstress or FAIL condition: do NOT silently resize and re-present. Report the failure mode, the deficit, and candidate remediation options; let the engineer of record choose.
+═══ ANTI-PATTERNS TO REJECT ═══
+- Rounding intermediate values to 2 sig-figs before the final step (propagates error).
+- Using "typical" material properties without citing a spec or datasheet.
+- Applying structural steel resistance factors to aluminum or cold-formed steel without switching to the applicable code: ADM 2020 (aluminum), AISI S100-16 (cold-formed steel), AISC Design Guide 27 / SEI (stainless steel — no "AISC 370" exists; do not invent code designations).
+- Conflating nominal loads with factored loads — always label which combination is in play.
+- Presenting deflection without checking both strength AND serviceability limits (L/360 live, L/240 total per IBC 1604.3).
+- Quoting a code clause from memory without citing the edition year — code clauses change between editions and incorrect citation is a liability.
+- Applying pipe-flow friction factors inside the transition zone (2300 ≤ Re ≤ 4000) — the flow regime is indeterminate; treat it as a flag, not a solvable case.
+KEY PRINCIPLE: An engineering calculation is only as trustworthy as its weakest stated assumption — make every assumption visible, every unit explicit, every safety factor code-sourced, and every rule-of-thumb labeled as such.

package/skills/estimate.md ADDED Viewed

@@ -0,0 +1,70 @@
+---
+name: estimate
+description: Principled Fermi estimation — decompose into estimable factors with ranges, propagate uncertainty, sanity-check against anchors, tighten the dominant factor, report a range not false precision.
+allowed-tools: Read, Write
+---
+## PHASE 0 — FRAME THE QUESTION (do this before any arithmetic)
+1. Restate what you are estimating in one sentence. Include the exact quantity and units (e.g., "total GPU-hours per month", "annual revenue in USD", "bytes of storage per user").
+2. State the required precision: order-of-magnitude (10x tolerance), engineering (±2x), or tight (±20%). This governs how hard you must tighten each factor.
+3. State the reference time, geography, and scope. Ambiguity here poisons every downstream number.
+4. HARD RULE: never begin multiplying until steps 1–3 are written down.
+## PHASE 1 — DECOMPOSE
+5. Break the target quantity into a product or sum of factors such that each factor is easier to estimate than the whole. Prefer multiplicative chains for quantities with natural rate structure (volume = rate × time × density).
+6. Write the decomposition symbolically first: e.g., Q = A × B × C / D.
+7. Aim for 3–7 factors. Fewer = under-decomposed (one hidden assumption carries everything). More = false precision factory.
+8. Each factor must be estimable from memory, a quick lookup, or a calibrated prior. If a factor requires the same research as the original question, decompose it further.
+## PHASE 2 — ESTIMATE EACH FACTOR WITH A RANGE
+9. For every factor, give three numbers: LOW / EXPECTED / HIGH. These are not error bars — they are the range you would bet 80% of your credence covers the true value.
+10. Source or label each factor explicitly. Use one of: [memory], [lookup: <source>], [physics], [analogy: <comparable>], [guess]. Never hide the epistemic status.
+11. For human-count factors (users, workers, events): anchor to a population you know, then apply a fraction. State both.
+12. For rate factors (requests/sec, churn/month): anchor to a comparable product or process you have data on.
+13. For physical factors (bandwidth, energy, latency): derive from first principles or physics when possible — these are your hardest anchors.
+14. HARD RULE: do not research a factor for more than 2 minutes. If you cannot bound it in 2 minutes, flag it as [HIGH UNCERTAINTY] and assign a wide range (10x low-to-high).
+## PHASE 3 — PROPAGATE UNCERTAINTY
+15. Multiply EXPECTED values to get the point estimate. Write it down.
+16. For a multiplicative chain, propagate ranges by multiplying lows together and highs together: RANGE = (A_low × B_low × …) to (A_high × B_high × …).
+17. If the chain is long (>4 factors), use log-space addition: log(Q_high) = Σ log(factor_high). This prevents catastrophic range inflation from independent errors.
+18. For additive decompositions, propagate with RSS if factors are independent: σ_total = sqrt(Σ σ_i²). If factors are correlated (e.g., both driven by market size), propagate linearly.
+19. Report the propagated range. If it spans more than 100x, stop and return to Phase 1 — the decomposition has a hidden assumption carrying too much weight.
+## PHASE 4 — SANITY-CHECK AGAINST ANCHORS
+20. Pick at least TWO independent anchors and check the estimate against each:
+    - KNOWN COMPARABLE: a similar system, company, or physical quantity whose value you trust. Does your estimate agree within 3x?
+    - UPPER/LOWER BOUND: a physical or logical limit the answer cannot exceed (e.g., cannot exceed total world market; cannot be less than zero; cannot exceed speed of light). Is your estimate inside the bounds?
+    - ALTERNATIVE DECOMPOSITION: decompose the same quantity a different way (e.g., top-down market share vs. bottom-up unit economics). Do both approaches agree within the propagated range?
+21. If an anchor disagrees by more than the propagated range, treat it as a signal of a wrong assumption, not noise. Revisit Phase 1.
+22. HARD RULE: an estimate with zero sanity checks is a guess. Always run at least one.
+## PHASE 5 — SENSITIVITY ANALYSIS
+23. Identify the dominant factor: the one whose uncertainty interval contributes the most to the output range. For a multiplicative chain this is the factor with the highest log(high/low) ratio.
+24. For the dominant factor only: spend additional effort to tighten it. Look it up, run a quick calculation, or find a better analogy.
+25. After tightening, re-propagate and confirm the range contracted. If it did not, you have a second dominant factor — tighten that too.
+26. State explicitly: "The estimate is most sensitive to [factor]. A 2x error there moves the answer by 2x."
+## PHASE 6 — FINAL REPORT
+27. State: ESTIMATE = [EXPECTED VALUE] [UNITS], RANGE = [LOW] to [HIGH] [UNITS], CONFIDENCE = [X]% that the true value lies in the range.
+28. List every assumption made, numbered. Distinguish: [KNOWN] (measured or well-sourced) vs [ASSUMED] (judgment call).
+29. State what it would take to tighten the estimate by another 2x if the requester needs higher precision.
+30. HARD RULE: never present a single point estimate without a range. False precision is worse than an honest wide range.
+31. HARD RULE: if your range is [LOW, HIGH] and LOW × 10 < HIGH, label the confidence "order-of-magnitude only" and say so plainly.
+## HARD RULES SUMMARY (check before submitting)
+R1. Units on every number. Always.
+R2. Every factor has a source label: [memory], [lookup], [physics], [analogy], [guess].
+R3. At least one sanity-check anchor, stated explicitly.
+R4. Dominant factor identified and named.
+R5. Final answer is a range, not a point.
+R6. Assumptions are enumerated and distinguished as known vs assumed.
+R7. Required precision stated upfront; conclusion must declare whether it was met.

package/skills/experiment-design.md ADDED Viewed

@@ -0,0 +1,105 @@
+---
+name: experiment-design
+description: Design a rigorous, falsifiable experiment — hypothesis + null, controls for confounds, power analysis, pre-registered analysis plan, and validity threat mitigation.
+allowed-tools: Read, Write, WebSearch, Grep
+---
+You are a senior experimentalist. When the user describes a research question, produce a complete, rigorous experiment design document. Follow every phase below in order. Do not skip steps. State every assumption explicitly — unstated assumptions are protocol violations.
+## PHASE 1 — SHARPEN THE HYPOTHESIS
+1. Restate the research question as one declarative sentence.
+2. Write the **directional hypothesis (H1)**: "Manipulating [IV] causes a change in [DV] in [population] under [conditions]."
+3. Write the **null hypothesis (H0)**: the precise no-effect statement H1 must beat.
+4. Write the **falsification criterion**: the exact outcome (value, direction, threshold) that would force rejection of H1. If you cannot write this, the hypothesis is not falsifiable — stop and revise.
+5. Identify the **smallest effect size worth detecting** (SESOI / minimum detectable effect). Never let power analysis choose this for you retroactively.
+## PHASE 2 — CHOOSE THE DESIGN
+6. Select design type and justify: RCT, quasi-experiment, observational, within-subjects, between-subjects, mixed, factorial (2×2, etc.), crossover, sequential/adaptive. State why alternatives were rejected.
+7. If random assignment is impossible, name the identification strategy (DiD, RD, IV, matching, synthetic control) and its key assumption.
+8. Decide number of arms/conditions. Every arm must map to a scientific question — no orphan arms.
+9. State the unit of randomization (individual, cluster, time period) and whether it matches the unit of analysis. Mismatch = clustered standard errors or mixed model required.
+## PHASE 3 — VARIABLES + OPERATIONALIZATION
+10. **Independent variable(s)**: define the manipulation precisely (dose, timing, delivery). State the manipulation check — how do you verify the IV was actually delivered as intended?
+11. **Dependent variable(s)**: define the primary outcome first (one only). Secondary outcomes listed separately, analysis plan written before data collection.
+12. **Operationalization**: for each DV, specify the measurement instrument, scale (nominal/ordinal/interval/ratio), collection timing, and who collects it.
+13. **Measurement validity**: face validity, convergent validity (what correlated measure confirms it?), discriminant validity (what should NOT correlate?). For novel instruments, require pilot validation.
+14. **Reliability**: inter-rater reliability (Cohen's κ or ICC target ≥ 0.80), test-retest if applicable.
+## PHASE 4 — CONFOUND IDENTIFICATION + CONTROL
+15. List all plausible confounds (variables correlated with both IV and DV). Use a DAG (directed acyclic graph) if the causal structure is complex.
+16. For each confound, state the control strategy:
+    - **Randomization**: eliminates confounds in expectation (gold standard).
+    - **Blocking/stratification**: ensure balance on known high-variance covariates before randomization.
+    - **Blinding**: single-blind (participants), double-blind (assessors), or triple-blind (analysts). Specify who is blinded to what.
+    - **Counterbalancing**: for within-subjects designs, use Latin square or full counterbalancing to neutralize order/carryover effects.
+    - **Statistical control**: include as covariate in ANCOVA only if measured pre-treatment (post-treatment covariates can introduce collider bias).
+17. Identify **demand characteristics** and **experimenter expectancy bias** risks. Mitigate with scripted protocols, automated delivery, or blinded assessors.
+18. Name the **exclusion/inclusion criteria** for participants. Pre-specify; post-hoc exclusions require independent justification.
+## PHASE 5 — POWER ANALYSIS
+19. State the assumed effect size (Cohen's d, f, η², OR, etc.) and its source (prior literature, pilot, SESOI from Phase 1). Cite the source.
+20. Set α (default 0.05, justify if different), desired power 1−β (default 0.80; prefer 0.90 for costly interventions), and tails (one vs two).
+21. Compute required N per arm. Show the formula or tool used (G*Power, pwr package, simulation).
+22. Apply an **attrition correction**: inflate N by expected dropout rate (minimum 10% assumed unless justified).
+23. State whether interim analyses are planned. If yes, apply a stopping rule (O'Brien-Fleming, Pocock) and adjust α accordingly. No unplanned peeks.
+## PHASE 6 — PRE-REGISTRATION
+24. Before any data collection: register the following on OSF, AsPredicted, or equivalent:
+    - H0 and H1 verbatim
+    - Primary DV and analysis
+    - Sample size and stopping rule
+    - Exclusion criteria
+    - Covariates included in the model
+    - All secondary outcomes and their analysis order
+25. Lock the registration. Any post-registration deviation is a **registered report amendment** — must be logged with a timestamp and rationale, never silent.
+26. Separate **confirmatory** (pre-registered, controls α) from **exploratory** (post-hoc, hypothesis-generating only) analyses in all reports. Label them visibly.
+## PHASE 7 — CONTROL + COMPARISON GROUPS
+27. Define the control condition precisely. "No treatment" vs "active placebo" vs "treatment as usual" have different interpretations — choose deliberately.
+28. Ensure the control group differs from the treatment group on the IV only. Every other procedure must be identical (same attention, same measurement schedule).
+29. For factorial designs, verify that every cell is theoretically motivated. Underpowered interaction tests are a common failure mode — power the interaction, not just the main effects.
+## PHASE 8 — VALIDITY THREATS
+30. **Internal validity** (does the IV cause the DV change?): assess and mitigate history, maturation, regression to mean, selection bias, attrition/differential dropout, instrumentation drift, contamination between arms.
+31. **External validity** (do results generalize?): specify the target population, the sampled population, and the gap. State ecological validity of the lab/platform setting vs the real deployment context.
+32. **Construct validity** (are you measuring what you think?): mono-operation bias (use ≥2 operationalizations of key constructs), mono-method bias (triangulate measurement methods), hypothesis-guessing by participants.
+33. **Statistical conclusion validity**: adequate power (Phase 5), correct test assumptions met (normality, homoscedasticity — test them), no fishing across multiple DVs without correction.
+## PHASE 9 — EXACT ANALYSIS PLAN
+34. State the primary statistical test (t-test, ANOVA, regression, mixed-effects, Mann-Whitney, etc.) and why it fits the data type and design.
+35. State every assumption of that test and how each will be verified (Shapiro-Wilk, Levene's, Durbin-Watson, etc.).
+36. **Multiple comparisons**: if testing >1 outcome or >2 groups, apply Bonferroni, Holm-Bonferroni, Benjamini-Hochberg (FDR), or family-wise correction. State which and why.
+37. Specify the **effect size estimate** to report alongside p-values (d, η², ω², OR + CI). p-values alone are insufficient.
+38. State whether a **Bayesian alternative** is planned (BF₁₀ threshold for evidence) to distinguish "no effect" from "insufficient power."
+39. Define the **analysis dataset**: ITT (intent-to-treat, preferred for RCTs), per-protocol, or modified ITT. Pre-specify how missing data are handled (complete case, multiple imputation, MCAR/MAR assumption).
+## PHASE 10 — CONFIRMATION vs FALSIFICATION
+40. Write the **confirmation rule**: "H1 is supported if [test statistic] exceeds [threshold] with [direction] and effect size ≥ SESOI."
+41. Write the **falsification rule**: "H1 is falsified if [test] fails to reach threshold AND 90% CI excludes SESOI." (Absence of evidence ≠ evidence of absence without equivalence test.)
+42. Pre-commit to what you will conclude in each scenario: confirm, falsify, inconclusive. Inconclusive requires a plan (bigger sample, refined IV, new design).
+## PHASE 11 — ETHICS + FEASIBILITY
+43. **Ethics**: informed consent procedure, deception (if any — requires full debriefing), data privacy (anonymization, storage, retention), IRB/ethics board approval required before data collection.
+44. **Feasibility check**: can you recruit N participants in the timeline? Is the IV deliverable at scale? Are the DVs measurable at the required precision within budget?
+45. **Pilot study**: for any novel manipulation or instrument, run a pilot (10–15% of target N) to validate logistics, check effect size assumptions, and refine protocol. Pilot data excluded from confirmatory analysis.
+## HARD RULES (violations invalidate the design)
+- RULE 1: No hypothesis may be written after seeing the data.
+- RULE 2: No exclusions, covariate additions, or outcome re-definitions after data collection without a timestamped amendment.
+- RULE 3: Effect size assumptions must be justified from prior work or SESOI, never back-calculated from desired N.
+- RULE 4: Every deviation from the pre-registered plan must be disclosed in the results section, not buried in supplementary material.
+- RULE 5: If the design cannot state what result would falsify H1, redesign before proceeding.
+- RULE 6: Report all pre-registered outcomes, not just significant ones. Selective reporting is scientific misconduct.