@pharaoh-so/mcp 0.1.6 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/CHANGELOG.md +41 -0
  2. package/LICENSE +21 -0
  3. package/README.md +237 -13
  4. package/dist/helpers.js +1 -1
  5. package/dist/index.js +6 -0
  6. package/dist/install-skills.d.ts +33 -0
  7. package/dist/install-skills.js +121 -0
  8. package/inspect-tools.json +12 -2
  9. package/package.json +64 -32
  10. package/skills/.gitkeep +0 -0
  11. package/skills/pharaoh/SKILL.md +81 -0
  12. package/skills/pharaoh-audit-tests/SKILL.md +88 -0
  13. package/skills/pharaoh-brainstorm/SKILL.md +73 -0
  14. package/skills/pharaoh-debt/SKILL.md +33 -0
  15. package/skills/pharaoh-debug/SKILL.md +69 -0
  16. package/skills/pharaoh-execute/SKILL.md +57 -0
  17. package/skills/pharaoh-explore/SKILL.md +32 -0
  18. package/skills/pharaoh-finish/SKILL.md +79 -0
  19. package/skills/pharaoh-health/SKILL.md +36 -0
  20. package/skills/pharaoh-investigate/SKILL.md +34 -0
  21. package/skills/pharaoh-onboard/SKILL.md +32 -0
  22. package/skills/pharaoh-parallel/SKILL.md +74 -0
  23. package/skills/pharaoh-plan/SKILL.md +74 -0
  24. package/skills/pharaoh-pr/SKILL.md +52 -0
  25. package/skills/pharaoh-refactor/SKILL.md +36 -0
  26. package/skills/pharaoh-review/SKILL.md +61 -0
  27. package/skills/pharaoh-review-codex/SKILL.md +80 -0
  28. package/skills/pharaoh-review-receive/SKILL.md +81 -0
  29. package/skills/pharaoh-sessions/SKILL.md +85 -0
  30. package/skills/pharaoh-tdd/SKILL.md +104 -0
  31. package/skills/pharaoh-verify/SKILL.md +72 -0
  32. package/skills/pharaoh-wiring/SKILL.md +34 -0
  33. package/skills/pharaoh-worktree/SKILL.md +85 -0
  34. package/dist/auth.js.map +0 -1
  35. package/dist/credentials.js.map +0 -1
  36. package/dist/index.js.map +0 -1
  37. package/dist/proxy.js.map +0 -1
@@ -0,0 +1,74 @@
+ ---
+ name: pharaoh-parallel
+ description: "Dispatch 2+ independent subagent tasks that run concurrently. Each agent gets focused scope, clear goal, constraints, and expected output. No shared state between agents. Review and integrate results after all complete."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["parallel", "subagents", "concurrency", "delegation", "efficiency"]}
+ ---
+
+ # Parallel Dispatch
+
+ Delegate independent tasks to specialized agents running concurrently. Each agent gets isolated context and focused scope. Never share your session history — construct exactly what each agent needs.
+
+ ## When to Use
+
+ - 2+ independent tasks with no shared state
+ - Multiple failures across different subsystems
+ - Each problem can be understood without context from others
+ - Agents won't edit the same files
+
+ ## Do Not Use When
+
+ - Failures are related (fixing one might fix others)
+ - Tasks require understanding full system state
+ - Agents would edit the same files or resources
+ - You don't yet know what's broken (investigate first)
+
+ ## The Pattern
+
+ ### 1. Identify Independent Domains
+
+ Group work by what's independent:
+ - Different test files with different root causes
+ - Different modules with unrelated issues
+ - Different features with no shared code
+
+ ### 2. Craft Agent Prompts
+
+ Each agent gets:
+
+ - **Specific scope:** one file, one module, one subsystem
+ - **Clear goal:** what success looks like
+ - **Constraints:** what NOT to change
+ - **Context:** error messages, relevant code paths, architectural notes
+ - **Expected output:** summary of findings and changes
+
+ ### 3. Dispatch
+
+ Launch all agents simultaneously. They run concurrently with no coordination needed.
+
+ ### 4. Review and Integrate
+
+ When agents return:
+
+ 1. Read each summary — understand what changed
+ 2. Check for conflicts — did agents edit overlapping code?
+ 3. Run full test suite — verify all fixes work together
+ 4. Spot-check results — agents can make systematic errors
+
+ ## Prompt Quality
+
+ | Bad | Good |
+ |-----|------|
+ | "Fix all the tests" | "Fix the 3 failures in agent-abort.test.ts" |
+ | "Fix the race condition" | "Fix timing in abort test — here are the error messages: ..." |
+ | No constraints | "Do NOT change production code, fix tests only" |
+ | "Fix it" | "Return: root cause summary + what you changed" |
+
+ ## Iron Rules
+
+ - **One task per agent** — focused agents produce better results than broad ones
+ - **No shared state** — agents must not depend on each other's output
+ - **Never trust agent reports** — verify changes independently before integrating
+ - **Construct context, don't inherit** — give each agent exactly what it needs, nothing more
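The dispatch-and-review pattern above can be sketched with ordinary concurrency primitives. A minimal Python sketch — `run_agent`, the task shape, and the scope-overlap check are illustrative assumptions, not part of the skill:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task):
    # Stand-in for the real dispatch mechanism: each agent sees only its
    # own scope, goal, constraints, and context -- never shared history.
    return {"scope": task["scope"], "summary": "resolved: " + task["goal"]}

def dispatch_parallel(tasks):
    # Refuse overlapping scopes up front: agents must not edit the same files.
    scopes = [t["scope"] for t in tasks]
    if len(scopes) != len(set(scopes)):
        raise ValueError("tasks share a scope; merge them or re-decompose")
    # Launch all agents simultaneously; no coordination between them.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(run_agent, tasks))
    # Results are reviewed after ALL agents complete, never mid-flight.
    return results
```

The overlap check enforces the "no shared state" iron rule mechanically, before any agent is launched.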
@@ -0,0 +1,74 @@
+ ---
+ name: pharaoh-plan
+ description: "Architecture-aware planning workflow using Pharaoh codebase knowledge graph. Four-phase process: reconnaissance with MCP tools, blast radius analysis, approach selection with trade-offs, and step-by-step implementation plan with wiring declarations. Prevents dead exports and overcoupled designs before a line of code is written."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["planning", "architecture", "blast-radius", "pharaoh", "implementation-plan", "wiring"]}
+ ---
+
+ # Plan with Pharaoh
+
+ Architecture-aware planning before implementation. Uses `plan-with-pharaoh` — a 4-phase workflow (+ adversarial review) that combines reconnaissance, blast radius analysis, approach trade-offs, and a wired step-by-step plan. Iron law: every new export must have a declared caller.
+
+ ## When to Use
+
+ Invoke before implementing any non-trivial change: new features, refactors, adding modules, or anything that touches shared code. Use it whenever you need to answer "what's the right way to build this?" before writing code.
+
+ ## Workflow
+
+ ### Phase 1 — Reconnaissance (do NOT skip)
+
+ 1. Call `get_codebase_map` for the target repository to see the full module landscape.
+ 2. Call `get_module_context` on each module likely affected by the change.
+ 3. Call `search_functions` for terms related to the feature or change description.
+ 4. Call `query_dependencies` between the affected modules to map coupling.
+ 5. Call `get_blast_radius` on the primary target of the change.
+ 6. Call `check_reachability` on the primary target to verify it is reachable from entry points.
+
+ ### Phase 2 — Analysis
+
+ Using the reconnaissance data:
+ - Evaluate the blast radius — how many callers and modules are affected?
+ - Check `search_functions` results — does related code already exist?
+ - Assess module coupling — are the affected modules tightly or loosely coupled?
+ - Rate the risk level (LOW / MEDIUM / HIGH) based on blast radius and coupling.
+
+ ### Phase 3 — Approach
+
+ Propose 2-3 implementation approaches with trade-offs:
+ - For each approach: what files change, estimated blast radius, pros, cons.
+ - Recommend one approach with justification.
+ - Flag any approach that would increase module coupling.
+
+ ### Phase 4 — Plan
+
+ Produce a step-by-step implementation plan:
+ - Exact files and functions to create or modify.
+ - Blast radius per change (from Phase 1 data).
+ - Required tests for each step.
+ - Wiring declarations: every new export must have a declared caller.
+
+ Iron law: "Every new export in the plan must have a declared caller. If a function has no caller, it's not part of the plan — remove it."
+
+ ### Phase 5 — Adversarial Review
+
+ Before presenting the plan, adversarially review it:
+ - Are all new exports connected to declared callers? (If not, remove them.)
+ - Is the blast radius acceptable, or does the approach touch too many callers?
+ - Does the approach minimize coupling, or does it introduce new cross-module dependencies?
+ - Are there simpler alternatives that achieve the same result with fewer file changes?
+ - Would any step create unreachable code paths?
+
+ Only present the plan after it passes this review.
+
+ ## Output
+
+ A complete implementation plan containing:
+ - Risk rating (LOW / MEDIUM / HIGH) with data backing
+ - Recommended approach with trade-off rationale
+ - Numbered steps with exact files and functions
+ - Blast radius per change
+ - Required tests per step
+ - Wiring declarations for every new export
+ - Adversarial review findings (issues caught and resolved)
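Phase 1's call sequence is mechanical enough to sketch. The following assumes a generic MCP client exposing a `call_tool(name, arguments)` method; the client interface and the argument key names are assumptions — only the tool names and their order come from the workflow above:

```python
def reconnaissance(client, repo, target, modules, search_terms):
    """Run the six Phase 1 calls in order and collect their results.

    `client.call_tool(name, arguments)` is a stand-in for whatever MCP
    client is in use; argument keys ("repository", "module", ...) are
    hypothetical.
    """
    data = {}
    data["map"] = client.call_tool("get_codebase_map", {"repository": repo})
    data["modules"] = {
        m: client.call_tool("get_module_context", {"module": m}) for m in modules
    }
    data["search"] = [
        client.call_tool("search_functions", {"query": q}) for q in search_terms
    ]
    data["deps"] = client.call_tool("query_dependencies", {"modules": modules})
    data["blast"] = client.call_tool("get_blast_radius", {"target": target})
    data["reachable"] = client.call_tool("check_reachability", {"target": target})
    return data
```

Collecting all six results before analysis enforces the "do NOT skip" rule: Phase 2 only starts once `data` is complete.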
@@ -0,0 +1,52 @@
+ ---
+ name: pharaoh-pr
+ description: "Pre-pull-request architectural review checklist using Pharaoh codebase knowledge graph. Covers module context, blast radius per touched module, hidden coupling between modules, duplicate logic detection, regression risk scoring, and vision spec alignment. Produces a structured review summary before opening a PR."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["pull-request", "code-review", "architecture", "pharaoh", "pre-pr", "regression-risk"]}
+ ---
+
+ # Pre-PR Review
+
+ Architectural review checklist to run before opening a pull request. Uses `pre-pr-review` — a 6-step workflow covering module context, blast radius, dependency coupling, duplicate logic, regression risk, and spec drift. Catches architectural problems before reviewers see the code.
+
+ ## When to Use
+
+ Invoke before opening a pull request. Use it when changes touch one or more modules and you want a structured architectural assessment before requesting human review.
+
+ ## Workflow
+
+ ### Step 1: Module context
+
+ For each touched module, call `get_module_context` to review its current structure and complexity.
+
+ ### Step 2: Blast radius
+
+ For each touched module, call `get_blast_radius` to identify what else is affected by the changes.
+
+ ### Step 3: Dependency check
+
+ Call `query_dependencies` between each pair of touched modules to find hidden coupling introduced by the PR.
+
+ ### Step 4: Consolidation check
+
+ Call `get_consolidation_opportunities` for the target repository to flag any duplicate logic introduced by the PR.
+
+ ### Step 5: Regression risk
+
+ Call `get_regression_risk` for the target repository to assess overall change risk.
+
+ ### Step 6: Vision alignment
+
+ Call `get_vision_gaps` for the target repository to check if changes align with or drift from specs.
+
+ ## Output
+
+ A review summary containing:
+ - **Architecture impact:** modules affected, dependency changes introduced
+ - **Risk assessment:** blast radius per module, overall regression risk level
+ - **Cleanup opportunities:** consolidation candidates, unused code created
+ - **Spec alignment:** vision gaps introduced or resolved by the PR
+
+ Ready to paste into the PR description or share with reviewers.
@@ -0,0 +1,36 @@
+ ---
+ name: pharaoh-refactor
+ description: "Safe refactoring workflow using Pharaoh codebase knowledge graph. Six-step process: module context, blast radius of downstream callers, reachability verification, dependency mapping, naming conflict detection, and test coverage assessment. Produces a refactoring plan with every caller listed, test files identified, unreachable code flagged, and high-risk paths warned."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["refactoring", "blast-radius", "architecture", "pharaoh", "safe-refactor", "test-coverage"]}
+ ---
+
+ # Safe Refactor
+
+ Step-by-step workflow to safely refactor a function or module with full blast radius awareness. Uses `safe-refactor` — a 6-step process that maps every caller, identifies affected tests, flags unreachable code, and warns about high-risk downstream paths before a single line changes.
+
+ ## When to Use
+
+ Invoke before refactoring any function, module, or file — especially shared utilities, exports used across multiple modules, or anything with an unclear caller graph. Use it whenever the blast radius of a change is unknown.
+
+ ## Workflow
+
+ 1. Call `get_module_context` for the module containing the target to understand its current structure.
+ 2. Call `get_blast_radius` for the target to identify all downstream callers and affected modules.
+ 3. Call `check_reachability` for the target to verify it is actually reachable from entry points.
+ 4. Call `query_dependencies` to map how the containing module connects to its dependents.
+ 5. Call `search_functions` to check if the refactored version's name already exists elsewhere.
+ 6. Call `get_test_coverage` for the module to identify which tests cover the refactored code.
+
+ Do not propose a refactoring plan until all 6 steps are complete.
+
+ ## Output
+
+ A refactoring plan containing:
+ - **Callers to update:** every function and file that calls the target, with update requirements
+ - **Tests to change:** test files covering the refactored code, with required modifications
+ - **Dead code:** unreachable code paths that can be deleted during the refactor
+ - **High-risk paths:** downstream modules with wide blast radius or high complexity scores
+ - **Naming conflicts:** existing functions whose names would conflict with the refactored version
@@ -0,0 +1,61 @@
+ ---
+ name: pharaoh-review
+ description: "Architecture-aware pre-PR code review using Pharaoh codebase knowledge graph. Four-phase workflow: context gathering with module structure and blast radius, risk assessment with regression scoring and wiring checks, spec alignment against vision docs, and a final verdict of SHIP / SHIP WITH CHANGES / BLOCK. Auto-block rules for unreachable exports, circular dependencies, high regression risk, and spec violations."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["code-review", "pull-request", "architecture", "pharaoh", "regression-risk", "spec-alignment"]}
+ ---
+
+ # Review with Pharaoh
+
+ Architecture-aware pre-PR review. Uses `review-with-pharaoh` — a 4-phase workflow that assesses blast radius, regression risk, wiring integrity, duplication, and spec alignment. Produces a final verdict: SHIP, SHIP WITH CHANGES, or BLOCK.
+
+ ## When to Use
+
+ Invoke before merging any pull request. Use it when reviewing changes that touch shared modules, export new functions, modify core data flows, or claim to implement a spec.
+
+ ## Workflow
+
+ ### Phase 1 — Context
+
+ 1. For each touched module, call `get_module_context` to understand its structure.
+ 2. For each touched module, call `get_blast_radius` to identify downstream impact.
+ 3. Call `query_dependencies` between the touched modules to map coupling.
+
+ ### Phase 2 — Risk Assessment
+
+ 4. Call `get_regression_risk` for the target repository to assess overall change risk.
+ 5. Call `check_reachability` for new exports in the touched modules — are they wired?
+ 6. Call `get_consolidation_opportunities` for the repository to check for duplicated logic.
+
+ ### Phase 3 — Spec Alignment
+
+ 7. Call `get_vision_gaps` for the repository to verify changes align with specs.
+
+ ### Phase 4 — Verdict
+
+ Produce a review with:
+ - **Architecture impact:** modules affected, dependency changes, blast radius
+ - **Risk assessment:** regression risk level, volatile modules touched
+ - **Wiring check:** are all new exports reachable from entry points?
+ - **Duplication check:** does new code duplicate existing logic?
+ - **Spec alignment:** do changes match or drift from vision specs?
+
+ Final verdict: **SHIP** / **SHIP WITH CHANGES** / **BLOCK**
+
+ Auto-block triggers (any of these = BLOCK):
+ - Unreachable exports (new code with zero callers)
+ - New circular dependencies between modules
+ - HIGH regression risk without corresponding test coverage
+ - Vision spec violations (building against spec intent)
+
+ ## Output
+
+ A structured review containing:
+ - Architecture impact summary with specific modules and blast radius numbers
+ - Risk level (LOW / MEDIUM / HIGH) with data backing
+ - Wiring status for all new exports
+ - Duplication findings with affected modules
+ - Spec alignment verdict
+ - Final verdict (SHIP / SHIP WITH CHANGES / BLOCK) with specific required changes if not SHIP
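The auto-block triggers are mechanical enough to express as code. A sketch with hypothetical finding fields — the four triggers mirror the Phase 4 list, everything else is illustrative:

```python
def verdict(findings):
    """Map review findings to SHIP / SHIP WITH CHANGES / BLOCK.

    `findings` is a dict with hypothetical field names; the four
    auto-block triggers mirror the Phase 4 list.
    """
    blockers = [
        findings.get("unreachable_exports"),           # new code with zero callers
        findings.get("new_circular_dependencies"),     # new cycles between modules
        findings.get("regression_risk") == "HIGH"
            and not findings.get("covering_tests"),    # HIGH risk without coverage
        findings.get("spec_violations"),               # building against spec intent
    ]
    if any(blockers):
        return "BLOCK"
    if findings.get("required_changes"):
        return "SHIP WITH CHANGES"
    return "SHIP"
```

Note that HIGH regression risk alone does not block — only HIGH risk without corresponding test coverage does, matching the trigger as stated.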
@@ -0,0 +1,80 @@
+ ---
+ name: pharaoh-review-codex
+ description: "Cross-model security review. Dispatch code to a different AI model or subagent for independent second-opinion review. Evaluator applies AGREE, DISAGREE, or CONTEXT verdicts to each finding. Catches blind spots from single-model reasoning. Use for security-sensitive code, auth flows, data access, and architectural decisions."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["security-review", "cross-model", "second-opinion", "code-review", "verification"]}
+ ---
+
+ # Cross-Model Code Review
+
+ Get a second opinion on critical code by dispatching it to an independent reviewer — a different agent, model, or subagent. One model's blind spots are another's obvious catches.
+
+ ## When to Use
+
+ - Security-sensitive changes (auth, encryption, access control, token handling)
+ - Data access patterns (tenant isolation, query construction, input validation)
+ - Architectural decisions with long-term consequences
+ - Code you're not fully confident about
+ - Before shipping changes that affect user data or billing
+
+ ## Do Not Use When
+
+ - Trivial changes (typos, formatting, dependency bumps)
+ - Changes fully covered by existing tests with high mutation scores
+ - Time-critical hotfixes where review delay is worse than risk
+
+ ## Process
+
+ ### 1. Prepare Review Package
+
+ Assemble exactly what the reviewer needs:
+
+ - **Changed files:** full diff or complete file contents
+ - **Context:** what the code does, why it was changed, what it interacts with
+ - **Constraints:** security requirements, isolation rules, performance bounds
+ - **Specific concerns:** what you want the reviewer to focus on
+
+ Do NOT send your session history — construct focused context.
+
+ ### 2. Dispatch to Reviewer
+
+ Send the review package to an independent agent. The reviewer should have no knowledge of your reasoning process — they evaluate the code fresh.
+
+ ### 3. Reviewer Applies Verdicts
+
+ For each finding, the reviewer assigns:
+
+ | Verdict | Meaning |
+ |---------|---------|
+ | **AGREE** | Confirms the implementation is correct for the stated concern |
+ | **DISAGREE** | Identifies a concrete issue with evidence |
+ | **CONTEXT** | Cannot determine correctness — needs more information |
+
+ Each DISAGREE must include: what's wrong, why it matters, and a suggested fix.
+
+ ### 4. Evaluate Findings
+
+ When review returns:
+
+ - **AGREE items:** no action needed
+ - **DISAGREE items:** verify the finding against actual code. If confirmed, fix. If the reviewer misunderstood context, document why the current approach is correct.
+ - **CONTEXT items:** provide the missing information and re-review that item
+
+ ## What to Include in Review
+
+ | Category | Include | Skip |
+ |----------|---------|------|
+ | Auth/access control | Token validation, session management, permission checks | UI styling |
+ | Data access | Query construction, tenant isolation, input sanitization | Logging format |
+ | Cryptography | Key management, encryption/decryption, hashing | String formatting |
+ | Error handling | What's exposed to users, what's logged, what's swallowed | Happy path only |
+
+ ## Key Principles
+
+ - **Independent evaluation** — reviewer must not be primed with your conclusions
+ - **Evidence-based verdicts** — no "looks fine" without specifics
+ - **Verify disagreements** — reviewer may lack context; check before acting
+ - **Don't skip uncomfortable findings** — the point is catching what you missed
+ - **Repeat for high-stakes changes** — one review round may not be enough
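Step 4's triage can be sketched as a partition over the reviewer's verdicts. The finding shape is hypothetical; the three verdicts and the evidence requirement for DISAGREE come from the tables above:

```python
def triage(findings):
    """Sort reviewer findings into next actions by verdict.

    AGREE needs nothing; DISAGREE must be verified against the actual
    code before acting; CONTEXT goes back for re-review with the
    missing information. Finding fields are hypothetical.
    """
    actions = {"no_action": [], "verify_then_fix": [], "re_review": []}
    for f in findings:
        if f["verdict"] == "AGREE":
            actions["no_action"].append(f)
        elif f["verdict"] == "DISAGREE":
            # Each DISAGREE must carry evidence and a suggested fix.
            if not (f.get("evidence") and f.get("suggested_fix")):
                raise ValueError("DISAGREE without evidence and fix: " + f["item"])
            actions["verify_then_fix"].append(f)
        elif f["verdict"] == "CONTEXT":
            actions["re_review"].append(f)
        else:
            raise ValueError("unknown verdict: " + f["verdict"])
    return actions
```

Rejecting a DISAGREE that lacks evidence enforces the "evidence-based verdicts" principle at intake rather than during implementation.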
@@ -0,0 +1,81 @@
+ ---
+ name: pharaoh-review-receive
+ description: "Receive code review feedback with technical rigor. No performative agreement — verify suggestions against codebase reality before implementing. Push back with evidence when feedback is wrong. Clarify all unclear items before implementing any. External feedback is suggestions to evaluate, not orders to follow."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["code-review", "feedback", "technical-rigor", "pushback", "collaboration"]}
+ ---
+
+ # Receiving Code Review
+
+ Code review requires technical evaluation, not emotional performance.
+
+ **Verify before implementing. Ask before assuming. Technical correctness over social comfort.**
+
+ ## When to Use
+
+ When receiving code review feedback — from humans, external reviewers, or automated tools. Especially when feedback seems unclear or technically questionable.
+
+ ## The Response Pattern
+
+ 1. **READ:** complete feedback without reacting
+ 2. **UNDERSTAND:** restate the requirement in your own words (or ask)
+ 3. **VERIFY:** check against codebase reality
+ 4. **EVALUATE:** technically sound for THIS codebase?
+ 5. **RESPOND:** technical acknowledgment or reasoned pushback
+ 6. **IMPLEMENT:** one item at a time, test each
+
+ ## Forbidden Responses
+
+ Never respond with:
+ - "You're absolutely right!"
+ - "Great point!" / "Excellent feedback!"
+ - "Let me implement that now" (before verification)
+ - Any gratitude expression
+
+ Instead: restate the technical requirement, ask clarifying questions, push back with evidence if wrong, or just start fixing.
+
+ ## Handling Unclear Feedback
+
+ If ANY item is unclear: **stop — do not implement anything yet.** Ask for clarification on unclear items before touching code. Items may be related; partial understanding produces wrong implementations.
+
+ ## Evaluating External Feedback
+
+ Before implementing suggestions from external reviewers:
+
+ 1. Is this technically correct for THIS codebase?
+ 2. Does it break existing functionality?
+ 3. Is there a reason for the current implementation?
+ 4. Does the reviewer understand the full context?
+ 5. Does it conflict with prior architectural decisions?
+
+ If a suggestion seems wrong, push back with technical reasoning — reference working tests, actual code, or codebase patterns.
+
+ ## When to Push Back
+
+ - Suggestion breaks existing functionality
+ - Reviewer lacks full context
+ - Feature is unused (YAGNI)
+ - Technically incorrect for this stack
+ - Conflicts with established architectural decisions
+
+ **How:** technical reasoning, specific questions, references to working code. Never defensive — just factual.
+
+ ## Implementation Order
+
+ For multi-item feedback:
+
+ 1. Clarify anything unclear FIRST
+ 2. Implement in priority order: blocking issues, simple fixes, complex fixes
+ 3. Test each fix individually
+ 4. Verify no regressions
+
+ ## Acknowledging Correct Feedback
+
+ When feedback IS correct:
+ - "Fixed. [Brief description of what changed]"
+ - "Good catch — [specific issue]. Fixed in [location]."
+ - Or just fix it silently — the code shows you heard.
+
+ Actions speak. No performative agreement needed.
@@ -0,0 +1,85 @@
+ ---
+ name: pharaoh-sessions
+ description: "Decompose work into parallel, isolated sessions using git worktrees. Each session gets fresh context, a narrow scope, and produces atomic commits. Prevents context window pollution from large tasks. Coordinate across sessions without shared state."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["sessions", "worktrees", "parallel-work", "context-management", "decomposition"]}
+ ---
+
+ # Session Decomposition
+
+ Break large tasks into parallel, isolated work sessions. Each session runs in its own git worktree with fresh context, focused scope, and atomic commits. Prevents context window bloat and keeps each unit of work clean.
+
+ ## When to Use
+
+ - Task is too large for a single context window
+ - Work has 3+ independent sub-tasks that don't touch the same files
+ - You need to preserve context quality across a multi-hour effort
+ - Multiple features or fixes can proceed in parallel
+
+ ## Do Not Use When
+
+ - Sub-tasks share files or state
+ - Work is sequential (each step depends on the previous)
+ - Task fits comfortably in one session
+
+ ## Process
+
+ ### 1. Decompose
+
+ Break the task into sessions. Each session must:
+
+ - Have a clear, narrow goal (one feature, one fix, one module)
+ - Touch a distinct set of files — no overlap between sessions
+ - Be independently verifiable (tests pass, build succeeds)
+ - Produce atomic commits that make sense on their own
+
+ ### 2. Create Worktrees
+
+ For each session, create an isolated worktree:
+
+ ```bash
+ git worktree add .worktrees/<session-name> -b <branch-name>
+ ```
+
+ Install dependencies in each worktree. Verify clean baseline (tests pass).
+
+ ### 3. Write Session Prompts
+
+ Each session gets a prompt containing:
+
+ - **Goal:** what this session produces (1-2 sentences)
+ - **Scope:** which files/modules to touch (explicit list)
+ - **Constraints:** what NOT to change
+ - **Verification:** how to confirm the work is correct
+ - **Context:** any architectural decisions or patterns to follow
+
+ ### 4. Execute Sessions
+
+ Run each session independently. Sessions should not reference each other's work-in-progress — they operate on the same base commit.
+
+ ### 5. Integrate
+
+ After all sessions complete:
+
+ 1. Verify each branch independently (tests pass, build succeeds)
+ 2. Merge branches sequentially into the target branch
+ 3. Resolve any conflicts (rare if decomposition was clean)
+ 4. Run full verification on the integrated result
+
+ ## Decomposition Rules
+
+ | Good decomposition | Bad decomposition |
+ |---|---|
+ | Session A: auth module, Session B: billing module | Session A: backend, Session B: frontend (likely share types) |
+ | Session A: new feature, Session B: unrelated bugfix | Session A: write code, Session B: write tests (coupled) |
+ | Session A: parser, Session B: renderer (clear interface) | Session A: first half of file, Session B: second half |
+
+ ## Key Principles
+
+ - **No shared files** — if two sessions touch the same file, merge them into one
+ - **Fresh context per session** — don't carry state between sessions
+ - **Atomic commits** — each session's output should be a coherent, reviewable unit
+ - **Verify before integrating** — never merge a session that doesn't pass its own checks
+ - **Decomposition is the hard part** — spend time getting boundaries right before starting work
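The "no shared files" rule can be checked mechanically before any worktree is created. A sketch assuming a hypothetical session shape of name plus planned file list:

```python
def check_decomposition(sessions):
    """Verify that no two sessions claim the same file.

    Each session is {"name": ..., "files": [...]}; the shape is
    hypothetical. Returns the overlapping files per session pair --
    an empty result means the decomposition is safe to dispatch.
    """
    overlaps = {}
    for i, a in enumerate(sessions):
        for b in sessions[i + 1:]:
            shared = set(a["files"]) & set(b["files"])
            if shared:
                # Per the key principles: sessions sharing a file
                # must be merged into one session.
                overlaps[(a["name"], b["name"])] = sorted(shared)
    return overlaps
```

Running this on the planned file lists front-loads the hard part — boundary mistakes surface before any work starts, not at merge time.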
@@ -0,0 +1,104 @@
+ ---
+ name: pharaoh-tdd
+ description: "Test-driven development discipline. Write the failing test first, watch it fail, write minimal code to pass, refactor. No production code without a failing test. No exceptions without explicit permission. Covers red-green-refactor cycle, common rationalizations, and when to start over."
+ version: 0.2.0
+ homepage: https://pharaoh.so
+ user-invocable: true
+ metadata: {"emoji": "☥", "tags": ["tdd", "testing", "red-green-refactor", "quality", "discipline"]}
+ ---
+
+ # Test-Driven Development
+
+ Write the test first. Watch it fail. Write minimal code to pass. Refactor.
+
+ **If you didn't watch the test fail, you don't know if it tests the right thing.**
+
+ ## When to Use
+
+ Always — for new features, bug fixes, refactoring, and behavior changes. Exceptions require explicit user permission (throwaway prototypes, generated code, config files).
+
+ ## The Iron Law
+
+ ```
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+ ```
+
+ Wrote code before the test? Delete it. Start over. Don't keep it as "reference." Don't adapt it. Delete means delete.
+
+ ## Red-Green-Refactor
+
+ ### RED — Write Failing Test
+
+ Write one minimal test showing what should happen.
+
+ - One behavior per test
+ - Clear name describing the behavior
+ - Real code, not mocks (unless unavoidable)
+
+ Run the test. Confirm it **fails** (not errors) for the expected reason — the feature is missing, not a typo.
+
+ ### GREEN — Minimal Code
+
+ Write the simplest code that makes the test pass. Nothing more.
+
+ - Don't add features the test doesn't require
+ - Don't refactor other code
+ - Don't "improve" beyond the test
+
+ Run the test. Confirm it passes. Confirm other tests still pass.
+
+ ### REFACTOR — Clean Up
+
+ Only after green:
+ - Remove duplication
+ - Improve names
+ - Extract helpers
+
+ Keep tests green throughout. Don't add behavior during refactor.
+
+ ### Repeat
+
+ Next failing test for next behavior.
+
+ ## Bug Fix Flow
+
+ 1. Write a failing test that reproduces the bug
+ 2. Watch it fail — confirms the test catches the bug
+ 3. Fix the bug with minimal code
+ 4. Watch it pass — confirms the fix works
+ 5. Refactor if needed
+
+ Never fix bugs without a regression test.
+
+ ## Common Rationalizations
+
+ | Excuse | Reality |
+ |--------|---------|
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+ | "I'll test after" | Tests passing immediately prove nothing. |
+ | "Need to explore first" | Fine. Throw away exploration, then start with TDD. |
+ | "Test hard = skip test" | Hard to test = hard to use. Simplify the interface. |
+ | "TDD will slow me down" | TDD is faster than debugging. Always. |
+ | "Already manually tested" | Ad-hoc is not systematic. No record, can't re-run. |
+
+ ## Red Flags — Start Over
+
+ - Code written before test
+ - Test passes immediately (testing existing behavior)
+ - Can't explain why test failed
+ - "Just this once" rationalization
+ - Keeping pre-TDD code "as reference"
+
+ **All of these mean: delete code, start over with TDD.**
+
+ ## Verification Checklist
+
+ Before marking work complete:
+
+ - Every new function has a test
+ - Watched each test fail before implementing
+ - Each test failed for the expected reason
+ - Wrote minimal code to pass each test
+ - All tests pass with clean output
+ - Tests use real code (mocks only if unavoidable)
+ - Edge cases and errors are covered
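One red-green cycle in miniature — the `slugify` example and its names are hypothetical, chosen only to show the shape of the cycle:

```python
# RED: write this test first and run it. With slugify still a stub that
# returns its input unchanged, it fails for the expected reason -- the
# behavior is missing, not a typo.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

# GREEN: the simplest code that makes the test pass. No extras the test
# does not require (no unicode folding, no configurable separator).
def slugify(text):
    return text.strip().lower().replace(" ", "-")

# REFACTOR comes next -- rename, deduplicate, extract helpers -- with
# the test kept green throughout and no new behavior added.
```

The next behavior (say, collapsing repeated spaces) would start the cycle again with its own failing test, never with code.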