npm - @pharaoh-so/mcp - Versions diffs - 0.2.10 → 0.2.11 - Mend

@pharaoh-so/mcp 0.2.10 → 0.2.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/package.json +1 -1
package/skills/plan/SKILL.md +276 -40

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "@pharaoh-so/mcp",
   "mcpName": "so.pharaoh/pharaoh",
-  "version": "0.2.10",
+  "version": "0.2.11",
   "description": "MCP proxy for Pharaoh — maps codebases into queryable knowledge graphs for AI agents. Enables Claude Code in headless environments (VPS, SSH, CI) via device flow auth.",
   "type": "module",
   "main": "dist/index.js",

package/skills/plan/SKILL.md CHANGED Viewed

@@ -1,74 +1,310 @@
 ---
 name: plan
-description: "Architecture-aware planning workflow using Pharaoh codebase knowledge graph. Four-phase process: reconnaissance with MCP tools, blast radius analysis, approach selection with trade-offs, and step-by-step implementation plan with wiring declarations. Prevents dead exports and overcoupled designs before a line of code is written."
-version: 0.2.0
+description: "Full-cycle architecture-aware planning: Pharaoh reconnaissance, structured plan writing with bite-sized TDD steps and zero placeholders, then deep adversarial review with wiring verification and interactive issue resolution. Replaces both writing-plans and plan-review."
+version: 0.3.0
 homepage: https://pharaoh.so
 user-invocable: true
-metadata: {"emoji": "☥", "tags": ["planning", "architecture", "blast-radius", "pharaoh", "implementation-plan", "wiring"]}
+metadata: {"emoji": "☥", "tags": ["planning", "architecture", "blast-radius", "pharaoh", "implementation-plan", "wiring", "review", "tdd"]}
 ---
 # Plan with Pharaoh
-Architecture-aware planning before implementation. Uses `plan-with-pharaoh` — a 4-phase workflow (+ adversarial review) that combines reconnaissance, blast radius analysis, approach trade-offs, and a wired step-by-step plan. Iron law: every new export must have a declared caller.
+Full-cycle planning: reconnaissance → plan writing → adversarial review. Combines architecture-aware graph analysis with rigorous plan craft and interactive issue resolution.
+**You are in plan mode. Do NOT make any code changes. Think, evaluate, plan, review.**
 ## When to Use
-Invoke before implementing any non-trivial change: new features, refactors, adding modules, or anything that touches shared code. Use it whenever you need to answer "what's the right way to build this?" before writing code.
+Before implementing any non-trivial change: new features, refactors, adding modules, or anything that touches shared code. Use it whenever you need to answer "what's the right way to build this?" before writing code.
+## Project Overrides
+If a `.claude/plan-review.md` file exists in this project, read it now and apply those rules on top of this baseline. Project rules take precedence where they conflict.
+## Engineering Preferences (guide all recommendations)
+- DRY: flag repetition aggressively
+- Well-tested: too many tests > too few; mutation score > line coverage
+- "Engineered enough" — not fragile/hacky, not over-abstracted
+- Handle more edge cases, not fewer; thoughtfulness > speed
+- Explicit > clever; simple > complex
+- Subtraction > addition; target zero or negative net LOC
+- Every export must have a caller; unwired code doesn't exist
+## Document Review Mode
-## Workflow
+If the user provides a document, PRD, prompt, or artifact alongside this command, that IS the plan to review. Still run Phase 1 (Reconnaissance) — always verify against the actual codebase. Then proceed to Phase 3 (Approach) and Phase 5 (Review), applying all review sections to that document. Do not treat it as background context — it is the subject of evaluation.
-### Phase 1 — Reconnaissance (do NOT skip)
+---
+## Phase 1 — Reconnaissance (required — do this BEFORE anything else)
+Do NOT plan from memory or assumptions. Query the actual codebase first:
-1. Call `get_codebase_map` for the target repository to see the full module landscape.
-2. Call `get_module_context` on each module likely affected by the change.
-3. Call `search_functions` for terms related to the feature or change description.
-4. Call `query_dependencies` between the affected modules to map coupling.
-5. Call `get_blast_radius` on the primary target of the change.
-6. Call `check_reachability` on the primary target to verify it is reachable from entry points.
+1. `get_codebase_map` — current modules, hot files, dependency graph
+2. `search_functions` for keywords related to the plan — find existing code to reuse/extend
+3. `get_module_context` on each module likely affected by the change
+4. `query_dependencies` between affected modules — coupling, circular deps
+5. `get_blast_radius` on the primary target of the change
+6. `check_reachability` on the primary target to verify it's reachable from entry points
-### Phase 2 — Analysis
+Ground every recommendation in what actually exists. If you propose adding something, confirm it doesn't already exist. If you propose changing something, know its blast radius.
+## Phase 2 — Analysis
 Using the reconnaissance data:
 - Evaluate the blast radius — how many callers and modules are affected?
-- Check `search_functions` results — does related code already exist?
+- Check `search_functions` results — does related code already exist? Can you reuse/extend?
 - Assess module coupling — are the affected modules tightly or loosely coupled?
-- Rate the risk level (LOW / MEDIUM / HIGH) based on blast radius and coupling.
+- Rate the risk level (LOW / MEDIUM / HIGH) based on blast radius and coupling
+- Does this need new code at all, or can an existing pattern solve it?
+## Phase 3 — Approach
+### Scope Check
+If the spec covers multiple independent subsystems, it should be broken into separate plans — one per subsystem. Each plan should produce working, testable software on its own. Suggest splitting if needed.
+### Mode Selection
+Ask the user which mode before proceeding:
+- **BIG CHANGE**: Full plan with all sections, approach trade-offs, interactive review
+- **SMALL CHANGE**: Abbreviated plan, sections 2-4 of review only
+### Approach Trade-offs
+Propose 2-3 implementation approaches:
+- For each: what files change, estimated blast radius, pros, cons
+- Recommend one with justification
+- Flag any approach that would increase module coupling
+- Flag any approach that requires new code where existing code could be extended
+## Phase 4 — Plan Writing
+### File Structure
+Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.
+- Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
+- Files that change together should live together. Split by responsibility, not by technical layer.
+- In existing codebases, follow established patterns. If the codebase uses large files, don't unilaterally restructure — but if a file you're modifying has grown unwieldy, including a split is reasonable.
+This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.
+### Plan Document Header
+Every plan MUST start with:
+```markdown
+# [Feature Name] Implementation Plan
+> **For agentic workers:** Use `pharaoh:orchestrate` (recommended) or `pharaoh:execute` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** [One sentence describing what this builds]
+**Architecture:** [2-3 sentences about approach]
+**Tech Stack:** [Key technologies/libraries]
+**Risk:** [LOW / MEDIUM / HIGH] — [one line justification from Phase 2 data]
+---
+```
+### Bite-Sized Task Granularity
+Each step is one action (2-5 minutes):
+- "Write the failing test" — step
+- "Run it to make sure it fails" — step
+- "Implement the minimal code to make the test pass" — step
+- "Run the tests and make sure they pass" — step
+- "Commit" — step
+### Task Structure
+````markdown
+### Task N: [Component Name]
+**Files:**
+- Create: `exact/path/to/file.ts`
+- Modify: `exact/path/to/existing.ts:123-145`
+- Test: `tests/exact/path/to/test.ts`
+**Blast radius:** [from Phase 1 data — callers affected, modules touched]
+**Wiring:** [where new exports get called from — declared caller for every export]
+- [ ] **Step 1: Write the failing test**
+```typescript
+test('specific behavior', () => {
+    const result = function(input);
+    expect(result).toBe(expected);
+});
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `pnpm test -- tests/path/test.ts`
+Expected: FAIL with "function not defined"
-### Phase 3 — Approach
+- [ ] **Step 3: Write minimal implementation**
-Propose 2-3 implementation approaches with trade-offs:
-- For each approach: what files change, estimated blast radius, pros, cons.
-- Recommend one approach with justification.
-- Flag any approach that would increase module coupling.
+```typescript
+export function myFunction(input: string): string {
+    return expected;
+}
+```
-### Phase 4 — Plan
+- [ ] **Step 4: Run test to verify it passes**
-Produce a step-by-step implementation plan:
-- Exact files and functions to create or modify.
-- Blast radius per change (from Phase 1 data).
-- Required tests for each step.
-- Wiring declarations: every new export must have a declared caller.
+Run: `pnpm test -- tests/path/test.ts`
+Expected: PASS
-Iron law: "Every new export in the plan must have a declared caller. If a function has no caller, it's not part of the plan — remove it."
+- [ ] **Step 5: Commit**
-### Phase 5 — Adversarial Review
+```bash
+git add tests/path/test.ts src/path/file.ts
+git commit -m "feat: add specific feature"
+```
+````
-Before presenting the plan, adversarially review it:
-- Are all new exports connected to declared callers? (If not, remove them.)
-- Is the blast radius acceptable, or does the approach touch too many callers?
-- Does the approach minimize coupling, or does it introduce new cross-module dependencies?
-- Are there simpler alternatives that achieve the same result with fewer file changes?
-- Would any step create unreachable code paths?
+### No Placeholders
-Only present the plan after it passes this review.
+Every step must contain the actual content an engineer needs. These are **plan failures** — never write them:
-## Output
+- "TBD", "TODO", "implement later", "fill in details"
+- "Add appropriate error handling" / "add validation" / "handle edge cases"
+- "Write tests for the above" (without actual test code)
+- "Similar to Task N" (repeat the code — the engineer may be reading tasks out of order)
+- Steps that describe what to do without showing how (code blocks required for code steps)
+- References to types, functions, or methods not defined in any task
+### Remember
+- Exact file paths always
+- Complete code in every step — if a step changes code, show the code
+- Exact commands with expected output
+- DRY, YAGNI, TDD, frequent commits
+- Every new export must have a declared caller — if a function has no caller, it's not part of the plan
+---
+## Phase 5 — Adversarial Review
+Review the plan before presenting it. Apply all relevant sections, adapting depth to change size. Skip sections that don't apply.
+### Section 1 — Architecture (skip for small/single-file changes)
+- Component boundaries and coupling concerns
+- Dependency graph: does this change shrink or expand surface area?
+- Data flow bottlenecks and single points of failure
+- Does this need new code at all, or can an existing pattern solve it?
+### Section 2 — Code Quality (always)
+- Organization, module structure, DRY violations (be aggressive)
+- Error handling gaps and missing edge cases (call out explicitly)
+- Technical debt: shortcuts, hardcoded values, magic strings
+- Over-engineered or under-engineered relative to engineering preferences
+- Reuse: does code for this already exist somewhere?
+### Section 3 — Wiring & Integration (always)
+- Are all new exports called from a production entry point?
+- Run `get_blast_radius` on any new/changed functions — zero callers = not done
+- `check_reachability` on new exports — verify reachable from API handlers, crons, or event handlers
+- Does every task declare WHERE new code gets called from? If not, flag it
+- Integration points: how does this connect to what already exists?
+### Section 4 — Tests (always)
+- Coverage gaps: unit, integration, e2e
+- Test quality: real assertions with hardcoded expected values, not `.toBeDefined()` or computed expectations
+- Missing edge cases and untested failure/error paths
+- One integration test proving wiring > ten isolated unit tests
+### Section 5 — Performance (only if relevant)
+- N+1 queries, unnecessary DB round-trips
+- Memory concerns, caching opportunities
+- Slow or high-complexity code paths
+### Section 6 — Security & Attack Surface (always for new endpoints/routes/APIs; skip for pure refactors)
+- **Authentication model** — what authenticates requests? Where validated? What happens on failure?
+- **Sensitive data in URLs** — tokens, session IDs, or tenant identifiers in URL paths/params leak via Referer, history, logs
+- **Authorization boundaries** — what prevents User A from accessing User B's data?
+- **Input trust boundary** — user input flowing into shell commands, queries, HTML rendering, or file paths
+- **Error and response surface** — do error responses expose internals to unauthenticated callers?
+- **New attack surface** — new public URLs, webhooks, API routes each need rate limiting, auth, and input validation
+### Self-Review Checklist (run after all sections)
+1. **Spec coverage:** Skim each section/requirement in the spec. Can you point to a task that implements it? List any gaps.
+2. **Placeholder scan:** Search the plan for red flags from the "No Placeholders" section. Fix them.
+3. **Type consistency:** Do types, method signatures, and property names used in later tasks match earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.
+4. **Wiring sweep:** `get_blast_radius` on ALL new exports — zero callers on non-entry-points = plan is incomplete.
+### For Each Issue Found
+For every specific issue (bug, smell, design concern, risk, missing wiring):
+1. **Describe concretely** — file, line/function reference, what's wrong
+2. **Present 2-3 options** including "do nothing" where reasonable
+3. **For each option** — implementation effort, risk, blast radius, maintenance burden
+4. **Recommend one** mapped to engineering preferences above, and say why
+5. **Ask** whether the user agrees or wants a different direction
+Number each issue (1, 2, 3...) and letter each option (A, B, C...). Recommended option is always listed first.
+---
+## Phase 6 — Output & Handoff
+### Present the Plan
 A complete implementation plan containing:
 - Risk rating (LOW / MEDIUM / HIGH) with data backing
 - Recommended approach with trade-off rationale
-- Numbered steps with exact files and functions
-- Blast radius per change
-- Required tests per step
+- File structure map
+- Numbered tasks with bite-sized steps, exact files, and complete code
+- Blast radius per task
 - Wiring declarations for every new export
+- Required tests per step
 - Adversarial review findings (issues caught and resolved)
+Save plans to: `docs/sessions/YYYY-MM-DD-<feature-name>.md`
+(User preferences for plan location override this default)
+### Execution Handoff
+After saving the plan, offer execution choice:
+**"Plan complete and saved. Two execution options:**
+**1. Orchestrated (recommended)** — I dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Use `pharaoh:orchestrate`.
+**2. Inline Execution** — Execute tasks in this session with checkpoints. Use `pharaoh:execute`.
+**Which approach?"**
+---
+## Pharaoh Checkpoints (use throughout, not just at the end)
+- **Before planning**: recon (Phase 1)
+- **During plan writing**: `get_blast_radius` when evaluating impact; `search_functions` before proposing new code
+- **During review**: `get_blast_radius` on all new/changed functions; `check_reachability` on new exports
+- **After decisions**: `get_unused_code` to catch disconnections
+- **Final sweep**: `get_blast_radius` on ALL new exports — zero callers on non-entry-points = plan is incomplete
+## Workflow Rules
+- After each review section, pause and ask for feedback before moving on (BIG CHANGE mode)
+- Do not assume priorities on timeline or scale
+- If you see a better approach to the entire plan, say so BEFORE section-by-section review
+- Challenge the approach if you see a better one — your job is to find problems the user will regret later