@pharaoh-so/mcp 0.2.9 → 0.2.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -51,7 +51,7 @@ claude mcp add pharaoh -- npx @pharaoh-so/mcp
51
51
  If you have a browser available (desktop, laptop), you can connect directly via SSE instead:
52
52
 
53
53
  ```bash
54
- claude mcp add --transport sse pharaoh https://mcp.pharaoh.so/sse
54
+ claude mcp add --transport sse --scope user pharaoh https://mcp.pharaoh.so/sse
55
55
  ```
56
56
 
57
57
  This uses OAuth in the browser and doesn't require this package.
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "@pharaoh-so/mcp",
3
3
  "mcpName": "so.pharaoh/pharaoh",
4
- "version": "0.2.9",
4
+ "version": "0.2.11",
5
5
  "description": "MCP proxy for Pharaoh — maps codebases into queryable knowledge graphs for AI agents. Enables Claude Code in headless environments (VPS, SSH, CI) via device flow auth.",
6
6
  "type": "module",
7
7
  "main": "dist/index.js",
@@ -72,7 +72,7 @@ Full docs at [pharaoh.so/docs](https://pharaoh.so/docs).
72
72
 
73
73
  **Direct SSE** (desktop with browser, uses OAuth):
74
74
  ```
75
- claude mcp add pharaoh https://mcp.pharaoh.so/sse
75
+ claude mcp add --transport sse --scope user pharaoh https://mcp.pharaoh.so/sse
76
76
  ```
77
77
 
78
78
  **Headless** (VPS/SSH/containers, uses device flow — authorize on any device):
@@ -1,74 +1,310 @@
1
1
  ---
2
2
  name: plan
3
- description: "Architecture-aware planning workflow using Pharaoh codebase knowledge graph. Four-phase process: reconnaissance with MCP tools, blast radius analysis, approach selection with trade-offs, and step-by-step implementation plan with wiring declarations. Prevents dead exports and overcoupled designs before a line of code is written."
4
- version: 0.2.0
3
+ description: "Full-cycle architecture-aware planning: Pharaoh reconnaissance, structured plan writing with bite-sized TDD steps and zero placeholders, then deep adversarial review with wiring verification and interactive issue resolution. Replaces both writing-plans and plan-review."
4
+ version: 0.3.0
5
5
  homepage: https://pharaoh.so
6
6
  user-invocable: true
7
- metadata: {"emoji": "☥", "tags": ["planning", "architecture", "blast-radius", "pharaoh", "implementation-plan", "wiring"]}
7
+ metadata: {"emoji": "☥", "tags": ["planning", "architecture", "blast-radius", "pharaoh", "implementation-plan", "wiring", "review", "tdd"]}
8
8
  ---
9
9
 
10
10
  # Plan with Pharaoh
11
11
 
12
- Architecture-aware planning before implementation. Uses `plan-with-pharaoh` a 4-phase workflow (+ adversarial review) that combines reconnaissance, blast radius analysis, approach trade-offs, and a wired step-by-step plan. Iron law: every new export must have a declared caller.
12
+ Full-cycle planning: reconnaissance plan writing adversarial review. Combines architecture-aware graph analysis with rigorous plan craft and interactive issue resolution.
13
+
14
+ **You are in plan mode. Do NOT make any code changes. Think, evaluate, plan, review.**
13
15
 
14
16
  ## When to Use
15
17
 
16
- Invoke before implementing any non-trivial change: new features, refactors, adding modules, or anything that touches shared code. Use it whenever you need to answer "what's the right way to build this?" before writing code.
18
+ Before implementing any non-trivial change: new features, refactors, adding modules, or anything that touches shared code. Use it whenever you need to answer "what's the right way to build this?" before writing code.
19
+
20
+ ## Project Overrides
21
+
22
+ If a `.claude/plan-review.md` file exists in this project, read it now and apply those rules on top of this baseline. Project rules take precedence where they conflict.
23
+
24
+ ## Engineering Preferences (guide all recommendations)
25
+
26
+ - DRY: flag repetition aggressively
27
+ - Well-tested: too many tests > too few; mutation score > line coverage
28
+ - "Engineered enough" — not fragile/hacky, not over-abstracted
29
+ - Handle more edge cases, not fewer; thoughtfulness > speed
30
+ - Explicit > clever; simple > complex
31
+ - Subtraction > addition; target zero or negative net LOC
32
+ - Every export must have a caller; unwired code doesn't exist
33
+
34
+ ## Document Review Mode
17
35
 
18
- ## Workflow
36
+ If the user provides a document, PRD, prompt, or artifact alongside this command, that IS the plan to review. Still run Phase 1 (Reconnaissance) — always verify against the actual codebase. Then proceed to Phase 3 (Approach) and Phase 5 (Review), applying all review sections to that document. Do not treat it as background context — it is the subject of evaluation.
19
37
 
20
- ### Phase 1 — Reconnaissance (do NOT skip)
38
+ ---
39
+
40
+ ## Phase 1 — Reconnaissance (required — do this BEFORE anything else)
41
+
42
+ Do NOT plan from memory or assumptions. Query the actual codebase first:
21
43
 
22
- 1. Call `get_codebase_map` for the target repository to see the full module landscape.
23
- 2. Call `get_module_context` on each module likely affected by the change.
24
- 3. Call `search_functions` for terms related to the feature or change description.
25
- 4. Call `query_dependencies` between the affected modules to map coupling.
26
- 5. Call `get_blast_radius` on the primary target of the change.
27
- 6. Call `check_reachability` on the primary target to verify it is reachable from entry points.
44
+ 1. `get_codebase_map` current modules, hot files, dependency graph
45
+ 2. `search_functions` for keywords related to the plan find existing code to reuse/extend
46
+ 3. `get_module_context` on each module likely affected by the change
47
+ 4. `query_dependencies` between affected modules coupling, circular deps
48
+ 5. `get_blast_radius` on the primary target of the change
49
+ 6. `check_reachability` on the primary target to verify it's reachable from entry points
28
50
 
29
- ### Phase 2 Analysis
51
+ Ground every recommendation in what actually exists. If you propose adding something, confirm it doesn't already exist. If you propose changing something, know its blast radius.
52
+
53
+ ## Phase 2 — Analysis
30
54
 
31
55
  Using the reconnaissance data:
56
+
32
57
  - Evaluate the blast radius — how many callers and modules are affected?
33
- - Check `search_functions` results — does related code already exist?
58
+ - Check `search_functions` results — does related code already exist? Can you reuse/extend?
34
59
  - Assess module coupling — are the affected modules tightly or loosely coupled?
35
- - Rate the risk level (LOW / MEDIUM / HIGH) based on blast radius and coupling.
60
+ - Rate the risk level (LOW / MEDIUM / HIGH) based on blast radius and coupling
61
+ - Does this need new code at all, or can an existing pattern solve it?
62
+
63
+ ## Phase 3 — Approach
64
+
65
+ ### Scope Check
66
+
67
+ If the spec covers multiple independent subsystems, it should be broken into separate plans — one per subsystem. Each plan should produce working, testable software on its own. Suggest splitting if needed.
68
+
69
+ ### Mode Selection
70
+
71
+ Ask the user which mode before proceeding:
72
+
73
+ - **BIG CHANGE**: Full plan with all sections, approach trade-offs, interactive review
74
+ - **SMALL CHANGE**: Abbreviated plan, sections 2-4 of review only
75
+
76
+ ### Approach Trade-offs
77
+
78
+ Propose 2-3 implementation approaches:
79
+ - For each: what files change, estimated blast radius, pros, cons
80
+ - Recommend one with justification
81
+ - Flag any approach that would increase module coupling
82
+ - Flag any approach that requires new code where existing code could be extended
83
+
84
+ ## Phase 4 — Plan Writing
85
+
86
+ ### File Structure
87
+
88
+ Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.
89
+
90
+ - Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
91
+ - Files that change together should live together. Split by responsibility, not by technical layer.
92
+ - In existing codebases, follow established patterns. If the codebase uses large files, don't unilaterally restructure — but if a file you're modifying has grown unwieldy, including a split is reasonable.
93
+
94
+ This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.
95
+
96
+ ### Plan Document Header
97
+
98
+ Every plan MUST start with:
99
+
100
+ ```markdown
101
+ # [Feature Name] Implementation Plan
102
+
103
+ > **For agentic workers:** Use `pharaoh:orchestrate` (recommended) or `pharaoh:execute` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
104
+
105
+ **Goal:** [One sentence describing what this builds]
106
+
107
+ **Architecture:** [2-3 sentences about approach]
108
+
109
+ **Tech Stack:** [Key technologies/libraries]
110
+
111
+ **Risk:** [LOW / MEDIUM / HIGH] — [one line justification from Phase 2 data]
112
+
113
+ ---
114
+ ```
115
+
116
+ ### Bite-Sized Task Granularity
117
+
118
+ Each step is one action (2-5 minutes):
119
+ - "Write the failing test" — step
120
+ - "Run it to make sure it fails" — step
121
+ - "Implement the minimal code to make the test pass" — step
122
+ - "Run the tests and make sure they pass" — step
123
+ - "Commit" — step
124
+
125
+ ### Task Structure
126
+
127
+ ````markdown
128
+ ### Task N: [Component Name]
129
+
130
+ **Files:**
131
+ - Create: `exact/path/to/file.ts`
132
+ - Modify: `exact/path/to/existing.ts:123-145`
133
+ - Test: `tests/exact/path/to/test.ts`
134
+
135
+ **Blast radius:** [from Phase 1 data — callers affected, modules touched]
136
+
137
+ **Wiring:** [where new exports get called from — declared caller for every export]
138
+
139
+ - [ ] **Step 1: Write the failing test**
140
+
141
+ ```typescript
142
+ test('specific behavior', () => {
143
+ const result = function(input);
144
+ expect(result).toBe(expected);
145
+ });
146
+ ```
147
+
148
+ - [ ] **Step 2: Run test to verify it fails**
149
+
150
+ Run: `pnpm test -- tests/path/test.ts`
151
+ Expected: FAIL with "function not defined"
36
152
 
37
- ### Phase 3 Approach
153
+ - [ ] **Step 3: Write minimal implementation**
38
154
 
39
- Propose 2-3 implementation approaches with trade-offs:
40
- - For each approach: what files change, estimated blast radius, pros, cons.
41
- - Recommend one approach with justification.
42
- - Flag any approach that would increase module coupling.
155
+ ```typescript
156
+ export function myFunction(input: string): string {
157
+ return expected;
158
+ }
159
+ ```
43
160
 
44
- ### Phase 4 Plan
161
+ - [ ] **Step 4: Run test to verify it passes**
45
162
 
46
- Produce a step-by-step implementation plan:
47
- - Exact files and functions to create or modify.
48
- - Blast radius per change (from Phase 1 data).
49
- - Required tests for each step.
50
- - Wiring declarations: every new export must have a declared caller.
163
+ Run: `pnpm test -- tests/path/test.ts`
164
+ Expected: PASS
51
165
 
52
- Iron law: "Every new export in the plan must have a declared caller. If a function has no caller, it's not part of the plan — remove it."
166
+ - [ ] **Step 5: Commit**
53
167
 
54
- ### Phase 5 — Adversarial Review
168
+ ```bash
169
+ git add tests/path/test.ts src/path/file.ts
170
+ git commit -m "feat: add specific feature"
171
+ ```
172
+ ````
55
173
 
56
- Before presenting the plan, adversarially review it:
57
- - Are all new exports connected to declared callers? (If not, remove them.)
58
- - Is the blast radius acceptable, or does the approach touch too many callers?
59
- - Does the approach minimize coupling, or does it introduce new cross-module dependencies?
60
- - Are there simpler alternatives that achieve the same result with fewer file changes?
61
- - Would any step create unreachable code paths?
174
+ ### No Placeholders
62
175
 
63
- Only present the plan after it passes this review.
176
+ Every step must contain the actual content an engineer needs. These are **plan failures** never write them:
64
177
 
65
- ## Output
178
+ - "TBD", "TODO", "implement later", "fill in details"
179
+ - "Add appropriate error handling" / "add validation" / "handle edge cases"
180
+ - "Write tests for the above" (without actual test code)
181
+ - "Similar to Task N" (repeat the code — the engineer may be reading tasks out of order)
182
+ - Steps that describe what to do without showing how (code blocks required for code steps)
183
+ - References to types, functions, or methods not defined in any task
184
+
185
+ ### Remember
186
+
187
+ - Exact file paths always
188
+ - Complete code in every step — if a step changes code, show the code
189
+ - Exact commands with expected output
190
+ - DRY, YAGNI, TDD, frequent commits
191
+ - Every new export must have a declared caller — if a function has no caller, it's not part of the plan
192
+
193
+ ---
194
+
195
+ ## Phase 5 — Adversarial Review
196
+
197
+ Review the plan before presenting it. Apply all relevant sections, adapting depth to change size. Skip sections that don't apply.
198
+
199
+ ### Section 1 — Architecture (skip for small/single-file changes)
200
+
201
+ - Component boundaries and coupling concerns
202
+ - Dependency graph: does this change shrink or expand surface area?
203
+ - Data flow bottlenecks and single points of failure
204
+ - Does this need new code at all, or can an existing pattern solve it?
205
+
206
+ ### Section 2 — Code Quality (always)
207
+
208
+ - Organization, module structure, DRY violations (be aggressive)
209
+ - Error handling gaps and missing edge cases (call out explicitly)
210
+ - Technical debt: shortcuts, hardcoded values, magic strings
211
+ - Over-engineered or under-engineered relative to engineering preferences
212
+ - Reuse: does code for this already exist somewhere?
213
+
214
+ ### Section 3 — Wiring & Integration (always)
215
+
216
+ - Are all new exports called from a production entry point?
217
+ - Run `get_blast_radius` on any new/changed functions — zero callers = not done
218
+ - `check_reachability` on new exports — verify reachable from API handlers, crons, or event handlers
219
+ - Does every task declare WHERE new code gets called from? If not, flag it
220
+ - Integration points: how does this connect to what already exists?
221
+
222
+ ### Section 4 — Tests (always)
223
+
224
+ - Coverage gaps: unit, integration, e2e
225
+ - Test quality: real assertions with hardcoded expected values, not `.toBeDefined()` or computed expectations
226
+ - Missing edge cases and untested failure/error paths
227
+ - One integration test proving wiring > ten isolated unit tests
228
+
229
+ ### Section 5 — Performance (only if relevant)
230
+
231
+ - N+1 queries, unnecessary DB round-trips
232
+ - Memory concerns, caching opportunities
233
+ - Slow or high-complexity code paths
234
+
235
+ ### Section 6 — Security & Attack Surface (always for new endpoints/routes/APIs; skip for pure refactors)
236
+
237
+ - **Authentication model** — what authenticates requests? Where validated? What happens on failure?
238
+ - **Sensitive data in URLs** — tokens, session IDs, or tenant identifiers in URL paths/params leak via Referer, history, logs
239
+ - **Authorization boundaries** — what prevents User A from accessing User B's data?
240
+ - **Input trust boundary** — user input flowing into shell commands, queries, HTML rendering, or file paths
241
+ - **Error and response surface** — do error responses expose internals to unauthenticated callers?
242
+ - **New attack surface** — new public URLs, webhooks, API routes each need rate limiting, auth, and input validation
243
+
244
+ ### Self-Review Checklist (run after all sections)
245
+
246
+ 1. **Spec coverage:** Skim each section/requirement in the spec. Can you point to a task that implements it? List any gaps.
247
+ 2. **Placeholder scan:** Search the plan for red flags from the "No Placeholders" section. Fix them.
248
+ 3. **Type consistency:** Do types, method signatures, and property names used in later tasks match earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.
249
+ 4. **Wiring sweep:** `get_blast_radius` on ALL new exports — zero callers on non-entry-points = plan is incomplete.
250
+
251
+ ### For Each Issue Found
252
+
253
+ For every specific issue (bug, smell, design concern, risk, missing wiring):
254
+
255
+ 1. **Describe concretely** — file, line/function reference, what's wrong
256
+ 2. **Present 2-3 options** including "do nothing" where reasonable
257
+ 3. **For each option** — implementation effort, risk, blast radius, maintenance burden
258
+ 4. **Recommend one** mapped to engineering preferences above, and say why
259
+ 5. **Ask** whether the user agrees or wants a different direction
260
+
261
+ Number each issue (1, 2, 3...) and letter each option (A, B, C...). Recommended option is always listed first.
262
+
263
+ ---
264
+
265
+ ## Phase 6 — Output & Handoff
266
+
267
+ ### Present the Plan
66
268
 
67
269
  A complete implementation plan containing:
270
+
68
271
  - Risk rating (LOW / MEDIUM / HIGH) with data backing
69
272
  - Recommended approach with trade-off rationale
70
- - Numbered steps with exact files and functions
71
- - Blast radius per change
72
- - Required tests per step
273
+ - File structure map
274
+ - Numbered tasks with bite-sized steps, exact files, and complete code
275
+ - Blast radius per task
73
276
  - Wiring declarations for every new export
277
+ - Required tests per step
74
278
  - Adversarial review findings (issues caught and resolved)
279
+
280
+ Save plans to: `docs/sessions/YYYY-MM-DD-<feature-name>.md`
281
+ (User preferences for plan location override this default)
282
+
283
+ ### Execution Handoff
284
+
285
+ After saving the plan, offer execution choice:
286
+
287
+ **"Plan complete and saved. Two execution options:**
288
+
289
+ **1. Orchestrated (recommended)** — I dispatch a fresh subagent per task with two-stage review (spec compliance then code quality). Use `pharaoh:orchestrate`.
290
+
291
+ **2. Inline Execution** — Execute tasks in this session with checkpoints. Use `pharaoh:execute`.
292
+
293
+ **Which approach?"**
294
+
295
+ ---
296
+
297
+ ## Pharaoh Checkpoints (use throughout, not just at the end)
298
+
299
+ - **Before planning**: recon (Phase 1)
300
+ - **During plan writing**: `get_blast_radius` when evaluating impact; `search_functions` before proposing new code
301
+ - **During review**: `get_blast_radius` on all new/changed functions; `check_reachability` on new exports
302
+ - **After decisions**: `get_unused_code` to catch disconnections
303
+ - **Final sweep**: `get_blast_radius` on ALL new exports — zero callers on non-entry-points = plan is incomplete
304
+
305
+ ## Workflow Rules
306
+
307
+ - After each review section, pause and ask for feedback before moving on (BIG CHANGE mode)
308
+ - Do not assume priorities on timeline or scale
309
+ - If you see a better approach to the entire plan, say so BEFORE section-by-section review
310
+ - Challenge the approach if you see a better one — your job is to find problems the user will regret later