npm - supipowers - Versions diffs - 1.3.0 → 1.5.0 - Mend

supipowers 1.3.0 → 1.5.0

Files changed (132)

package/README.md +118 -56
package/bin/install.ts +48 -128
package/package.json +11 -3
package/skills/code-review/SKILL.md +137 -40
package/skills/context-mode/SKILL.md +67 -52
package/skills/creating-supi-agents/SKILL.md +204 -0
package/skills/debugging/SKILL.md +86 -40
package/skills/fix-pr/SKILL.md +96 -65
package/skills/planning/SKILL.md +103 -46
package/skills/qa-strategy/SKILL.md +68 -46
package/skills/receiving-code-review/SKILL.md +60 -53
package/skills/release/SKILL.md +111 -39
package/skills/tdd/SKILL.md +118 -67
package/skills/verification/SKILL.md +71 -37
package/src/bootstrap.ts +24 -5
package/src/commands/agents.ts +249 -0
package/src/commands/ai-review.ts +1113 -0
package/src/commands/config.ts +224 -95
package/src/commands/doctor.ts +19 -13
package/src/commands/fix-pr.ts +8 -11
package/src/commands/generate.ts +200 -0
package/src/commands/model-picker.ts +5 -15
package/src/commands/model.ts +4 -5
package/src/commands/plan.ts +148 -92
package/src/commands/qa.ts +14 -23
package/src/commands/release.ts +504 -275
package/src/commands/review.ts +643 -86
package/src/commands/status.ts +44 -17
package/src/commands/supi.ts +69 -42
package/src/commands/update.ts +57 -2
package/src/config/defaults.ts +6 -39
package/src/config/loader.ts +388 -40
package/src/config/model-resolver.ts +26 -22
package/src/config/schema.ts +113 -48
package/src/context/analyzer.ts +4 -2
package/src/context-mode/detector.ts +16 -54
package/src/context-mode/hooks.ts +135 -17
package/src/context-mode/knowledge/chunker.ts +235 -0
package/src/context-mode/knowledge/store.ts +187 -0
package/src/context-mode/routing.ts +3 -9
package/src/context-mode/sandbox/executor.ts +183 -0
package/src/context-mode/sandbox/runners.ts +40 -0
package/src/context-mode/snapshot-builder.ts +2 -2
package/src/context-mode/tools.ts +440 -0
package/src/context-mode/web/fetcher.ts +117 -0
package/src/context-mode/web/html-to-md.ts +293 -0
package/src/debug/logger.ts +107 -0
package/src/deps/registry.ts +0 -20
package/src/docs/drift.ts +454 -0
package/src/fix-pr/fetch-comments.ts +66 -0
package/src/git/commit-msg.ts +2 -1
package/src/git/commit.ts +123 -141
package/src/git/conventions.ts +2 -2
package/src/git/status.ts +4 -1
package/src/lsp/bridge.ts +138 -12
package/src/planning/approval-flow.ts +125 -19
package/src/planning/plan-writer-prompt.ts +4 -11
package/src/planning/planning-ask-tool.ts +81 -0
package/src/planning/prompt-builder.ts +9 -169
package/src/planning/system-prompt.ts +290 -0
package/src/platform/omp.ts +50 -4
package/src/platform/progress.ts +182 -0
package/src/platform/test-utils.ts +4 -1
package/src/platform/tui-colors.ts +30 -0
package/src/platform/types.ts +1 -0
package/src/qa/detect-app-type.ts +102 -0
package/src/qa/discover-routes.ts +353 -0
package/src/quality/ai-session.ts +96 -0
package/src/quality/ai-setup.ts +86 -0
package/src/quality/gates/ai-review.ts +129 -0
package/src/quality/gates/build.ts +8 -0
package/src/quality/gates/command.ts +150 -0
package/src/quality/gates/format.ts +28 -0
package/src/quality/gates/lint.ts +22 -0
package/src/quality/gates/lsp-diagnostics.ts +84 -0
package/src/quality/gates/test-suite.ts +8 -0
package/src/quality/gates/typecheck.ts +22 -0
package/src/quality/registry.ts +25 -0
package/src/quality/review-gates.ts +33 -0
package/src/quality/runner.ts +268 -0
package/src/quality/schemas.ts +48 -0
package/src/quality/setup.ts +227 -0
package/src/release/changelog.ts +7 -3
package/src/release/channels/custom.ts +43 -0
package/src/release/channels/gitea.ts +35 -0
package/src/release/channels/github.ts +35 -0
package/src/release/channels/gitlab.ts +35 -0
package/src/release/channels/registry.ts +52 -0
package/src/release/channels/types.ts +27 -0
package/src/release/detector.ts +10 -63
package/src/release/executor.ts +61 -51
package/src/release/prompt.ts +38 -38
package/src/release/version.ts +129 -10
package/src/review/agent-loader.ts +331 -0
package/src/review/consolidator.ts +180 -0
package/src/review/default-agents/correctness.md +72 -0
package/src/review/default-agents/maintainability.md +64 -0
package/src/review/default-agents/security.md +67 -0
package/src/review/fixer.ts +219 -0
package/src/review/multi-agent-runner.ts +135 -0
package/src/review/output.ts +147 -0
package/src/review/prompts/agent-review-wrapper.md +36 -0
package/src/review/prompts/fix-findings.md +32 -0
package/src/review/prompts/fix-output-schema.md +18 -0
package/src/review/prompts/invalid-output-retry.md +22 -0
package/src/review/prompts/output-instructions.md +14 -0
package/src/review/prompts/review-output-schema.md +38 -0
package/src/review/prompts/single-review.md +53 -0
package/src/review/prompts/validation-review.md +30 -0
package/src/review/runner.ts +128 -0
package/src/review/scope.ts +353 -0
package/src/review/template.ts +15 -0
package/src/review/types.ts +296 -0
package/src/review/validator.ts +160 -0
package/src/storage/plans.ts +5 -3
package/src/storage/reports.ts +50 -7
package/src/storage/review-sessions.ts +117 -0
package/src/text.ts +19 -0
package/src/types.ts +336 -26
package/src/utils/paths.ts +39 -0
package/src/visual/companion.ts +5 -3
package/src/visual/start-server.ts +101 -0
package/src/visual/stop-server.ts +39 -0
package/bin/ctx-mode-wrapper.mjs +0 -66
package/src/config/profiles.ts +0 -64
package/src/context-mode/installer.ts +0 -38
package/src/quality/ai-review-gate.ts +0 -43
package/src/quality/gate-runner.ts +0 -67
package/src/quality/lsp-gate.ts +0 -24
package/src/quality/test-gate.ts +0 -39
package/src/visual/scripts/start-server.sh +0 -98
package/src/visual/scripts/stop-server.sh +0 -21

package/skills/context-mode/SKILL.md CHANGED

@@ -1,73 +1,88 @@
-# context-mode — MANDATORY routing rules
+# supi-context-mode
-You have context-mode MCP tools available. These rules are NOT optional — they protect your context window from flooding. A single unrouted command can dump 56 KB into context and waste the entire session.
+Route high-output tool calls through sandboxed execution to protect the context window.
-## BLOCKED commands — do NOT attempt these
+| Scope | Tool routing rules for supi-context-mode |
+|-------|-----------------------------------------------------|
+| Trigger | Always active when supi-context-mode tools are available |
+| Goal | Prevent context flooding — a single unrouted command can dump 56 KB into context |
+| Key rule | Blocked tools return errors; use sandbox equivalents instead |
-### curl / wget — BLOCKED
-Any Bash command containing `curl` or `wget` is intercepted and replaced with an error message. Do NOT retry.
-Instead use:
-- `ctx_fetch_and_index(url, source)` to fetch and index web pages
-- `ctx_execute(language: "javascript", code: "const r = await fetch(...)")` to run HTTP calls in sandbox
+## Tool Selection Hierarchy
-### Inline HTTP — BLOCKED
-Any Bash command containing `fetch('http`, `requests.get(`, `requests.post(`, `http.get(`, or `http.request(` is intercepted and replaced with an error message. Do NOT retry with Bash.
-Instead use:
-- `ctx_execute(language, code)` to run HTTP calls in sandbox — only stdout enters context
+Pick the highest-priority tool that fits the task:
-### WebFetch / Fetch — BLOCKED
-WebFetch and Fetch calls are denied entirely.
-Instead use:
-- `ctx_fetch_and_index(url, source)` then `ctx_search(queries)` to query the indexed content
+| Priority | Tool | Use for |
+|----------|------|---------|
+| 1 — GATHER | `ctx_batch_execute(commands, queries)` | Primary tool. Runs all commands, auto-indexes, returns search results. ONE call replaces 30+ individual calls. |
+| 2 — FOLLOW-UP | `ctx_search(queries: ["q1", "q2", ...])` | Query already-indexed content. Pass ALL questions as array in ONE call. |
+| 3 — PROCESSING | `ctx_execute(language, code)` / `ctx_execute_file(path, language, code)` | Sandbox execution. Only stdout enters context. |
+| 4 — WEB | `ctx_fetch_and_index(url, source)` then `ctx_search(queries)` | Fetch, chunk, index, query. Raw HTML never enters context. |
+| 5 — INDEX | `ctx_index(content, source)` | Store content in FTS5 knowledge base for later search. |
-### Grep — BLOCKED
-Grep calls are intercepted and blocked. Do NOT retry with Grep.
-Instead use:
-- `ctx_search(queries: ["<pattern>"])` to search indexed content
-- `ctx_batch_execute(commands, queries)` to run searches and return compressed results
-- `ctx_execute(language: "shell", code: "grep ...")` to run searches in sandbox
+## Blocked Commands
-### Find / Glob — BLOCKED
-Find/Glob calls are intercepted and blocked. Do NOT retry with Find/Glob.
-Instead use:
-- `ctx_execute(language: "shell", code: "find ...")` to run in sandbox
-- `ctx_batch_execute(commands, queries)` for multiple searches
+Blocked commands are intercepted and replaced with an error. Do NOT retry via Bash.
+| Blocked tool | Replacement |
+|---|---|
+| `curl` / `wget` in Bash | `ctx_fetch_and_index(url, source)` or `ctx_execute` with `fetch()` |
+| Inline HTTP (`fetch('http`, `requests.get(`, etc.) in Bash | `ctx_execute(language, code)` — only stdout enters context |
+| WebFetch / Fetch tool | `ctx_fetch_and_index(url, source)` then `ctx_search(queries)` |
+| Grep tool | `ctx_search(queries)`, `ctx_batch_execute(commands, queries)`, or `ctx_execute(language: "shell", code: "grep ...")` |
+| Find / Glob tool | `ctx_execute(language: "shell", code: "find ...")` or `ctx_batch_execute(commands, queries)` |
-## REDIRECTED tools — use sandbox equivalents
+### Example: routing a grep call
-### Bash (>20 lines output)
-Bash is ONLY for: `git`, `mkdir`, `rm`, `mv`, `cd`, `ls`, `npm install`, `pip install`, and other short-output commands.
-For everything else, use:
-- `ctx_batch_execute(commands, queries)` — run multiple commands + search in ONE call
-- `ctx_execute(language: "shell", code: "...")` — run in sandbox, only stdout enters context
+```
+// WRONG — blocked, returns error
+grep(pattern: "TODO", path: "src/")
-### Read (large files)
-Reads are never blocked — they always go through OMP's native read tool so hashline anchors (`N#XX`) are preserved for the edit contract. Large file reads (>110 lines) are automatically compressed to head (80 lines) + tail (30 lines) with a `sel` hint for the omitted section.
-For analysis-only reads where hashlines aren't needed, `ctx_execute_file(path, language, code)` remains more efficient — only your printed summary enters context.
+// CORRECT — runs in sandbox, only printed summary enters context
+ctx_execute(language: "shell", code: "grep -rn TODO src/")
-## Tool selection hierarchy
+// BEST — indexes output and returns search results in one call
+ctx_batch_execute(
+  commands: [{ label: "TODOs", command: "grep -rn TODO src/" }],
+  queries: ["TODO fixme priority"]
+)
+```
-1. **GATHER**: `ctx_batch_execute(commands, queries)` — Primary tool. Runs all commands, auto-indexes output, returns search results. ONE call replaces 30+ individual calls.
-2. **FOLLOW-UP**: `ctx_search(queries: ["q1", "q2", ...])` — Query indexed content. Pass ALL questions as array in ONE call.
-3. **PROCESSING**: `ctx_execute(language, code)` | `ctx_execute_file(path, language, code)` — Sandbox execution. Only stdout enters context.
-4. **WEB**: `ctx_fetch_and_index(url, source)` then `ctx_search(queries)` — Fetch, chunk, index, query. Raw HTML never enters context.
-5. **INDEX**: `ctx_index(content, source)` — Store content in FTS5 knowledge base for later search.
+## Redirected Tools
-## Subagent routing
+### Bash
-When spawning subagents (Agent/Task tool), the routing block is automatically injected into their prompt. Bash-type subagents are upgraded to general-purpose so they have access to MCP tools. You do NOT need to manually instruct subagents about context-mode.
+Bash is for commands producing <20 lines: `git`, `mkdir`, `rm`, `mv`, `ls`, `npm install`, `pip install`.
-## Output constraints
+For everything else:
+- `ctx_batch_execute(commands, queries)` — multiple commands + search in ONE call
+- `ctx_execute(language: "shell", code: "...")` — sandbox, only stdout enters context
-- Keep responses under 500 words.
-- Write artifacts (code, configs, PRDs) to FILES — never return them as inline text. Return only: file path + 1-line description.
-- When indexing content, use descriptive source labels so others can `ctx_search(source: "label")` later.
+### Read
-## ctx commands
+Reads are never blocked — OMP's native read tool preserves hashline anchors (`N#XX`) for the edit contract. Large reads (>110 lines) are auto-compressed to head (80) + tail (30) with a `sel` hint.
+For analysis-only reads where anchors are not needed, prefer `ctx_execute_file(path, language, code)` — only your printed summary enters context.
+## Subagent Routing
+The routing block is automatically injected into subagent prompts. Bash-type subagents are upgraded to general-purpose for tool access. You do NOT need to manually instruct subagents about context-mode.
+## Output Constraints
+- Write artifacts (code, configs, PRDs) to files — never inline. Return only: file path + 1-line description.
+- When indexing, use descriptive `source` labels so others can `ctx_search(source: "label")` later.
+## `ctx` Commands
 | Command | Action |
 |---------|--------|
-| `ctx stats` | Call the `ctx_stats` MCP tool and display the full output verbatim |
-| `ctx doctor` | Call the `ctx_doctor` MCP tool, run the returned shell command, display as checklist |
-| `ctx upgrade` | Call the `ctx_upgrade` MCP tool, run the returned shell command, display as checklist |
+| `ctx stats` | Call the `ctx_stats` tool, display full output verbatim |
+| `ctx purge` | Call the `ctx_purge` tool to clear all indexed content |
+## Checklist
+- [ ] Used tool hierarchy (batch_execute > search > execute > fetch) — not raw Bash/Grep/Find
+- [ ] No blocked tool calls attempted
+- [ ] Artifacts written to files, not returned inline
+- [ ] Source labels are descriptive for later search
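
The new Read section in this file describes large reads (>110 lines) being compressed to the first 80 and last 30 lines with a hint for the omitted range. A minimal sketch of that rule follows, assuming nothing about the package's actual implementation: `compressRead`, `CompressedRead`, and the hint wording are hypothetical names chosen for illustration, and only the three thresholds come from the SKILL.md text.

```typescript
// Hypothetical helper: the thresholds (110 / 80 / 30) are taken from the SKILL.md text,
// everything else (names, hint wording) is an assumption made for this sketch.
interface CompressedRead {
  text: string;
  // 1-based inclusive range of lines that were dropped, or null if nothing was dropped.
  omitted: { from: number; to: number } | null;
}

function compressRead(
  content: string,
  maxLines = 110,
  head = 80,
  tail = 30,
): CompressedRead {
  const lines = content.split("\n");

  // Small files pass through untouched.
  if (lines.length <= maxLines) {
    return { text: content, omitted: null };
  }

  const from = head + 1;
  const to = lines.length - tail;

  // Placeholder hint; the real tool is described as emitting a `sel` hint instead.
  const hint = `[lines ${from}-${to} omitted; request that range to read it]`;

  const text = [
    ...lines.slice(0, head),
    hint,
    ...lines.slice(lines.length - tail),
  ].join("\n");

  return { text, omitted: { from, to } };
}
```

Because head and tail together equal the 110-line threshold, any file that crosses the threshold loses at least one line, so the hint is never emitted for an empty range.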

package/skills/creating-supi-agents/SKILL.md ADDED

@@ -0,0 +1,204 @@
+---
+name: creating-supi-agents
+description: Interactive guide for creating a new supipowers review agent from scratch
+---
+# Creating a Review Agent
+Guide the user through creating a specialized code review agent for supipowers' multi-agent `/supi:review` pipeline.
+## Quick Reference
+| Aspect | Detail |
+|--------|--------|
+| **Input** | User's description of what the agent should review |
+| **Output** | Agent file saved to `.omp/agents/<agent-name>.md` |
+| **File format** | YAML frontmatter (`name`, `description`, `focus`) + prompt body + `{output_instructions}` |
+| **Hard constraint** | Prompt body **MUST** end with `{output_instructions}` on its own line — the pipeline replaces it with the output schema at review time |
+| **Process** | Goal → Research → Present → Refine → Save |
+## Agent File Format
+```markdown
+---
+name: <kebab-case-name>
+description: <one-line summary>
+focus: <comma-separated areas>
+---
+<prompt body>
+{output_instructions}
+```
+## Process
+### Step 1: Understand the Goal
+Ask what kind of reviewer the user wants. Common archetypes:
+| Archetype | Focus areas |
+|-----------|-------------|
+| Performance | algorithmic complexity, memory, caching, lazy loading |
+| Accessibility | ARIA, semantic HTML, screen reader support, WCAG |
+| API design | REST conventions, error contracts, versioning |
+| Test quality | coverage gaps, flaky patterns, missing edge cases |
+| Security | injection, auth, secrets, OWASP Top 10 |
+| Documentation | JSDoc, README accuracy, changelog updates |
+### Step 2: Research
+Research established checklists and best practices for the focus area (e.g., OWASP for security, WCAG for accessibility). Look for language/framework-specific patterns relevant to the user's stack.
+### Step 3: Present Overview
+Present a structured proposal:
+- **Name**: suggested kebab-case name
+- **Description**: one-line summary
+- **Focus areas**: comma-separated specializations
+- **Review criteria**: bulleted list of what the agent will check
+- **Example findings**: 2–3 examples of what this agent would flag
+### Step 4: Refine with User
+Ask if they want to adjust:
+- Focus areas or review criteria
+- Tone — **strict** (flags aggressively, treats ambiguity as an issue) vs. **advisory** (flags only clear problems, uses softer language)
+- Project-specific conventions to enforce
+Iterate until the user approves.
+### Step 5: Save the Agent
+Generate the final agent file and save to `.omp/agents/<agent-name>.md`.
+## Agent Prompt Guidelines
+### What makes a good agent prompt
+1. **State the role** clearly (e.g., "You are a performance-focused code reviewer")
+2. **List specific check items** as concrete, actionable criteria (not vague categories)
+3. **Provide severity guidance** — define what warrants `error` vs. `warning` vs. `info`
+4. **Define scope boundaries** — state what is NOT in scope to prevent overlap with other agents
+5. **End with `{output_instructions}`** — mandatory, on its own line
+### Before / After: Check Item Quality
+```markdown
+# BEFORE — vague
+## What to Check
+- Look for performance issues
+- Check if things could be faster
+- Make sure the code is efficient
+# AFTER — concrete and actionable
+## What to Check
+- **Algorithmic complexity**: O(n²) or worse loops, unnecessary nested iterations
+- **Memory allocation**: Large object creation in hot paths, missing cleanup
+- **Caching opportunities**: Repeated expensive computations that could be memoized
+```
+### Before / After: Severity Guidance
+```markdown
+# BEFORE — missing severity
+Flag any issues you find in the code.
+# AFTER — calibrated severity
+## Severity Guide
+- **error**: Will cause visible degradation in production (e.g., O(n²) on large datasets)
+- **warning**: Potential issue that depends on scale (e.g., missing memoization)
+- **info**: Optimization opportunity, not a current problem
+```
+## Example: Performance Agent
+```markdown
+---
+name: performance
+description: Reviews code for performance issues and optimization opportunities
+focus: algorithmic complexity, memory allocation, caching, lazy loading
+---
+You are a performance-focused code reviewer. Analyze the provided code diff for performance issues.
+## What to Check
+- **Algorithmic complexity**: O(n²) or worse loops, unnecessary nested iterations
+- **Memory allocation**: Large object creation in hot paths, missing cleanup
+- **Caching opportunities**: Repeated expensive computations that could be memoized
+- **Lazy loading**: Resources loaded eagerly that could be deferred
+- **Bundle size**: Unnecessary imports, tree-shaking blockers
+- **Database queries**: N+1 queries, missing indexes, unbounded result sets
+## Severity Guide
+- **error**: Will cause visible performance degradation in production (e.g., O(n²) on large datasets)
+- **warning**: Potential issue that depends on scale (e.g., missing memoization)
+- **info**: Optimization opportunity, not a current problem
+## Out of Scope
+- Correctness issues (handled by correctness agent)
+- Style/formatting (handled by linter)
+- Security concerns (handled by security agent)
+{output_instructions}
+```
+## Example: Accessibility Agent
+```markdown
+---
+name: accessibility
+description: Reviews UI code for accessibility violations and WCAG compliance
+focus: ARIA attributes, semantic HTML, keyboard navigation, color contrast
+---
+You are an accessibility-focused code reviewer. Analyze the provided code diff for accessibility issues using WCAG 2.1 AA as the baseline.
+## What to Check
+- **Semantic HTML**: `<div>` or `<span>` used where `<button>`, `<nav>`, `<main>`, `<section>` belongs
+- **ARIA attributes**: Missing `aria-label` on icon-only buttons, incorrect `role` values
+- **Keyboard navigation**: Interactive elements not reachable via Tab, missing focus indicators
+- **Color contrast**: Text/background combinations below 4.5:1 ratio (normal text) or 3:1 (large text)
+- **Form labels**: Inputs without associated `<label>` or `aria-labelledby`
+- **Image alt text**: Missing or non-descriptive `alt` attributes on `<img>` tags
+## Severity Guide
+- **error**: Blocks assistive technology users entirely (e.g., button with no accessible name)
+- **warning**: Degraded experience for assistive technology users (e.g., missing focus indicator)
+- **info**: Best-practice improvement (e.g., prefer `<nav>` over `<div role="navigation">`)
+## Out of Scope
+- Visual design preferences (handled by design review)
+- Performance (handled by performance agent)
+- Business logic correctness (handled by correctness agent)
+{output_instructions}
+```
+## MUST DO / MUST NOT DO
+| MUST DO | MUST NOT DO |
+|---------|-------------|
+| End every agent prompt with `{output_instructions}` on its own line | Omit `{output_instructions}` — the pipeline will fail |
+| Include a severity guide (`error` / `warning` / `info`) | Leave severity undefined — agents produce inconsistent ratings |
+| Define "Out of Scope" to prevent overlap with other agents | Let scope overlap — produces duplicate findings across agents |
+| Use concrete check items with specific patterns to look for | Use vague criteria like "check for issues" or "ensure quality" |
+| Save to `.omp/agents/<agent-name>.md` | Save anywhere else or leave unsaved |
+## Pre-Save Checklist
+Before saving the agent file, verify:
+- [ ] YAML frontmatter has `name`, `description`, and `focus`
+- [ ] Prompt body states the agent's role in the first sentence
+- [ ] At least 3 concrete, actionable check items
+- [ ] Severity guide defines `error`, `warning`, and `info` thresholds
+- [ ] "Out of Scope" section present
+- [ ] `{output_instructions}` is the last line of the prompt body
+- [ ] File saved to `.omp/agents/<agent-name>.md`
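
Part of the Pre-Save Checklist in this file is mechanical enough to check in code: the frontmatter keys and the trailing `{output_instructions}` line. The sketch below is an illustration only, not the package's agent loader; `checkAgentFile` and `AgentCheck` are hypothetical names, and the frontmatter is matched with simple regular expressions rather than a real YAML parser.

```typescript
// Hypothetical validator for the checklist items that can be verified mechanically.
// It is not taken from the package; it only mirrors the rules stated in the SKILL.md.
interface AgentCheck {
  ok: boolean;
  problems: string[];
}

function checkAgentFile(source: string): AgentCheck {
  const problems: string[] = [];

  // Expect a leading block delimited by `---` lines, followed by the prompt body.
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(source);
  if (!match) {
    return { ok: false, problems: ["missing YAML frontmatter"] };
  }
  const frontmatter = match[1] ?? "";
  const body = match[2] ?? "";

  // Each required key must be present with a non-empty value on the same line.
  for (const key of ["name", "description", "focus"]) {
    const hasKey = new RegExp(`^${key}:[ \\t]*\\S`, "m").test(frontmatter);
    if (!hasKey) problems.push(`frontmatter is missing "${key}"`);
  }

  // The placeholder must be the last line of the prompt body, on its own line.
  const bodyLines = body.trimEnd().split("\n");
  if (bodyLines[bodyLines.length - 1] !== "{output_instructions}") {
    problems.push("prompt body does not end with {output_instructions}");
  }

  return { ok: problems.length === 0, problems };
}
```

The remaining checklist items (role stated in the first sentence, concrete check items, a severity guide, an "Out of Scope" section) need human or model judgment and are deliberately left out of the sketch.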

package/skills/debugging/SKILL.md CHANGED

@@ -5,58 +5,104 @@ description: Systematic debugging — find root cause before attempting fixes, 4
 # Systematic Debugging
-## Iron Law
+Find the root cause before touching the code. Every fix without a verified root cause is a coin flip.
-**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
+## Quick Reference
-Symptom fixes are failure. If you haven't completed Phase 1, you cannot propose fixes.
+| Aspect | Detail |
+|--------|--------|
+| **Trigger** | Bug report, failing test, unexpected behavior, error message |
+| **Input** | Error output, stack trace, user report, failing test, or observed misbehavior |
+| **Output** | Root-cause statement, minimal fix, regression test, verification evidence |
+| **Gate rule** | You **MUST** complete Phase 1 before proposing any fix |
+| **Escalation** | After 3 failed fix attempts → stop, reassess architecture with human partner |
+## Phases
+| Phase | Goal | Gate (exit when true) |
+|-------|------|-----------------------|
+| 1. Investigate | Identify root cause | Root cause stated as a falsifiable claim |
+| 2. Analyze | Confirm via pattern comparison | Difference between working and broken code documented |
+| 3. Hypothesize | Single testable prediction | Hypothesis written as "Changing [X] produces [Y] because [Z]" |
+| 4. Fix | Minimal correct change | Failing test passes, no regressions |
+---
 ## Phase 1: Root Cause Investigation
-Complete this phase before proposing any fix.
+1. **Read the full error message and stack trace.** Extract: error type, file/line location, triggering input.
+2. **Reproduce consistently.** Write exact steps. If it doesn't reproduce, you don't understand it yet.
+3. **Check recent changes.** `git diff`, new dependencies, config changes — narrow the blast radius.
+4. **Log at each boundary** in multi-component systems. Capture: timestamps, payloads, status codes.
+5. **Trace data flow** backward through the call stack to the original trigger.
+**Gate:** State the root cause as a single sentence before moving on.
+### Example — Phase 1
+```
+BAD (skipping investigation):
+  "TypeError: Cannot read property 'id' of undefined"
+  → "I'll add a null check on line 42."
-1. **Read error messages carefully.** Don't skip; they often contain solutions.
-2. **Reproduce consistently.** Exact steps, every time.
-3. **Check recent changes.** `git diff`, new dependencies, config changes.
-4. **Gather evidence** in multi-component systems: diagnostic instrumentation at each boundary.
-5. **Trace data flow** backward through call stack to find original trigger.
+GOOD (investigating):
+  "TypeError: Cannot read property 'id' of undefined at UserService.getProfile:42"
+  → git diff shows fetchUser was changed yesterday to return { data: user } instead of user
+  → Line 42 reads `user.id` but now receives the wrapper object
+  → Root cause: fetchUser response shape changed; callers were not updated
+```
 ## Phase 2: Pattern Analysis
-1. Find working examples in codebase.
-2. Compare against references completely (not skimming).
-3. Identify differences between working and broken.
-4. Understand dependencies and assumptions.
+1. Find a **working example** of the same pattern in the codebase.
+2. **Diff working vs broken** line-by-line. Document each difference.
+3. **List the assumptions** the broken code makes about its inputs, environment, and call order.
 ## Phase 3: Hypothesis and Testing
-1. Form a single, specific hypothesis (not vague).
-2. Test minimally: smallest possible change, one variable at a time.
-3. Verify before continuing. If wrong → form NEW hypothesis, not more fixes.
-4. Admit uncertainty. Don't pretend to know.
+1. **Write the hypothesis** in this format: "Changing [X] will produce [Y] because [Z]."
+2. **Test one variable** at a time — smallest possible change.
+3. If the hypothesis is wrong, return to Phase 1 with the new evidence. Do not stack guesses.
+4. If confidence is not high, state: "I'm uncertain because [reason]" before proceeding.
+### Example — Hypothesis
+```
+BAD:
+  "Something is wrong with the config."
+GOOD:
+  "Changing `loadConfig` to parseInt(env.TIMEOUT) will fix the 'NaN' comparison
+   because env vars are strings and the timeout check uses numeric comparison."
+```
 ## Phase 4: Implementation
-1. Create failing test case first.
-2. Implement single fix addressing root cause only.
-3. Verify: test passes, no other tests broken.
-4. **If fix doesn't work:**
-   - < 3 attempts: Return to Phase 1 with new information
-   - ≥ 3 attempts: **STOP** and question the architecture. Discuss with human partner.
-## Red Flags — STOP and Follow the Process
-- "Quick fix for now, investigate later"
-- "Just try changing X and see if it works"
-- "Skip the test, I'll manually verify"
-- "It's probably X, let me fix that"
-- "I don't fully understand but this might work"
-- "One more fix attempt" (when already tried 2+)
-- Each fix reveals new problem in different place
-## When to Use (Especially)
-- Under time pressure (emergencies make guessing tempting)
-- "Just one quick fix" seems obvious
-- Already tried multiple fixes
-- Don't fully understand the issue
+1. **Write a failing test** that reproduces the bug.
+2. **Implement a single fix** addressing the root cause only.
+3. **Verify:** test passes, no other tests broken.
+4. If fix fails:
+   - < 3 attempts → return to Phase 1 with new information.
+   - >= 3 attempts → **STOP.** Reassess the architecture. Discuss with human partner.
+---
+## MUST DO / MUST NOT DO
+| MUST DO | MUST NOT DO |
+|---------|-------------|
+| Complete Phase 1 before proposing any fix | Skip to a fix from a stack trace alone |
+| State root cause as a falsifiable claim | Propose a vague cause ("something in config") |
+| Write a failing test before fixing | Skip the test and manually verify |
+| Change one variable at a time | Stack multiple speculative changes |
+| Escalate after 3 failed attempts | Say "one more fix attempt" after 2+ failures |
+## Final Checklist
+- [ ] Root cause identified and stated as a single sentence
+- [ ] Working vs broken difference documented
+- [ ] Hypothesis written as "Changing [X] produces [Y] because [Z]"
+- [ ] Failing test written before fix applied
+- [ ] Fix addresses root cause only — no speculative side-fixes
+- [ ] All existing tests still pass
+- [ ] After 3 failed attempts: stopped and escalated
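
Phase 4 in this file asks for a failing test before the fix. Using the Phase 1 example (a `fetchUser` that now returns `{ data: user }` instead of `user`), a regression test might look like the sketch below. All names here are hypothetical stand-ins built from that example; the test is a plain function that throws rather than a call into any particular test framework.

```typescript
interface User {
  id: number;
  name: string;
}

// Stand-in for the changed dependency from the Phase 1 example:
// it now wraps the user in a `data` field.
function fetchUser(): { data: User } {
  return { data: { id: 7, name: "Ada" } };
}

// The caller after the fix: it unwraps `data` before reading `id`.
// The broken version read `id` directly from the response, which was undefined.
function getProfileId(): number {
  const response = fetchUser();
  return response.data.id;
}

// Regression test: written first, it fails against the broken caller
// and passes once the caller unwraps the response.
function testGetProfileIdReadsWrappedUser(): void {
  const id = getProfileId();
  if (id !== 7) {
    throw new Error(`expected profile id 7, got ${String(id)}`);
  }
}

testGetProfileIdReadsWrappedUser();
```

The test pins the root cause (the response shape) rather than the symptom (the `TypeError`), so it keeps failing if a later change to `fetchUser` alters the shape again without updating callers.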