npm - @athenaflow/plugin-matt-pocock-skills - Versions diffs - 0.1.1 - Mend

@athenaflow/plugin-matt-pocock-skills 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (172) hide show

package/dist/0.1.1/codex/plugin/skills/to-issues/agents/claude.yaml ADDED Viewed

	@@ -0,0 +1,2 @@
1	+ frontmatter:
2	+ user-invocable: true

package/dist/0.1.1/codex/plugin/skills/to-issues/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "To Issues"
+  short_description: "Break a plan, spec, or PRD into independently-grabbable issues using vertical slices"
+  default_prompt: "Break this PRD into vertical-slice issues with clear acceptance criteria."

package/dist/0.1.1/codex/plugin/skills/to-prd/SKILL.md ADDED Viewed

@@ -0,0 +1,76 @@
+---
+name: to-prd
+description: Turn the current conversation context into a PRD and publish it to the project issue tracker. Use when user wants to create a PRD from the current context.
+---
+This skill takes the current conversation context and codebase understanding and produces a PRD. Do NOT interview the user — just synthesize what you already know.
+The issue tracker and triage label vocabulary should have been provided to you — run `/setup-matt-pocock-skills` if not.
+## Process
+1. Explore the repo to understand the current state of the codebase, if you haven't already. Use the project's domain glossary vocabulary throughout the PRD, and respect any ADRs in the area you're touching.
+2. Sketch out the major modules you will need to build or modify to complete the implementation. Actively look for opportunities to extract deep modules that can be tested in isolation.
+A deep module (as opposed to a shallow module) is one which encapsulates a lot of functionality in a simple, testable interface which rarely changes.
+Check with the user that these modules match their expectations. Check with the user which modules they want tests written for.
+3. Write the PRD using the template below, then publish it to the project issue tracker. Apply the `ready-for-agent` triage label - no need for additional triage.
+<prd-template>
+## Problem Statement
+The problem that the user is facing, from the user's perspective.
+## Solution
+The solution to the problem, from the user's perspective.
+## User Stories
+A LONG, numbered list of user stories. Each user story should be in the format of:
+1. As an <actor>, I want a <feature>, so that <benefit>
+<user-story-example>
+1. As a mobile bank customer, I want to see balance on my accounts, so that I can make better informed decisions about my spending
+</user-story-example>
+This list of user stories should be extremely extensive and cover all aspects of the feature.
+## Implementation Decisions
+A list of implementation decisions that were made. This can include:
+- The modules that will be built/modified
+- The interfaces of those modules that will be modified
+- Technical clarifications from the developer
+- Architectural decisions
+- Schema changes
+- API contracts
+- Specific interactions
+Do NOT include specific file paths or code snippets. They may end up being outdated very quickly.
+Exception: if a prototype produced a snippet that encodes a decision more precisely than prose can (state machine, reducer, schema, type shape), inline it within the relevant decision and note briefly that it came from a prototype. Trim to the decision-rich parts — not a working demo, just the important bits.
+## Testing Decisions
+A list of testing decisions that were made. Include:
+- A description of what makes a good test (only test external behavior, not implementation details)
+- Which modules will be tested
+- Prior art for the tests (i.e. similar types of tests in the codebase)
+## Out of Scope
+A description of the things that are out of scope for this PRD.
+## Further Notes
+Any further notes about the feature.
+</prd-template>

package/dist/0.1.1/codex/plugin/skills/to-prd/agents/claude.yaml ADDED Viewed

	@@ -0,0 +1,2 @@
1	+ frontmatter:
2	+ user-invocable: true

package/dist/0.1.1/codex/plugin/skills/to-prd/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "To PRD"
+  short_description: "Synthesize the current conversation into a PRD on the issue tracker"
+  default_prompt: "Turn what we just discussed into a PRD and submit it as an issue."

package/dist/0.1.1/codex/plugin/skills/triage/AGENT-BRIEF.md ADDED Viewed

@@ -0,0 +1,168 @@
+# Writing Agent Briefs
+An agent brief is a structured comment posted on a GitHub issue when it moves to `ready-for-agent`. It is the authoritative specification that an AFK agent will work from. The original issue body and discussion are context — the agent brief is the contract.
+## Principles
+### Durability over precision
+The issue may sit in `ready-for-agent` for days or weeks. The codebase will change in the meantime. Write the brief so it stays useful even as files are renamed, moved, or refactored.
+- **Do** describe interfaces, types, and behavioral contracts
+- **Do** name specific types, function signatures, or config shapes that the agent should look for or modify
+- **Don't** reference file paths — they go stale
+- **Don't** reference line numbers
+- **Don't** assume the current implementation structure will remain the same
+### Behavioral, not procedural
+Describe **what** the system should do, not **how** to implement it. The agent will explore the codebase fresh and make its own implementation decisions.
+- **Good:** "The `SkillConfig` type should accept an optional `schedule` field of type `CronExpression`"
+- **Bad:** "Open src/types/skill.ts and add a schedule field on line 42"
+- **Good:** "When a user runs `/triage` with no arguments, they should see a summary of issues needing attention"
+- **Bad:** "Add a switch statement in the main handler function"
+### Complete acceptance criteria
+The agent needs to know when it's done. Every agent brief must have concrete, testable acceptance criteria. Each criterion should be independently verifiable.
+- **Good:** "Running `gh issue list --label needs-triage` returns issues that have been through initial classification"
+- **Bad:** "Triage should work correctly"
+### Explicit scope boundaries
+State what is out of scope. This prevents the agent from gold-plating or making assumptions about adjacent features.
+## Template
+```markdown
+## Agent Brief
+**Category:** bug / enhancement
+**Summary:** one-line description of what needs to happen
+**Current behavior:**
+Describe what happens now. For bugs, this is the broken behavior.
+For enhancements, this is the status quo the feature builds on.
+**Desired behavior:**
+Describe what should happen after the agent's work is complete.
+Be specific about edge cases and error conditions.
+**Key interfaces:**
+- `TypeName` — what needs to change and why
+- `functionName()` return type — what it currently returns vs what it should return
+- Config shape — any new configuration options needed
+**Acceptance criteria:**
+- [ ] Specific, testable criterion 1
+- [ ] Specific, testable criterion 2
+- [ ] Specific, testable criterion 3
+**Out of scope:**
+- Thing that should NOT be changed or addressed in this issue
+- Adjacent feature that might seem related but is separate
+```
+## Examples
+### Good agent brief (bug)
+```markdown
+## Agent Brief
+**Category:** bug
+**Summary:** Skill description truncation drops mid-word, producing broken output
+**Current behavior:**
+When a skill description exceeds 1024 characters, it is truncated at exactly
+1024 characters regardless of word boundaries. This produces descriptions
+that end mid-word (e.g. "Use when the user wants to confi").
+**Desired behavior:**
+Truncation should break at the last word boundary before 1024 characters
+and append "..." to indicate truncation.
+**Key interfaces:**
+- The `SkillMetadata` type's `description` field — no type change needed,
+  but the validation/processing logic that populates it needs to respect
+  word boundaries
+- Any function that reads SKILL.md frontmatter and extracts the description
+**Acceptance criteria:**
+- [ ] Descriptions under 1024 chars are unchanged
+- [ ] Descriptions over 1024 chars are truncated at the last word boundary
+      before 1024 chars
+- [ ] Truncated descriptions end with "..."
+- [ ] The total length including "..." does not exceed 1024 chars
+**Out of scope:**
+- Changing the 1024 char limit itself
+- Multi-line description support
+```
+### Good agent brief (enhancement)
+```markdown
+## Agent Brief
+**Category:** enhancement
+**Summary:** Add `.out-of-scope/` directory support for tracking rejected feature requests
+**Current behavior:**
+When a feature request is rejected, the issue is closed with a `wontfix` label
+and a comment. There is no persistent record of the decision or reasoning.
+Future similar requests require the maintainer to recall or search for the
+prior discussion.
+**Desired behavior:**
+Rejected feature requests should be documented in `.out-of-scope/<concept>.md`
+files that capture the decision, reasoning, and links to all issues that
+requested the feature. When triaging new issues, these files should be
+checked for matches.
+**Key interfaces:**
+- Markdown file format in `.out-of-scope/` — each file should have a
+  `# Concept Name` heading, a `**Decision:**` line, a `**Reason:**` line,
+  and a `**Prior requests:**` list with issue links
+- The triage workflow should read all `.out-of-scope/*.md` files early
+  and match incoming issues against them by concept similarity
+**Acceptance criteria:**
+- [ ] Closing a feature as wontfix creates/updates a file in `.out-of-scope/`
+- [ ] The file includes the decision, reasoning, and link to the closed issue
+- [ ] If a matching `.out-of-scope/` file already exists, the new issue is
+      appended to its "Prior requests" list rather than creating a duplicate
+- [ ] During triage, existing `.out-of-scope/` files are checked and surfaced
+      when a new issue matches a prior rejection
+**Out of scope:**
+- Automated matching (human confirms the match)
+- Reopening previously rejected features
+- Bug reports (only enhancement rejections go to `.out-of-scope/`)
+```
+### Bad agent brief
+```markdown
+## Agent Brief
+**Summary:** Fix the triage bug
+**What to do:**
+The triage thing is broken. Look at the main file and fix it.
+The function around line 150 has the issue.
+**Files to change:**
+- src/triage/handler.ts (line 150)
+- src/types.ts (line 42)
+```
+This is bad because:
+- No category
+- Vague description ("the triage thing is broken")
+- References file paths and line numbers that will go stale
+- No acceptance criteria
+- No scope boundaries
+- No description of current vs desired behavior

package/dist/0.1.1/codex/plugin/skills/triage/OUT-OF-SCOPE.md ADDED Viewed

@@ -0,0 +1,101 @@
+# Out-of-Scope Knowledge Base
+The `.out-of-scope/` directory in a repo stores persistent records of rejected feature requests. It serves two purposes:
+1. **Institutional memory** — why a feature was rejected, so the reasoning isn't lost when the issue is closed
+2. **Deduplication** — when a new issue comes in that matches a prior rejection, the skill can surface the previous decision instead of re-litigating it
+## Directory structure
+```
+.out-of-scope/
+├── dark-mode.md
+├── plugin-system.md
+└── graphql-api.md
+```
+One file per **concept**, not per issue. Multiple issues requesting the same thing are grouped under one file.
+## File format
+The file should be written in a relaxed, readable style — more like a short design document than a database entry. Use paragraphs, code samples, and examples to make the reasoning clear and useful to someone encountering it for the first time.
+```markdown
+# Dark Mode
+This project does not support dark mode or user-facing theming.
+## Why this is out of scope
+The rendering pipeline assumes a single color palette defined in
+`ThemeConfig`. Supporting multiple themes would require:
+- A theme context provider wrapping the entire component tree
+- Per-component theme-aware style resolution
+- A persistence layer for user theme preferences
+This is a significant architectural change that doesn't align with the
+project's focus on content authoring. Theming is a concern for downstream
+consumers who embed or redistribute the output.
+```ts
+// The current ThemeConfig interface is not designed for runtime switching:
+interface ThemeConfig {
+  colors: ColorPalette; // single palette, resolved at build time
+  fonts: FontStack;
+}
+```
+## Prior requests
+- #42 — "Add dark mode support"
+- #87 — "Night theme for accessibility"
+- #134 — "Dark theme option"
+```
+### Naming the file
+Use a short, descriptive kebab-case name for the concept: `dark-mode.md`, `plugin-system.md`, `graphql-api.md`. The name should be recognizable enough that someone browsing the directory understands what was rejected without opening the file.
+### Writing the reason
+The reason should be substantive — not "we don't want this" but why. Good reasons reference:
+- Project scope or philosophy ("This project focuses on X; theming is a downstream concern")
+- Technical constraints ("Supporting this would require Y, which conflicts with our Z architecture")
+- Strategic decisions ("We chose to use A instead of B because...")
+The reason should be durable. Avoid referencing temporary circumstances ("we're too busy right now") — those aren't real rejections, they're deferrals.
+## When to check `.out-of-scope/`
+During triage (Step 1: Gather context), read all files in `.out-of-scope/`. When evaluating a new issue:
+- Check if the request matches an existing out-of-scope concept
+- Matching is by concept similarity, not keyword — "night theme" matches `dark-mode.md`
+- If there's a match, surface it to the maintainer: "This is similar to `.out-of-scope/dark-mode.md` — we rejected this before because [reason]. Do you still feel the same way?"
+The maintainer may:
+- **Confirm** — the new issue gets added to the existing file's "Prior requests" list, then closed
+- **Reconsider** — the out-of-scope file gets deleted or updated, and the issue proceeds through normal triage
+- **Disagree** — the issues are related but distinct, proceed with normal triage
+## When to write to `.out-of-scope/`
+Only when an **enhancement** (not a bug) is rejected as `wontfix`. The flow:
+1. Maintainer decides a feature request is out of scope
+2. Check if a matching `.out-of-scope/` file already exists
+3. If yes: append the new issue to the "Prior requests" list
+4. If no: create a new file with the concept name, decision, reason, and first prior request
+5. Post a comment on the issue explaining the decision and mentioning the `.out-of-scope/` file
+6. Close the issue with the `wontfix` label
+## Updating or removing out-of-scope files
+If the maintainer changes their mind about a previously rejected concept:
+- Delete the `.out-of-scope/` file
+- The skill does not need to reopen old issues — they're historical records
+- The new issue that triggered the reconsideration proceeds through normal triage

package/dist/0.1.1/codex/plugin/skills/triage/SKILL.md ADDED Viewed

@@ -0,0 +1,103 @@
+---
+name: triage
+description: Triage issues through a state machine driven by triage roles. Use when user wants to create an issue, triage issues, review incoming bugs or feature requests, prepare issues for an AFK agent, or manage issue workflow.
+---
+# Triage
+Move issues on the project issue tracker through a small state machine of triage roles.
+Every comment or issue posted to the issue tracker during triage **must** start with this disclaimer:
+```
+> *This was generated by AI during triage.*
+```
+## Reference docs
+- [AGENT-BRIEF.md](AGENT-BRIEF.md) — how to write durable agent briefs
+- [OUT-OF-SCOPE.md](OUT-OF-SCOPE.md) — how the `.out-of-scope/` knowledge base works
+## Roles
+Two **category** roles:
+- `bug` — something is broken
+- `enhancement` — new feature or improvement
+Five **state** roles:
+- `needs-triage` — maintainer needs to evaluate
+- `needs-info` — waiting on reporter for more information
+- `ready-for-agent` — fully specified, ready for an AFK agent
+- `ready-for-human` — needs human implementation
+- `wontfix` — will not be actioned
+Every triaged issue should carry exactly one category role and one state role. If state roles conflict, flag it and ask the maintainer before doing anything else.
+These are canonical role names — the actual label strings used in the issue tracker may differ. The mapping should have been provided to you - run `/setup-matt-pocock-skills` if not.
+State transitions: an unlabeled issue normally goes to `needs-triage` first; from there it moves to `needs-info`, `ready-for-agent`, `ready-for-human`, or `wontfix`. `needs-info` returns to `needs-triage` once the reporter replies. The maintainer can override at any time — flag transitions that look unusual and ask before proceeding.
+## Invocation
+The maintainer invokes `/triage` and describes what they want in natural language. Interpret the request and act. Examples:
+- "Show me anything that needs my attention"
+- "Let's look at #42"
+- "Move #42 to ready-for-agent"
+- "What's ready for agents to pick up?"
+## Show what needs attention
+Query the issue tracker and present three buckets, oldest first:
+1. **Unlabeled** — never triaged.
+2. **`needs-triage`** — evaluation in progress.
+3. **`needs-info` with reporter activity since the last triage notes** — needs re-evaluation.
+Show counts and a one-line summary per issue. Let the maintainer pick.
+## Triage a specific issue
+1. **Gather context.** Read the full issue (body, comments, labels, reporter, dates). Parse any prior triage notes so you don't re-ask resolved questions. Explore the codebase using the project's domain glossary, respecting ADRs in the area. Read `.out-of-scope/*.md` and surface any prior rejection that resembles this issue.
+2. **Recommend.** Tell the maintainer your category and state recommendation with reasoning, plus a brief codebase summary relevant to the issue. Wait for direction.
+3. **Reproduce (bugs only).** Before any grilling, attempt reproduction: read the reporter's steps, trace the relevant code, run tests or commands. Report what happened — successful repro with code path, failed repro, or insufficient detail (a strong `needs-info` signal). A confirmed repro makes a much stronger agent brief.
+4. **Grill (if needed).** If the issue needs fleshing out, run a `/grill-with-docs` session.
+5. **Apply the outcome:**
+   - `ready-for-agent` — post an agent brief comment ([AGENT-BRIEF.md](AGENT-BRIEF.md)).
+   - `ready-for-human` — same structure as an agent brief, but note why it can't be delegated (judgment calls, external access, design decisions, manual testing).
+   - `needs-info` — post triage notes (template below).
+   - `wontfix` (bug) — polite explanation, then close.
+   - `wontfix` (enhancement) — write to `.out-of-scope/`, link to it from a comment, then close ([OUT-OF-SCOPE.md](OUT-OF-SCOPE.md)).
+   - `needs-triage` — apply the role. Optional comment if there's partial progress.
+## Quick state override
+If the maintainer says "move #42 to ready-for-agent", trust them and apply the role directly. Confirm what you're about to do (role changes, comment, close), then act. Skip grilling. If moving to `ready-for-agent` without a grilling session, ask whether they want to write an agent brief.
+## Needs-info template
+```markdown
+## Triage Notes
+**What we've established so far:**
+- point 1
+- point 2
+**What we still need from you (@reporter):**
+- question 1
+- question 2
+```
+Capture everything resolved during grilling under "established so far" so the work isn't lost. Questions must be specific and actionable, not "please provide more info".
+## Resuming a previous session
+If prior triage notes exist on the issue, read them, check whether the reporter has answered any outstanding questions, and present an updated picture before continuing. Don't re-ask resolved questions.

package/dist/0.1.1/codex/plugin/skills/triage/agents/claude.yaml ADDED Viewed

	@@ -0,0 +1,2 @@
1	+ frontmatter:
2	+ user-invocable: true

package/dist/0.1.1/codex/plugin/skills/triage/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Triage"
+  short_description: "Triage incoming issues through a state machine of triage roles"
+  default_prompt: "Triage the next batch of incoming issues into needs-info, ready-for-agent, ready-for-human, or wontfix."

package/dist/0.1.1/codex/plugin/skills/zoom-out/SKILL.md ADDED Viewed

@@ -0,0 +1,6 @@
+---
+name: zoom-out
+description: Tell the agent to zoom out and give broader context or a higher-level perspective. Use when you're unfamiliar with a section of code or need to understand how it fits into the bigger picture.
+---
+I don't know this area of code well. Go up a layer of abstraction. Give me a map of all the relevant modules and callers, using the project's domain glossary vocabulary.

package/dist/0.1.1/codex/plugin/skills/zoom-out/agents/claude.yaml ADDED Viewed

@@ -0,0 +1,3 @@
+frontmatter:
+  user-invocable: true
+  disable-model-invocation: true

package/dist/0.1.1/codex/plugin/skills/zoom-out/agents/openai.yaml ADDED Viewed

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Zoom Out"
+  short_description: "Give broader context and a higher-level perspective on an unfamiliar section of code"
+  default_prompt: "Zoom out and give a system-level map of this code area using the project glossary."

package/dist/0.1.1/release.json ADDED Viewed

@@ -0,0 +1,18 @@
+{
+  "schemaVersion": 1,
+  "pluginRef": "matt-pocock-skills@athena-workflow-marketplace",
+  "pluginName": "matt-pocock-skills",
+  "marketplaceName": "athena-workflow-marketplace",
+  "version": "0.1.1",
+  "artifacts": {
+    "claude": {
+      "type": "directory",
+      "path": "./claude/plugin"
+    },
+    "codex": {
+      "type": "marketplace",
+      "marketplacePath": "./.agents/plugins/marketplace.json",
+      "pluginPath": "./codex/plugin"
+    }
+  }
+}

package/package.json ADDED Viewed

@@ -0,0 +1,25 @@
+{
+  "name": "@athenaflow/plugin-matt-pocock-skills",
+  "version": "0.1.1",
+  "description": "Bundled subset of Matt Pocock's Skills For Real Engineers (https://github.com/mattpocock/skills) used by the fullstack-engineering workflow.",
+  "license": "SEE LICENSE IN NOTICE.md",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/lespaceman/athena-workflow-marketplace.git",
+    "directory": "plugins/matt-pocock-skills"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "files": [
+    ".claude-plugin/",
+    ".codex-plugin/",
+    "skills/",
+    "dist/",
+    "NOTICE.md"
+  ],
+  "scripts": {
+    "build:artifacts": "node ../../scripts/build-plugin-artifacts.mjs .",
+    "prepack": "npm run build:artifacts"
+  }
+}

package/skills/diagnose/SKILL.md ADDED Viewed

@@ -0,0 +1,117 @@
+---
+name: diagnose
+description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
+---
+# Diagnose
+A discipline for hard bugs. Skip phases only when explicitly justified.
+When exploring the codebase, use the project's domain glossary to get a clear mental model of the relevant modules, and check ADRs in the area you're touching.
+## Phase 1 — Build a feedback loop
+**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.
+Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
+### Ways to construct one — try them in roughly this order
+1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e.
+2. **Curl / HTTP script** against a running dev server.
+3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
+4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network.
+5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation.
+6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
+7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
+8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
+9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs.
+10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.
+Build the right feedback loop, and the bug is 90% fixed.
+### Iterate on the loop itself
+Treat the loop as a product. Once you have _a_ loop, ask:
+- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
+- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
+- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)
+A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
+### Non-deterministic bugs
+The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
+### When you genuinely cannot build a loop
+Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop.
+Do not proceed to Phase 2 until you have a loop you believe in.
+## Phase 2 — Reproduce
+Run the loop. Watch the bug appear.
+Confirm:
+- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix.
+- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
+- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.
+Do not proceed until you reproduce the bug.
+## Phase 3 — Hypothesise
+Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea.
+Each hypothesis must be **falsifiable**: state the prediction it makes.
+> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."
+If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
+**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.
+## Phase 4 — Instrument
+Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.**
+Tool preference:
+1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs.
+2. **Targeted logs** at the boundaries that distinguish hypotheses.
+3. Never "log everything and grep".
+**Tag every debug log** with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
+**Perf branch.** For performance regressions, logs are usually wrong. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.
+## Phase 5 — Fix + regression test
+Write the regression test **before the fix** — but only if there is a **correct seam** for it.
+A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.
+**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.
+If a correct seam exists:
+1. Turn the minimised repro into a failing test at that seam.
+2. Watch it fail.
+3. Apply the fix.
+4. Watch it pass.
+5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.
+## Phase 6 — Cleanup + post-mortem
+Required before declaring done:
+- [ ] Original repro no longer reproduces (re-run the Phase 1 loop)
+- [ ] Regression test passes (or absence of seam is documented)
+- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix)
+- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location)
+- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns
+**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started.

package/skills/diagnose/agents/claude.yaml ADDED Viewed

	@@ -0,0 +1,2 @@
1	+ frontmatter:
2	+ user-invocable: true