@apogeelabs/the-agency 0.7.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@apogeelabs/the-agency",
- "version": "0.7.0",
+ "version": "0.9.0",
  "description": "Centralized Claude Code agents, commands, and workflows",
  "type": "module",
  "bin": {
@@ -10,9 +10,9 @@ Before generating any tests, complete these steps in order:

  1. **Branch analysis** (Coverage-Driven Test Planning): Read the source, enumerate every branch, map each to a test scenario.
  2. **Superfluous test check** (Superfluous Test Prevention): Verify each planned scenario covers a distinct branch, not the same branch with different values.
- 3. **Execution location check** (CRITICAL RULE #1): Execute methods under test in `beforeEach()`, not in `it()` blocks.
- 4. **Mock configuration check** (CRITICAL RULE #2): Configure mock behavior in `beforeEach`, not at module level.
- 5. **Callback invocation check** (CRITICAL RULE #3): Use `mockImplementation` for callbacks, not `mock.calls[N][M]()`.
+ 3. **Execution location check** (CRITICAL RULE 1): Execute methods under test in `beforeEach()`, not in `it()` blocks.
+ 4. **Mock configuration check** (CRITICAL RULE 2): Configure mock behavior in `beforeEach`, not at module level.
+ 5. **Callback invocation check** (CRITICAL RULE 3): Use `mockImplementation` for callbacks, not `mock.calls[N][M]()`.

  Once tests are generated:

@@ -21,7 +21,7 @@ Once tests are generated:

  ---

- ## CRITICAL RULE #1: Test Execution Location
+ ## CRITICAL RULE 1: Test Execution Location

  ⚠️ **NON-NEGOTIABLE RULE** ⚠️

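In miniature, the rule looks like this. A runnable sketch, assuming a hypothetical `service.activate` subject; the tiny `beforeEach`/`it` harness below stands in for Jest so the shape executes on its own:

```javascript
// Toy stand-ins for Jest's beforeEach/it, just to make the pattern executable.
const hooks = [];
function beforeEach(fn) { hooks.push(fn); }
function it(name, fn) { hooks.forEach((hook) => hook()); fn(); }

// Invented subject under test.
const service = { activate: (user) => ({ ...user, active: true }) };

let result;
beforeEach(() => {
  // GOOD: the method under test executes in beforeEach...
  result = service.activate({ name: "ada" });
});

it("should mark the user active", () => {
  // ...so the it() block is a pure assertion, with no execution or conditions.
  if (!result.active) throw new Error("expected user to be active");
});
```

The `describe("when ...")` layer is omitted here; in real Jest the same split holds: scenarios set up and execute in `beforeEach`, and `it` only asserts.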
@@ -97,7 +97,7 @@ Before writing any test suite, verify:

  ---

- ## CRITICAL RULE #2: Mock Configuration Location
+ ## CRITICAL RULE 2: Mock Configuration Location

  ⚠️ **NON-NEGOTIABLE RULE** ⚠️

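The shape of the rule, as a runnable sketch: `mockFetchUser` is an invented name, and the hand-rolled `fn()` stands in for `jest.fn()` so the example runs without Jest.

```javascript
// Minimal jest.fn() stand-in: bare mock plus mockReturnValue.
function fn() {
  const mock = (...args) => (mock.impl ? mock.impl(...args) : undefined);
  mock.mockReturnValue = (value) => { mock.impl = () => value; return mock; };
  return mock;
}

// Module level: declaration only, a bare mock with no behavior attached.
const mockFetchUser = fn();

// Per-scenario configuration belongs in beforeEach, not at module level.
function beforeEachForScenario() {
  mockFetchUser.mockReturnValue({ id: 1, name: "ada" });
}

beforeEachForScenario();
console.log(mockFetchUser().name); // "ada"
```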
@@ -152,7 +152,7 @@ const mockLuxon = {

  ---

- ## CRITICAL RULE #3: Callback Invocation via mockImplementation
+ ## CRITICAL RULE 3: Callback Invocation via mockImplementation

  ⚠️ **NON-NEGOTIABLE RULE** ⚠️

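A runnable sketch of the pattern this rule names (the `subscribe` mock and its topic/callback signature are invented, and the hand-rolled `fn()` mimics just enough of `jest.fn()` to run standalone):

```javascript
// Minimal jest.fn() stand-in with calls tracking and mockImplementation.
function fn() {
  const mock = (...args) => {
    mock.calls.push(args);
    return mock.impl ? mock.impl(...args) : undefined;
  };
  mock.calls = [];
  mock.mockImplementation = (impl) => { mock.impl = impl; return mock; };
  return mock;
}

// GOOD: the mock's implementation invokes the callback as part of the call,
// mirroring mockSubscribe.mockImplementation((topic, cb) => cb("payload")),
// instead of digging through mock.calls[N][M]() after the fact.
const mockSubscribe = fn().mockImplementation((topic, callback) => callback("payload"));

let received;
mockSubscribe("orders", (message) => { received = message; });
console.log(received); // "payload"
```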
@@ -329,7 +329,7 @@ describe("when status is inactive", () => {
  - Inner `describe` blocks handle specific scenarios ("when X condition")
  - **Test descriptions**:
  - `describe` blocks handle the "when" conditions/scenarios
- - `it` blocks focus purely on "should" assertions (no conditions). See **CRITICAL RULE #1** above.
+ - `it` blocks focus purely on "should" assertions (no conditions). See **CRITICAL RULE 1** above.
  - Avoid redundancy - don't repeat conditions in both `describe` and `it`
  - **Grouping logic**:
  - Group related test cases under shared setup scenarios
@@ -366,7 +366,7 @@ describe("when status is inactive", () => {
  - **State initialization**: Variables initialized in `beforeEach`
  - **Fresh state guarantee**:
  - Call `jest.clearAllMocks()` and `jest.resetModules()` in outer `beforeEach`
- - **Do NOT use `jest.resetAllMocks()`** in the outer `beforeEach` — it wipes implementations, which breaks module-level `mockReturnThis()` chains (see CRITICAL RULE #2 exception)
+ - **Do NOT use `jest.resetAllMocks()`** in the outer `beforeEach` — it wipes implementations, which breaks module-level `mockReturnThis()` chains (see CRITICAL RULE 2 exception)
  - Use `mockReset()` on **individual mocks** when a nested `beforeEach` needs to replace behavior already configured by an outer `beforeEach`
  - **Never use `mockRestore()`** — it only applies to `jest.spyOn`, which this codebase does not use

@@ -385,14 +385,14 @@ Am I in the **outer** `beforeEach`?

  ## Mocking Strategy

- - **Module-level mock declarations**: Dependencies mocked above test suites with bare `jest.fn()` — see **CRITICAL RULE #2** for configuration rules
+ - **Module-level mock declarations**: Dependencies mocked above test suites with bare `jest.fn()` — see **CRITICAL RULE 2** for configuration rules
  - **Mock behavior configuration**: Configured per scenario in `beforeEach` blocks
  - **Mock naming**: Consistent mock prefix (e.g., `mockLogger`, `mockGetMessageBroker`)
  - **Selective mocking**: Mock only the methods being used unless it significantly raises complexity
  - **Mock verification**: Always verify mock calls when behavior depends on them
  - **Mock realism**: Ensure mocks behave like real implementations (same async patterns, error types)
  - **Mock methods**:
- - Use `mockImplementation` when invoking callbacks passed to the mocked function — see **CRITICAL RULE #3**
+ - Use `mockImplementation` when invoking callbacks passed to the mocked function — see **CRITICAL RULE 3**
  - Use `mockResolvedValue` for async returns
  - Use `mockReturnValue` for sync returns
  - Prefer "Once" versions when appropriate (`mockResolvedValueOnce`, etc.)
@@ -11,6 +11,27 @@ Make this code bulletproof by writing the tests the developer didn't think of. A

  You have NO knowledge of how this code was written. You are seeing it for the first time. You are NOT rewriting the developer's tests — you're adding what's missing.

+ ## Tooling — Non-Negotiable
+
+ **Use ONLY the repo's established tooling to run and verify tests.** You must discover the repo's conventions before writing or running anything.
+
+ ### Discovery (do this FIRST, before writing any tests)
+
+ 1. Read `package.json` (root and any relevant workspace package). Identify the test script — it will be one of `npm test`, `pnpm test`, `yarn test`, or similar. That is your test runner. Period.
+ 2. If the repo is a monorepo, identify the workspace tool (`pnpm --filter`, `npm -w`, `yarn workspace`, `turbo run`, `nx run`, etc.) and use that to scope test runs to the relevant package.
+ 3. Read the test framework config (e.g., `jest.config.ts`, `vitest.config.ts`, `.mocharc.*`) to understand module resolution, transforms, and path aliases.
+ 4. Read `.ai/UnitTestGeneration.md` and `.ai/UnitTestExamples.md` if they exist. These are your style guide. Follow them exactly.
+
+ ### The Rules
+
+ - **Run tests with the repo's test script.** `npm test`, `pnpm test`, `pnpm --filter <pkg> test`, etc. Whatever `package.json` says.
+ - **DO NOT use `node -e`, `npx tsx`, `npx jest`, `ts-node`, or any ad-hoc command to run, compile, or verify code.** Ever. No exceptions. The repo has a test runner. Use it.
+ - **DO NOT improvise test runners or verification methods.** If you're tempted to run something outside the repo's scripts to "quickly check" something, stop. Use the test script.
+ - **DO NOT install packages, add dependencies, or modify package.json.** You write tests using what's already available.
+ - **When running tests, scope them.** Don't run the entire test suite when you only need to verify one file. Use the test runner's built-in filtering (e.g., `pnpm test -- --testPathPattern=path/to/file` for Jest, or equivalent).
+
+ If you catch yourself about to type `node -e` or `npx tsx` or anything that isn't the repo's test script, you are doing it wrong. Back away from the keyboard.
+
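The discovery step boils down to reading the manifest. A sketch of what that lookup finds, using an inline sample manifest (invented for illustration) rather than a real file read:

```javascript
// Sample package.json contents, invented for illustration.
const manifest = JSON.parse(`{
  "name": "sample",
  "scripts": { "test": "jest --passWithNoTests" },
  "workspaces": ["packages/*"]
}`);

// The scripts.test entry is the test runner; a workspaces field signals a monorepo.
const testScript = manifest.scripts && manifest.scripts.test;
const isMonorepo = Boolean(manifest.workspaces);
console.log(testScript);  // "jest --passWithNoTests"
console.log(isMonorepo);  // true
```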
  ## Input

  1. Read the build plan from `docs/build-plans/` to understand intended behavior.
@@ -97,12 +118,14 @@ Map every branch (`if`/`else`, `try`/`catch`, `switch`, early returns) in the so

  ## Process

- 1. Audit existing tests. Catalog what's covered.
- 2. Identify gaps by category.
- 3. Prioritize: likely to happen OR catastrophic if it does.
- 4. **Before writing any test**, verify it against the anti-patterns above. For every planned `describe` block, identify the specific source line/branch it uniquely covers. If you cannot, drop it.
- 5. Write tests. Follow the existing test framework and patterns exactly.
- 6. Write your report.
+ 1. **Discover repo tooling.** Follow the Tooling — Non-Negotiable section above. Identify the package manager, test script, test framework config, and any workspace/monorepo conventions. Do this BEFORE reading any source code.
+ 2. Audit existing tests. Catalog what's covered.
+ 3. Identify gaps by category.
+ 4. Prioritize: likely to happen OR catastrophic if it does.
+ 5. **Before writing any test**, verify it against the anti-patterns above. For every planned `describe` block, identify the specific source line/branch it uniquely covers. If you cannot, drop it.
+ 6. Write tests. Follow the existing test framework and patterns exactly.
+ 7. Run tests using the repo's test script. Fix any failures before proceeding.
+ 8. Write your report.

  ## Output

@@ -146,9 +169,10 @@ Actual bugs discovered during test hardening.

  ## Verification

- Before writing the report, verify:
+ Before writing the report, verify by **reading your own code and the source code** — not by running ad-hoc commands:

- 1. Every new `describe` block covers a code path no existing test covers.
+ 1. Every new `describe` block covers a code path no existing test covers. Verify this by reading the source, not by running coverage tools.
  2. No tests target `.tsx` files or barrel exports.
  3. Tests are added to existing test files, not new ones.
  4. No existing tests were modified.
+ 5. All tests pass when run with the repo's test script. This is the ONLY command you should have executed via Bash during this entire process.
@@ -2,15 +2,15 @@

  ## Goal

- Prepare and create a draft pull request for the current branch. Run pre-submission checks, generate a PR title and description, collect testing steps, and create the draft PR.
+ Prepare and create a draft pull request for the current branch. Run pre-submission checks, generate a full PR review, collect optional author testing notes, and create the draft PR.

  This command takes no arguments. It operates on the currently checked-out branch.

- <!-- Sibling command: review-pr.md uses the same plugin discovery/loading flow but with deeper evaluation. If the plugin format changes, update both files. -->
+ <!-- Sibling command: review-pr.md uses the same analysis approach and plugin discovery/loading flow. If the analysis format or plugin format changes, update both files. -->

  ## Constraints

- - This command is conversational. There are multiple points where you pause and wait for developer input (Step 2, Step 6, Step 7, Step 8). Don't try to rush through without their responses.
+ - This command is conversational. There are multiple points where you pause and wait for developer input (Step 2, Step 10, Step 11, Step 12). Don't try to rush through without their responses.
  - Check failures are informational, not blocking. The developer can still create the PR even if checks fail.
  - Always create the PR as a draft. No option to toggle this.
  - If the developer wants to bail at any point, respect that. Don't push them to continue.
@@ -67,34 +67,140 @@ The developer can accept the default or specify another branch. Use whatever the

  ## Step 3: Gather Diff

- Check that there are commits ahead of the target branch:
+ Fetch the latest state of the target branch from the remote so comparisons reflect what's actually on the remote, not a potentially stale local copy:

  ```bash
- git log --oneline $TARGET..HEAD
+ git fetch origin $TARGET
+ ```
+
+ ### 3.1: Resolve Comparison Ref
+
+ Default to `origin/$TARGET` (the remote tracking ref) for all diff and log comparisons. Before proceeding, check whether a local copy of the target branch exists and is ahead of the remote:
+
+ ```bash
+ git rev-list --count origin/$TARGET..$TARGET 2>/dev/null
+ ```
+
+ - **If the command fails** (no local branch exists), or **returns 0** (local is behind or even with remote): use `origin/$TARGET` silently. No prompt needed.
+ - **If the count is greater than 0**: the local branch has commits not yet on the remote. Ask the developer:
+
+ > **Your local `{$TARGET}` is {count} commit(s) ahead of `origin/{$TARGET}`.** Use local or remote for comparison? (default: remote)
+
+ Use whichever ref the developer chooses as `$COMPARE_REF` for all subsequent diff and log commands. If they accept the default or don't respond, use `origin/$TARGET`.
+
+ ### 3.2: Check for Commits
+
+ Check that there are commits ahead of the comparison ref:
+
+ ```bash
+ git log --oneline $COMPARE_REF..HEAD
  ```

  **If no commits are returned**, stop and output:

- > **No commits ahead of `{$TARGET}`. Nothing to PR.** You may need to rebase.
+ > **No commits ahead of `{$COMPARE_REF}`. Nothing to PR.** You may need to rebase.

- Gather the change data:
+ ### 3.3: Gather Change Data

  ```bash
  # File list with change stats
- git diff --stat $TARGET...HEAD
+ git diff --stat $COMPARE_REF...HEAD

  # Full diff
- git diff $TARGET...HEAD
+ git diff $COMPARE_REF...HEAD

  # Commit log (excluding merges)
- git log --no-merges --oneline $TARGET..HEAD
+ git log --no-merges --oneline $COMPARE_REF..HEAD
  ```

- ## Step 4: Review Plugin Checks
+ ## Step 4: Categorize and Filter Files
+
+ Before analysis, categorize the changed files:
+
+ **Noise files (acknowledge but don't analyze deeply):**
+
+ - `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml` -> "lock file updated"
+ - `*.min.js`, `*.min.css`, `dist/*`, `build/*` -> "generated/bundled files"
+ - Binary files (images, fonts, etc.) -> "binary file added/modified/deleted"
+
+ **Categorize by area:**
+
+ Group changed files by top-level directory. If the repo uses workspaces (check for a `workspaces` field in `package.json` or the presence of `pnpm-workspace.yaml`), use the workspace definitions to inform grouping. Otherwise, fall back to top-level directory names.
+
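The categorization above amounts to a two-way split. A sketch with invented file paths; the noise patterns mirror the lists:

```javascript
// Noise patterns from the lists above: lock files, minified/bundled output.
const NOISE = [
  /(^|\/)(package-lock\.json|yarn\.lock|pnpm-lock\.yaml)$/,
  /\.min\.(js|css)$/,
  /^(dist|build)\//,
];

function categorize(files) {
  const noise = [];
  const byArea = {};
  for (const file of files) {
    if (NOISE.some((pattern) => pattern.test(file))) { noise.push(file); continue; }
    // Fall back to top-level directory names for area grouping.
    const area = file.includes("/") ? file.split("/")[0] : "(root)";
    (byArea[area] = byArea[area] || []).push(file);
  }
  return { noise, byArea };
}

const result = categorize(["package-lock.json", "src/auth/login.ts", "docs/readme.md"]);
console.log(result.noise);               // ["package-lock.json"]
console.log(Object.keys(result.byArea)); // ["src", "docs"]
```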
+ ## Step 5: Analyze Changes
+
+ ### For Small PRs (fewer than 30 changed files):
+
+ Analyze **per-commit**. For each commit, produce a narrative block with:
+
+ 1. **The Mental Model Shift** -- plain English explanation of how the codebase's story changed. Focus on "why" and "so what," not restating the diff. Highlight:
+ - Architectural shifts or new patterns introduced
+ - Patterns retired or approaches abandoned
+ - Code that appears orphaned, half-finished, or disconnected
+
+ 2. **What Changed Structurally** -- numbered list of concrete changes:
+ - Collapse repetitive/mechanical changes (e.g., "~15 files updated imports from X to Y")
+ - Call out meaningful changes individually with enough context to understand impact
+
+ ### For Large PRs (30 or more changed files):
+
+ Analyze **by theme/area** rather than per-commit. Group changes by package or functional area:

- Review plugin checks are loaded dynamically from `.ai/review-checks/`. Each check file is a markdown file with YAML frontmatter.
+ 1. First, identify the major themes/areas touched
+ 2. For each theme, produce a narrative block (Mental Model Shift + Structural Changes)
+ 3. After per-area analysis, produce a **holistic summary** that captures cross-cutting changes and overall architectural impact

- <!-- This is the same discovery/loading flow as review-pr.md, but with lighter evaluation: pass/fail with brief file/line pointers rather than full reviewer narrative. -->
+ ### Analysis Guidance:
+
+ - **Explain the "why" and "so what"**, not just the "what"
+ - **Collapse noise**: If 50 files have the same mechanical change, that's one bullet point
+ - **Highlight signal**: New public APIs, changed interfaces, deleted capabilities, behavioral changes
+ - **Flag disconnects**: Code that doesn't seem to connect to anything, half-implemented features, TODOs left behind
+ - **Note removals**: Deleted code is as important as added code -- what capability is gone?
+
+ ## Step 6: Identify Risks
+
+ Scan for and report:
+
+ **Security concerns:**
+
+ - Auth/authorization changes
+ - New API endpoints or route changes
+ - Injection vectors (SQL, command, XSS)
+ - Sensitive data in logs or error messages
+ - Secrets or credentials (even if they look like placeholders)
+ - Changes to validation or sanitization
+
+ **Behavioral risks:**
+
+ - Changes to public APIs or interfaces
+ - Modified default values or fallback behavior
+ - Error handling changes that might swallow errors
+ - Timing or ordering changes in async code
+
+ **Architectural concerns:**
+
+ - Layer boundary violations (e.g., handler calling repository directly)
+ - New circular dependencies
+ - Assumptions in one layer that depend on implementation details of another
+ - Patterns that diverge from established codebase conventions
+
+ **Dependency concerns:**
+
+ - New dependencies added
+ - Dependencies removed (what relied on them?)
+ - Major version bumps
+ - Dependencies with known security issues
+
+ **Before finalizing risk callouts related to test files (`.test.ts`, `.spec.ts`):**
+
+ Read `.ai/UnitTestGeneration.md` (if it exists) and cross-reference any test-related findings against the project's testing conventions. Do NOT flag patterns that conform to those guidelines -- they are intentional, not risks.
+
+ ## Step 7: Tribal Knowledge Checks
+
+ Tribal knowledge checks are loaded dynamically from `.ai/review-checks/`. Each check file is a markdown file with YAML frontmatter.
+
+ <!-- This is the same discovery/loading flow as review-pr.md. -->

  **Expected check file format:**

@@ -116,7 +222,7 @@ applies_when: Changed files include .ts files in src/
  ls .ai/review-checks/*.md 2>/dev/null
  ```

- 2. **If the directory does not exist or contains no `.md` files**, skip the entire review plugin checks section silently no error, no placeholder text. Proceed to Step 5.
+ 2. **If the directory does not exist or contains no `.md` files**, skip the entire Tribal Knowledge Checks section silently -- no error, no placeholder text. Proceed to Step 8.

  3. **If files are found**, read each one:

@@ -126,34 +232,60 @@ cat .ai/review-checks/*.md

  4. For each file, parse the YAML frontmatter to extract `name` and `applies_when`. If a file is missing frontmatter or has invalid/unparseable YAML, skip it and note: "Skipped `(unknown)`: missing or invalid frontmatter."

- 5. Evaluate each file's `applies_when` value against the list of changed files from the diff (gathered in Step 3). Use your judgment `applies_when` is natural language, not a glob pattern. Match generously but sensibly.
+ 5. Evaluate each file's `applies_when` value against the list of changed files from the diff (gathered in Step 3). Use your judgment -- `applies_when` is natural language, not a glob pattern. Match generously but sensibly.

- 6. For each check group where `applies_when` matches, evaluate each check item against the diff. Output format: **pass/fail per check with brief file/line pointers for failures** enough breadcrumb to find and fix the issue, not a full review narrative.
+ 6. For each check group where `applies_when` matches, include its checks in the output under a heading using the `name` from frontmatter. Evaluate each check against the actual diff.

- 7. If files exist but **none** of their `applies_when` criteria match the diff, skip the check results silently.
-
- Store the check results — they'll be included in the PR body later.
+ 7. If files exist but **none** of their `applies_when` criteria match the diff, skip the Tribal Knowledge Checks section entirely.

  Display the check results to the developer as you go so they can see what passed and what needs attention.

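The frontmatter parse in step 4 can be sketched as follows; real code would use a YAML library, and this toy parser handles only the flat `key: value` form. The sample check file is invented, with `applies_when` phrased like the example in the format section:

```javascript
// Invented sample check file with YAML frontmatter.
const sample = `---
name: TypeScript conventions
applies_when: Changed files include .ts files in src/
---

- [ ] Exported functions have explicit return types.
`;

// Extract flat key: value pairs between the --- fences; null means
// "missing frontmatter: skip this file and note it".
function parseFrontmatter(text) {
  const match = text.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

const fm = parseFrontmatter(sample);
console.log(fm.name);         // "TypeScript conventions"
console.log(fm.applies_when); // "Changed files include .ts files in src/"
```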
- ## Step 5: Generate Title and Description
+ ## Step 8: Testing Recommendations
+
+ Based on the changes, provide concrete testing recommendations. This is NOT generic "write more tests" advice -- recommendations must be tied to the actual changes.
+
+ ### What to recommend:
+
+ **Manual verification:**
+
+ - Specific user flows or scenarios the reviewer should manually test
+ - Edge cases introduced by the changes that aren't obvious from reading the code
+ - Integration points that might behave differently after these changes
+
+ **Automated test coverage:**
+
+ - New code paths that lack corresponding tests
+ - Behavioral changes that existing tests might not cover
+ - Specific test scenarios to add (with enough detail to write the test)
+ - **Do NOT recommend tests for React components (`.tsx` files).** We do not unit test React components.
+
+ **Regression risks:**
+
+ - Existing functionality that might be affected and should be regression tested
+ - Areas where the change's assumptions might conflict with existing behavior
+
+ ### Guidance:
+
+ - Be specific: "Test the login flow with an expired token" not "test authentication"
+ - Reference the actual changes: "The new `validateApiKey` middleware should be tested with missing, invalid, and expired keys"
+ - Prioritize: If there are many potential tests, highlight the most important ones first
+ - If the PR includes good test coverage already, acknowledge that and note any gaps
+
+ ## Step 9: Generate Title

  Analyze the diff and commit history gathered in Step 3. Generate:

  - **Title**: Concise, reflects the nature of the changes (bug fix, feature, refactor, etc.). Keep it under 70 characters.
- - **Description**: Summarize what changed and why. Focus on the "so what" — not a restatement of the diff. Draw from commit messages and the actual code changes.
-
- For large diffs, prioritize analyzing `git diff --stat` and the commit log, then selectively read diffs for the most significant files rather than trying to process the entire diff.

- ## Step 6: Collect Testing Steps
+ ## Step 10: Collect Author Testing Notes (Optional)

  Ask the developer:

- > **Testing steps how should a reviewer verify these changes?**
+ > **Do you have any additional testing steps you'd like to include?** These will appear in the PR as "PR Author Testing Recommendations" alongside the generated testing analysis. Feel free to skip if the generated recommendations cover it.

- The developer provides their own testing steps as free-form text. These are NOT AI-generated. Wait for their input before proceeding.
+ If the developer provides testing steps, store them for inclusion in the PR body. If they skip or decline, proceed without them.

- ## Step 7: Present Full Preview
+ ## Step 11: Present Full Preview

  Show the developer the complete PR preview:

@@ -164,24 +296,59 @@ Show the developer the complete PR preview:

  ---

+ # {title}
+
+ **Branch**: {current branch} -> {$TARGET}
+ **Changes**: {additions} additions, {deletions} deletions across {changed file count} files
+
+ ---
+
  ## Summary

- {generated description}
+ [Per-commit or per-theme narrative blocks from Step 5]

- ## Test Plan
+ ---

- {developer-provided testing steps}
+ ## Risk Callouts
+
+ [Risks from Step 6, or "No significant risks identified"]
+
+ ---
+
+ ## Tribal Knowledge Checks
+
+ [Only if matching check files were found in Step 7. Omit section entirely otherwise.]
+
+ ### [name from check file frontmatter]
+
+ - [x] [Check passed or N/A]
+ - [ ] **[Check failed]**: [Specific finding with file:line references]
+
+ ---
+
+ ## Testing Recommendations
+
+ ### Manual Verification
+
+ - [Specific scenario to test manually]
+
+ ### Automated Test Gaps
+
+ - [Specific test that should be written]
+
+ ### Regression Risks
+
+ - [Existing functionality to regression test]
  ```

- If check results exist from Step 4, also show:
+ If the developer provided author testing notes in Step 10, also show:

  ```
- <details>
- <summary>Pre-submission Checks</summary>
+ ---

- {check results pass/fail with names}
+ ## PR Author Testing Recommendations

- </details>
+ {developer-provided testing steps}
  ```

@@ -190,7 +357,7 @@ Then ask:

  Let the developer make edits. Iterate until they're satisfied.

- ## Step 8: Push Branch
+ ## Step 12: Push Branch

  Check whether the branch has an upstream and is up to date:

@@ -218,32 +385,60 @@ git push -u origin $(git branch --show-current)

  **If the branch is already up to date with the remote**, skip this step silently.

- ## Step 9: Create Draft PR
+ ## Step 13: Create Draft PR

- Assemble the PR body:
+ Assemble the PR body using the full review content from Steps 5-8:

  ```
+ # {title}
+
+ **Branch**: {current branch} -> {$TARGET}
+ **Changes**: {additions} additions, {deletions} deletions across {changed file count} files
+
+ ---
+
  ## Summary

- {description}
+ {per-commit or per-theme narrative blocks}

- ## Test Plan
+ ---
+
+ ## Risk Callouts

- {testing steps}
+ {risks, or "No significant risks identified"}
  ```

- If check results exist from Step 4, append:
+ If tribal knowledge check results exist from Step 7, append:

  ```
- <details>
- <summary>Pre-submission Checks</summary>
+ ---

- {check results pass/fail with names}
+ ## Tribal Knowledge Checks

- </details>
+ {check results with pass/fail and file:line references}
  ```

- If no check results (no plugin files found or none matched), omit the `<details>` section entirely.
+ If no check results (no plugin files found or none matched), omit the Tribal Knowledge Checks section entirely.
+
+ Append the testing recommendations from Step 8:
+
+ ```
+ ---
+
+ ## Testing Recommendations
+
+ {testing recommendations}
+ ```
+
+ If the developer provided author testing notes in Step 10, append:
+
+ ```
+ ---
+
+ ## PR Author Testing Recommendations
+
+ {developer-provided testing steps}
+ ```

  Create the draft PR:

@@ -50,19 +50,52 @@ gh pr view --json number,title,baseRefName,headRefName,body,additions,deletions,

  ## Step 3: Gather Diff Information

- Use `gh pr diff` to get the diff as GitHub sees it. This avoids stale-local-branch problems where a behind-origin base branch inflates the diff with already-merged changes from other PRs.
+ ### 3.1: Resolve Comparison Ref
+
+ Fetch the latest state of the base branch from the remote:
+
+ ```bash
+ git fetch origin $baseRefName
+ ```
+
+ Default to using the remote for comparison (via `gh pr diff`). Before proceeding, check whether the local copy of the base branch is ahead of the remote:

  ```bash
- # File list with change stats
- gh pr diff --stat
+ git rev-list --count origin/$baseRefName..$baseRefName 2>/dev/null
+ ```
+
+ - **If the command fails** (no local branch exists), or **returns 0** (local is behind or even with remote): use the remote. No prompt needed.
+ - **If the count is greater than 0**: the local branch has commits not yet on the remote. Ask the developer:
+
+ > **Your local `{baseRefName}` is {count} commit(s) ahead of `origin/{baseRefName}`.** Use local or remote for comparison? (default: remote)
+
+ Store the choice as `$DIFF_MODE` — either `remote` (default) or `local`.
+
+ ### 3.2: Get the Diff
+
+ **If `$DIFF_MODE` is `remote`** (the 95% case):

- # Full diff (for analysis)
+ Use `gh pr diff` to get the full diff as GitHub sees it. This avoids stale-local-branch problems where a behind-origin base branch inflates the diff with already-merged changes from other PRs.
+
+ ```bash
  gh pr diff
  ```

- **For commit messages**, use the `commits` array already captured in Step 2 — that is the authoritative commit list for this PR. Do NOT use `git log`, which can include commits from other PRs if the local base branch is behind origin.
+ **If `$DIFF_MODE` is `local`:**

- **Cross-check**: Compare the file count from `gh pr diff --stat` against the `changedFiles` value from Step 2. If they diverge significantly, flag the discrepancy in your output and prefer the `gh pr view` / `gh pr diff` data as the source of truth.
+ Use local git to diff against the local base branch:
+
+ ```bash
+ git diff $baseRefName...HEAD
+ ```
+
+ ⚠️ Note: local diffs may differ slightly from GitHub's view in repos with complex merge histories.
+
+ ### 3.3: Stats and Commit Messages
+
+ File-level stats (additions, deletions, file count) are already available from the `gh pr view --json` output captured in Step 2 — don't duplicate that work here.
+
+ **For commit messages**, use the `commits` array already captured in Step 2 — that is the authoritative commit list for this PR. Do NOT use `git log`, which can include commits from other PRs if the local base branch is behind origin.

  ## Step 4: Categorize and Filter Files