slash-do 1.9.0 → 2.1.0

@@ -1,11 +1,16 @@
1
1
  ---
2
2
  description: Automated audit/triage of PLAN.md — archive completed items to DONE.md, suggest new work, keep PLAN.md lean
3
+ argument-hint: "[--interactive]"
3
4
  ---
4
5
 
5
6
  # Replan Command
6
7
 
7
8
  Automatically audit PLAN.md against the codebase, prune completed/stale items, archive what's done, suggest new work, and leave PLAN.md lean and actionable.
8
9
 
10
+ **Default mode: fully autonomous.** Scans the codebase, archives done items, removes stale items, adds suggested new items, and commits — no user interaction.
11
+
12
+ **`--interactive` mode:** Pauses after evidence gathering to present findings and get user approval before making changes.
13
+
9
14
  **Philosophy:** PLAN.md should be short enough to paste into a prompt. Completed items belong in a done log, not cluttering the active plan.
10
15
 
11
16
  ## Boundary Rule: PLAN.md vs GOALS.md
@@ -22,7 +27,7 @@ GOALS.md answers: *Why does this project exist? What does success look like? Wha
22
27
 
23
28
  ## Phase 1: Automated Evidence Gathering
24
29
 
25
- Launch these agents in parallel — no user interaction needed yet.
30
+ Launch these agents in parallel — no user interaction needed.
26
31
 
27
32
  **Agent 1: Git History Analysis**
28
33
  - `git log --oneline -50` — identify commits that completed plan items
@@ -41,31 +46,51 @@ Launch these agents in parallel — no user interaction needed yet.
41
46
  - Check for outdated dependencies (`npm outdated`, `cargo outdated`, etc. as appropriate)
42
47
  - Review GOALS.md (if it exists) for strategic goals not yet represented in the plan
43
48
  - Identify code quality opportunities (large files, complex functions, missing error handling)
44
- - Formulate 1-3 suggested new items to propose to the user
49
+ - Formulate 1-3 suggested new items
45
50
 
46
51
  **Agent 4: GOALS.md Boundary Check**
47
52
  If `GOALS.md` exists:
48
53
  - Check for checkbox task lists or implementation details that leaked in
49
54
  - Note any items that should be absorbed into PLAN.md
50
55
 
51
- ## Phase 2: Auto-Triage (No User Input)
56
+ ## Phase 2: Auto-Triage
52
57
 
53
- Using agent results, automatically classify every PLAN.md item:
58
+ Using agent results, classify every PLAN.md item:
54
59
 
55
60
  | Status | Criteria | Action |
56
61
  |--------|----------|--------|
57
- | `confirmed-done` | Git commit + code exists + tests pass | Move to DONE.md |
58
- | `likely-done` | Strong evidence but not 100% certain | Present to user for confirmation |
59
- | `stale` | No commits, no code, no recent discussion; item is >30 days old with zero progress | Flag for removal |
62
+ | `confirmed-done` | Git commit + code exists + tests pass | Archive to DONE.md |
63
+ | `likely-done` | Strong evidence but not 100% certain | Archive to DONE.md |
64
+ | `stale` | No commits, no code, no recent discussion; item is >30 days old with zero progress | Remove from PLAN.md |
60
65
  | `still-pending` | No evidence of completion | Keep in PLAN.md |
61
66
 
62
- ## Phase 3: Single Interactive Checkpoint
67
+ ## Phase 3: Apply Changes (or Checkpoint if Interactive)
68
+
69
+ ### Default Mode (autonomous)
70
+
71
+ Apply all changes immediately without prompting:
72
+
73
+ 1. Archive `confirmed-done` and `likely-done` items to DONE.md
74
+ 2. Remove `stale` items from PLAN.md
75
+ 3. Add suggested new items to the appropriate PLAN.md section
76
+ 4. Absorb any tactical items found in GOALS.md
77
+ 5. Print a brief summary of what was done:
78
+
79
+ ```
80
+ Replan complete:
81
+ - Archived {N} completed items to DONE.md
82
+ - Removed {S} stale items
83
+ - Added {P} new suggested items
84
+ - {any GOALS.md boundary fixes}
85
+ ```
86
+
87
+ ### Interactive Mode (`--interactive`)
63
88
 
64
- Present ONE consolidated summary to the user. Keep it tight:
89
+ Present ONE consolidated summary to the user:
65
90
 
66
91
  ```
67
92
  AskUserQuestion([{
68
- question: "Replan audit complete. Here's what I found:\n\n**Auto-archiving to DONE.md** ({N} items):\n{list of confirmed-done items}\n\n**Likely done — confirm?** ({M} items):\n{list with evidence}\n\n**Flagged as stale** ({S} items):\n{list with last-activity dates}\n\n**New suggestions** ({P} items):\n{numbered list of proposed new items with rationale}\n\nHow should I proceed?",
93
+ question: "Replan audit complete. Here's what I found:\n\n**Auto-archiving to DONE.md** ({N} items):\n{list of confirmed-done items}\n\n**Likely done — archive?** ({M} items):\n{list with evidence}\n\n**Flagged as stale** ({S} items):\n{list with last-activity dates}\n\n**New suggestions** ({P} items):\n{numbered list of proposed new items with rationale}\n\nHow should I proceed?",
69
94
  multiSelect: true,
70
95
  options: [
71
96
  { label: "Archive confirmed-done", description: "Move {N} confirmed items to DONE.md" },
@@ -73,7 +73,7 @@ With the flow understood, evaluate the changed code against these principles:
73
73
  - Function and variable names should communicate intent. If you need to read the implementation to understand what a name means, it's poorly named.
74
74
  - Boolean variables/params should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
75
75
 
76
- Only flag principle violations that are **concrete and actionable** in the changed code. Do not flag pre-existing design issues in untouched code unless the changes make them worse.
76
+ For this review, only flag principle and design violations that are **concrete and actionable** in the code changed by this PR. However, if you discover a clear, real bug or correctness issue, even in code not directly modified here, call it out and help ensure it gets fixed (in this PR or a follow-up). Never dismiss serious problems as "out of scope" or "not modified in this PR."
77
77
 
78
78
  </review_instructions>
79
79
 
@@ -98,6 +98,27 @@ Check every file against this checklist. The checklist is organized into tiers
98
98
  - If the PR adds a new call to an external service that has established mock/test infrastructure (mock mode flags, test helpers, dev stubs), verify the new call uses the same patterns — bypassing them makes the new code path untestable in offline/dev environments and inconsistent with existing integrations
99
99
  - If the PR adds a new UI component or client-side consumer against an existing API endpoint, read the actual endpoint handler or response shape — verify every field name, nesting level, identifier property, and response envelope path used in the consumer matches what the producer returns. This is the #1 source of "renders empty" bugs in new views built against existing APIs
100
100
 
101
+ **Push/real-time event scoping**
102
+ - If the PR adds or modifies WebSocket, SSE, or pub/sub event emission, trace the event scope: does the event reach only the originating session/user, or is it broadcast to all connected clients? Check payloads for sensitive content (user inputs, images, tokens) that should not leak across sessions. If the consumer filters by a correlation ID, verify the producer includes one and that the ID is generated server-side or validated against the session
103
+
104
+ **Cleanup/teardown side effect audit**
105
+ - If the PR adds cleanup, teardown, or garbage-collection functions, trace whether the cleanup performs implicit state mutations (auto-merge into main, auto-commit of unreviewed changes, cascade writes to shared state). Verify the cleanup aborts safely if a prerequisite step fails (e.g., saving dirty state before deletion) rather than proceeding with data loss
106
+
107
+ **Specification/standard conformance**
108
+ - If the PR implements or extends a parser for a well-known format (cron expressions, date formats, URLs, semver, MIME types), verify boundary handling matches the specification — especially field-specific ranges (month starts at 1, not 0), normalization conventions (cron DOW 0 and 7 both mean Sunday), and step/range semantics that differ per field type
109
+
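As an illustration of the conformance check above, here is a minimal sketch of field-specific range validation for plain numeric cron values. The function name `parseCronValue`, the field names, and the five-field range table are assumptions made for this example (some cron implementations differ); steps, ranges, and lists are deliberately left out.

```javascript
// Assumed ranges for a conventional five-field cron expression.
const CRON_FIELD_RANGES = new Map([
  ["minute", [0, 59]],
  ["hour", [0, 23]],
  ["dayOfMonth", [1, 31]],
  ["month", [1, 12]], // months start at 1, not 0
  ["dayOfWeek", [0, 7]], // both 0 and 7 are accepted as Sunday
]);

// Hypothetical helper: validate one numeric cron value against its field.
function parseCronValue(field, text) {
  const range = CRON_FIELD_RANGES.get(field);
  if (!range) throw new RangeError(`unknown cron field: ${field}`);
  if (!/^\d+$/.test(text)) throw new RangeError(`not a number: ${text}`);

  const value = Number(text);
  const [min, max] = range;
  if (value < min || value > max) {
    throw new RangeError(`${field} must be in ${min}-${max}, got ${value}`);
  }
  // Normalize Sunday so later comparisons only ever see 0.
  if (field === "dayOfWeek" && value === 7) return 0;
  return value;
}
```

A reviewer could compare a parser under review against a table like this one: a parser that accepts month `0` or treats day-of-week `7` as distinct from `0` is the kind of boundary bug this item targets.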
110
+ **Temporal context consistency**
111
+ - If the PR adds timezone-aware logic alongside existing non-timezone-aware comparisons in the same code flow (e.g., a weekday gate using UTC while cron matching uses user timezone), check that all temporal comparisons in the flow use the same timezone context — mixed contexts cause operations to trigger on the wrong local day/hour
112
+
113
+ **Status/health endpoint freshness**
114
+ - If the PR adds or modifies a status or health-check endpoint, trace whether it returns live probe results or cached data. Cached health checks mask real-time failures — a cache keyed by URL that survives URL reconfiguration reports stale status. Verify health endpoints bypass caches or use sufficiently short TTLs
115
+
116
+ **Boolean/type fidelity through serialization boundaries**
117
+ - If the PR persists boolean flags to text-based storage (markdown metadata, flat files, query strings, form data), trace the round-trip: write path → storage format → read/parse path → consumption site. Boolean `false` serialized as the string `"false"` is truthy in JavaScript — verify all consumption sites use strict equality or a dedicated coercion function, and that the same coercion is applied consistently
118
+
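The round-trip problem described above can be sketched in a few lines. `parseStoredBoolean` is a hypothetical helper name, not something from this package; the point is only that one shared coercion function replaces truthiness checks at every consumption site.

```javascript
// Hypothetical helper: coerce a boolean that was persisted as text.
function parseStoredBoolean(value, fallback = false) {
  if (typeof value === "boolean") return value;
  if (typeof value === "string") {
    const normalized = value.trim().toLowerCase();
    if (normalized === "true") return true;
    if (normalized === "false") return false;
  }
  // Missing or unrecognized values fall back to an explicit default.
  return fallback;
}

// Simulated round-trip through a text format such as markdown metadata.
const stored = String(false); // "false"
const naive = Boolean(stored); // true, because any non-empty string is truthy
const coerced = parseStoredBoolean(stored); // false
```

The explicit `fallback` parameter is a design choice for this sketch: it forces each call site to decide what a missing flag means instead of inheriting JavaScript's truthiness rules.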
119
+ **Cross-layer invariant enforcement**
120
+ - If the PR introduces or modifies an invariant relationship between configuration flags (e.g., "flag A implies flag B"), trace enforcement through every layer: UI toggle handlers, form submission payloads, API validation schemas, server default-application functions, and persistence round-trip. If any layer allows the invariant to be violated, cascading defaults produce contradictory state
121
+
101
122
  **Error path completeness**
102
123
  - Trace each error path end-to-end: does the error reach the user with a helpful message and correct HTTP status? Or does it get swallowed, logged silently, or surface as a generic 500?
103
124
  - For multi-step operations (sync to N repos, batch updates): are per-item failures tracked separately from overall success? Does the status reflect partial failure accurately?
@@ -154,7 +175,7 @@ Check every file against this checklist. The checklist is organized into tiers
154
175
  - If the PR adds or reorders sequential steps/instructions, verify the ordering matches execution dependencies — readers following steps in order must not perform an action before its prerequisite
155
176
 
156
177
  **Transactional write integrity**
157
- - If the PR performs multi-item writes (database transactions, batch operations), verify each write includes condition expressions that prevent stale-read races (TOCTOU) — an unconditioned write after a read can upsert deleted records, double-count aggregates, or drive counters negative. Trace the gap between read and write for each operation
178
+ - If the PR performs multi-item writes (database transactions, batch operations), verify each write includes condition expressions that prevent stale-read races (TOCTOU) — an unconditioned write after a read can upsert deleted records, double-count aggregates, or drive counters negative. Trace the gap between read and write for each operation. Also verify that update/modify operations won't silently create records when the target key doesn't exist — database update operations often have implicit upsert semantics (e.g., DynamoDB UpdateItem, MongoDB update with upsert) that create partial records for invalid IDs; add existence condition expressions when the operation should only modify existing records
158
179
  - If the PR catches transaction/conditional failures, verify the error is translated to a client-appropriate status (409, 404) rather than bubbling as 500 — expected concurrency failures are not server errors
159
180
 
160
181
  **Batch/paginated API consumption**
@@ -192,6 +213,10 @@ Check every file against this checklist. The checklist is organized into tiers
192
213
  **Abstraction layer fidelity**
193
214
  - If the PR calls a third-party API through an internal wrapper/abstraction layer, trace whether the wrapper requests and forwards all fields the handler depends on — third-party APIs often have optional response attributes that require explicit opt-in (e.g., cancellation reasons, extended metadata). Code branching on fields the wrapper doesn't forward will silently receive `undefined` and take the wrong path. Also verify that test mocks match what the real wrapper returns, not what the underlying API could theoretically return
194
215
  - If the PR passes multiple parameters through a wrapper/abstraction layer to an underlying API, check whether any parameter combinations are mutually exclusive in the underlying API (e.g., projection expressions + count-only select modes) — the wrapper should strip conflicting parameters rather than forwarding all unconditionally, which causes validation errors at the underlying layer
216
+ - If the PR calls framework or library functions with discriminated input formats (e.g., content paths vs script paths, different loader functions per format), trace each call site to verify the function variant used actually handles the input format being passed — especially fallback/default branches in multi-format dispatchers, where the fallback commonly uses the wrong function. Also verify positional argument order matches the called function's parameter order (not assumed from variable names) and that the object type passed matches what the API expects (e.g., asset object vs class reference, property access vs method call)
217
+
218
+ **Parameter consumption tracing**
219
+ - If the PR adds a function with validated input parameters (schema validation, input decorators, type annotations), trace each validated parameter through to where it's actually consumed in the implementation. Parameters that pass validation but are never read create dead API surface — callers believe they're configuring behavior that's silently ignored. Either wire the parameter through or remove it from the public API
195
220
 
196
221
  **Summary/aggregation endpoint consistency**
197
222
  - If the PR adds a summary or dashboard endpoint that aggregates counts/previews across multiple data sources, trace each category's computation logic against the corresponding detail view it links to — verify they apply the same filters (e.g., orphan exclusion, status filtering), the same ordering guarantees (sort keys that actually exist on the queried index), and that navigation links propagate the aggregated context (e.g., `?status=pending`) so the destination page matches what the summary promised
@@ -219,6 +244,9 @@ Check every file against this checklist. The checklist is organized into tiers
219
244
  **Bulk vs single-item operation parity**
220
245
  - If the PR modifies a single-item CRUD operation (create, update, delete) to handle new fields or apply new logic, trace the corresponding bulk/batch operation for the same entity — it often has its own independent implementation that won't pick up the change. Verify both paths handle the same fields, apply the same validation, and preserve the same secondary data
221
246
 
247
+ **Bulk operation selection lifecycle**
248
+ - If the PR adds operations that act on a user-selected subset of items (bulk actions, batch operations), trace the complete lifecycle of the selection state: when is it cleared (data refresh, item deletion), when is it not cleared but should be (filter/sort/page changes), and whether the operation re-validates the selection at execution time (especially after confirmation dialogs where the underlying data may change between display and confirmation)
249
+
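One way to picture the selection lifecycle check above is a pair of small functions: one prunes stale IDs, the other re-validates at execution time. The names `pruneSelection` and `runBulkAction`, and the `{ id }` item shape, are assumptions for this sketch only.

```javascript
// Keep only selected IDs that still exist in the current collection.
function pruneSelection(selectedIds, currentItems) {
  const liveIds = new Set(currentItems.map((item) => item.id));
  return new Set([...selectedIds].filter((id) => liveIds.has(id)));
}

// Re-validate the selection when the action actually runs, for example
// after a confirmation dialog during which the data may have changed.
function runBulkAction(selectedIds, currentItems, action) {
  const validIds = [...pruneSelection(selectedIds, currentItems)];
  if (validIds.length === 0) {
    return { ran: false, ids: [] }; // nothing left to act on
  }
  action(validIds);
  return { ran: true, ids: validIds };
}
```

Calling `pruneSelection` after every reload, filter, sort, or page change, and calling `runBulkAction` with the freshest item list, covers both halves of the checklist item: stale references are dropped, and nonexistent IDs never reach the downstream operation.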
222
250
  **Config value provenance for auto-upgrade**
223
251
  - If the PR adds auto-upgrade logic that replaces config values with newer defaults (prompt versions, schema migrations, template updates), verify the code can distinguish "user customized this value" from "this is the previous default." Without provenance tracking (version stamps, customization flags, or comparison against known previous defaults), auto-upgrade will overwrite intentional user customizations or skip legitimate upgrades
224
252
 
@@ -1,7 +1,12 @@
1
1
  ---
2
2
  description: Resolve PR review feedback with parallel agents
3
+ argument-hint: "[--interactive]"
3
4
  ---
4
5
 
6
+ **Default mode: fully autonomous.** Fetches review feedback, fixes issues, pushes, resolves threads, and loops Copilot reviews without prompting. Auto-skips on timeout/errors after retries.
7
+
8
+ **`--interactive` mode:** Pauses on Copilot review timeout and repeated errors to ask the user how to proceed.
9
+
5
10
  # Resolve PR Review Feedback
6
11
 
7
12
  Address the latest code review feedback on the current branch's pull request using parallel sub-agents.
@@ -18,6 +23,8 @@ Address the latest code review feedback on the current branch's pull request usi
18
23
  ```
19
24
  Save results to `/tmp/pr_threads.json` for parsing.
20
25
 
26
+ **Thread-count tracking**: Count and report total unresolved threads upfront (e.g., "Found 7 unresolved review threads"). After resolution, report how many were addressed vs. remaining (e.g., "Resolved 5/7 threads, 2 left unaddressed"). This prevents partial sessions from going unnoticed across context resets.
27
+
21
28
  4. **Spawn parallel sub-agents to address feedback**:
22
29
  - For small PRs (1-3 unresolved threads), handle fixes inline instead of spawning agents
23
30
  - For larger PRs, spawn one `Agent` call (general-purpose type) per review thread (or group closely related threads on the same file into one agent)
@@ -40,14 +47,16 @@ Address the latest code review feedback on the current branch's pull request usi
40
47
  - Stage all changed files and commit with a descriptive message summarizing what was addressed. Do not include co-author info.
41
48
  - Push to the branch.
42
49
 
43
- 8. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. **Never use `$variables` in the query — inline the thread ID directly**:
50
+ 8. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. Track resolution count against the total from step 3. **Never use `$variables` in the query — inline the thread ID directly**:
44
51
  ```bash
45
52
  echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"THREAD_ID_HERE\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
46
53
  ```
47
54
 
48
55
  9. **Request another Copilot review** (only if `is_fork_pr=false`): After pushing fixes, request a fresh Copilot code review and repeat from step 3 until the review passes clean. **Skip for fork-to-upstream PRs.**
49
56
 
50
- 10. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix.
57
+ **Repeated-comment dedup**: When fetching threads after a new Copilot review round, compare each new unresolved thread's comment body and file/line against threads from the previous round that were intentionally left unresolved (replied to as non-issues or disagreements). If all new unresolved threads are repeats of previously-dismissed feedback, treat the review as clean (no new actionable comments) and exit the loop.
58
+
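The dedup rule above might be sketched as follows. The flattened thread shape `{ path, line, body }` is an assumption based on the fields requested in the GraphQL query; `threadKey` and `isReviewClean` are hypothetical names, and exact-match comparison after trimming is a simplification of whatever matching the agent actually performs.

```javascript
// Build a comparison key from file, line, and trimmed comment body.
function threadKey(thread) {
  return JSON.stringify([thread.path, thread.line, thread.body.trim()]);
}

// The review counts as clean when every new unresolved thread repeats
// feedback that was intentionally left unresolved in the previous round.
// An empty list of new threads is also clean, which is intended here.
function isReviewClean(newUnresolved, previouslyDismissed) {
  const dismissedKeys = new Set(previouslyDismissed.map(threadKey));
  return newUnresolved.every((thread) => dismissedKeys.has(threadKey(thread)));
}
```

Keying on file, line, and body together keeps a repeated comment on a different line from being silently treated as a duplicate.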
59
+ 10. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix. Include a final count line: "Resolved X/Y threads." If any threads remain unresolved, list them with reasons (unclear feedback, disagreement, requires user input).
51
60
 
52
61
  !`cat ~/.claude/lib/graphql-escaping.md`
53
62
 
@@ -71,14 +80,15 @@ Poll using GraphQL to check for a new review with a `submittedAt` timestamp afte
71
80
  gh api graphql -f query='{ repository(owner: "OWNER", name: "REPO") { pullRequest(number: PR_NUM) { reviews(last: 3) { nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }'
72
81
  ```
73
82
 
74
- **Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 5 minutes. Set poll interval to 60 seconds and max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early.
83
+ **Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 5 minutes. Use **progressive poll intervals**: 15s, 15s, 30s, 30s, then 60s thereafter — small diffs often complete in under a minute, so early frequent checks avoid wasting time. Set max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early.
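The timing rules above reduce to two small calculations. This is a sketch under the stated numbers only (15s, 15s, 30s, 30s, then 60s; max wait of 2x the expected duration, clamped to 5 to 20 minutes); the function names are hypothetical.

```javascript
const PROGRESSIVE_INTERVALS_SEC = [15, 15, 30, 30];
const STEADY_INTERVAL_SEC = 60;

// Interval to sleep before the given poll attempt (0-based).
function pollIntervalSec(attempt) {
  return attempt < PROGRESSIVE_INTERVALS_SEC.length
    ? PROGRESSIVE_INTERVALS_SEC[attempt]
    : STEADY_INTERVAL_SEC;
}

// Max wait is twice the expected duration, clamped to 5-20 minutes.
// With no prior review, the expected duration defaults to 5 minutes.
function maxWaitSec(expectedSec = 5 * 60) {
  const MIN_WAIT_SEC = 5 * 60;
  const MAX_WAIT_SEC = 20 * 60;
  return Math.min(MAX_WAIT_SEC, Math.max(MIN_WAIT_SEC, 2 * expectedSec));
}

// Elapsed-time offsets (in seconds) at which polls would fire.
function pollSchedule(expectedSec) {
  const limit = maxWaitSec(expectedSec);
  const offsets = [];
  let elapsed = 0;
  for (let attempt = 0; ; attempt++) {
    elapsed += pollIntervalSec(attempt);
    if (elapsed > limit) break;
    offsets.push(elapsed);
  }
  return offsets;
}
```

For example, with no prior review the max wait works out to 10 minutes, and the first four polls land at 15, 30, 60, and 90 seconds before settling into one poll per minute.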
75
84
 
76
- The review is complete when a new `copilot-pull-request-reviewer` review node appears. If no review appears after max wait, **ask the user** whether to continue waiting, re-request, or skip.
85
+ The review is complete when a new `copilot-pull-request-reviewer` review node appears. If no review appears after max wait: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue waiting, re-request, or skip.
77
86
 
78
- **Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (same API call above), and resume polling. Allow up to 3 error retries before asking the user whether to continue or skip.
87
+ **Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (same API call above), and resume polling. Allow up to 3 error retries. After 3 failures: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue or skip.
79
88
 
80
89
  ## Notes
81
90
 
82
91
  - Only resolve threads where you've actually addressed the feedback
83
92
  - If feedback is unclear or incorrect, leave a reply comment instead of resolving
84
93
  - Always run tests before committing — never push code with known failures
94
+ - **Never dismiss findings as "out of scope" or "not modified in this PR."** If a review identifies a real issue, fix it — regardless of whether the current PR touched that code. Evaluate every finding on its merits. Don't leave trash on the floor.
@@ -16,7 +16,7 @@
16
16
  **Runtime correctness**
17
17
  - Null/undefined access without guards, off-by-one errors, object spread of potentially-null values (spread of null is `{}`, silently discarding state) or non-object values (spreading a string produces indexed character keys, spreading an array produces numeric keys) — guard with a plain-object check before spreading
18
18
  - Data from external/user sources (parsed JSON, API responses, file reads) used without structural validation — guard against parse failures, missing properties, wrong types, and null elements before accessing nested values. When parsed data is optional enrichment, isolate failures so they don't abort the main operation
19
- - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
19
+ - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters. Boolean values round-tripping through text serialization (markdown metadata, query strings, form data, flat-file config) become strings — `"false"` is truthy in JavaScript, so truthiness checks on deserialized booleans silently treat explicit `false` as `true`. Use strict equality (`=== true`, `=== 'true'`) or a dedicated coercion function; ensure the same coercion is applied at every consumption site
20
20
  - Functions that index into arrays without guarding empty arrays; aggregate operations (`every`, `some`, `reduce`) on potentially-empty collections returning vacuously true/default values that mask misconfiguration or missing data; state/variables declared but never updated or only partially wired up
21
21
  - Parallel arrays or tuples coupled by index position (e.g., a names array, a promises array, and a destructuring assignment that must stay aligned) — insertion or reordering in one silently misaligns all others. Use objects/maps keyed by a stable identifier instead
22
22
  - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
@@ -27,23 +27,26 @@
27
27
  - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
28
28
  - Parameterized/wildcard routes registered before specific named routes — the generic route captures requests meant for the specific endpoint (e.g., `/:id` registered before `/drafts` matches `/drafts` as `id="drafts"`). Verify route registration order or use path prefixes to disambiguate
29
29
  - Stored or external URLs rendered as clickable links (`href`, `src`, `window.open`) without protocol validation — `javascript:`, `data:`, and `vbscript:` URLs execute in the user's browser. Allowlist `http:`/`https:` (and `mailto:` if needed) before rendering; for all other schemes, render as plain text or strip the value
30
+ - Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https only) and host/network restrictions — the server becomes an SSRF proxy for reaching internal network services, cloud metadata endpoints, or localhost-bound APIs. Validate scheme and restrict to expected hosts or external-only ranges before any server-side fetch
30
31
  - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
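
The server-side fetch item in the list above can be illustrated with a minimal pre-fetch check. `isAllowedFetchTarget` and the host allowlist are assumptions for this sketch; it uses the standard `URL` parser and does not address redirects or DNS rebinding, which a real implementation would also need to consider.

```javascript
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Hypothetical guard: call before any server-side fetch of a stored URL.
function isAllowedFetchTarget(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected outright
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) return false;
  // The URL parser lower-cases the hostname for http/https URLs.
  return allowedHosts.has(url.hostname);
}
```

An explicit host allowlist is the strictest of the options the item mentions; when arbitrary external hosts must be reachable, the equivalent check would resolve the host and reject private, loopback, and link-local ranges instead.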
31
32
 
32
33
  **Trust boundaries & data exposure**
33
34
  - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
34
35
  - Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) when the server can recompute or verify them — strip and recompute server-side; for file uploads, validate content type via magic bytes and size via actual buffer length rather than trusting client-supplied headers
35
- - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently
36
+ - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently. When new capabilities require additional OAuth scopes or API permissions, verify the scope-upgrade check covers all required scopes — a check that only tests for one scope will miss newly added scopes, causing downstream API calls to fail with insufficient permissions
36
37
  - User-controlled objects merged via `Object.assign`/spread without sanitizing keys — `__proto__`, `constructor`, and `prototype` keys enable prototype pollution. Use `Object.create(null)` for the target, whitelist allowed keys, and use `hasOwnProperty` (not `in`) to check membership. Also verify the merge can't override reserved/internal fields the system depends on
38
+ - Push events (WebSocket, SSE, pub/sub) emitted without scoping to the originating user or session — sensitive payloads (user content, tokens, progress data, images) leak to all connected clients in multi-user environments. Scope events to the requesting session via room/channel isolation or include a correlation ID the client provides at request time; verify consumers filter events by correlation ID before updating UI state
37
39
 
38
40
  ## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
39
41
 
40
42
  **Async & state consistency** _[applies when: code uses async/await, Promises, or UI state]_
41
43
  - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding
42
- - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted). Component state initialized from props via `useState(prop)` only captures the initial value — if the prop updates asynchronously (data fetch, parent re-render), the local state goes stale. Sync with an effect when the user is not actively editing, or lift state to avoid the copy
44
+ - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted). Reference/selection sets that point to items in a data collection must be pruned when items are removed and invalidated when the collection is reloaded, filtered, paginated, or sorted — stale references send nonexistent IDs to downstream operations. Operations triggered from a confirmation dialog must re-validate preconditions (selection non-empty, items still exist) at execution time — the underlying data may change between dialog display and user confirmation. Component state initialized from props via `useState(prop)` only captures the initial value — if the prop updates asynchronously (data fetch, parent re-render), the local state goes stale. Sync with an effect when the user is not actively editing, or lift state to avoid the copy
43
45
  - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages. For periodic polling, also check that error notifications are throttled or deduplicated (only fire on state transitions like success→error, not on every failed iteration) and that failure doesn't make the UI section disappear entirely (component returning null when data is null/errored) — render an error or stale-data state instead of absence
44
46
  - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount. When appending items to a list optimistically, guard against duplicates (check existence before append) — concurrent or repeated operations can insert the same item multiple times
45
47
  - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
46
- - Periodic/scheduled operations with skip conditions (gates, precondition checks, "nothing to do" early exits) that don't advance timing state (lastRun, nextFireTime) on skip — null or stale lastRun causes immediate re-trigger in a tight loop. Record the skip as an execution or compute the next fire time from now, not from the missing lastRun
48
+ - Periodic/scheduled operations with skip conditions (gates, precondition checks, "nothing to do" early exits) that don't advance timing state (lastRun, nextFireTime) on skip — null or stale lastRun causes immediate re-trigger in a tight loop. Record the skip as an execution or compute the next fire time from now, not from the missing lastRun. Also check the initial baseline for never-run items: using epoch (0) or distant-past as the default "last run" makes schedule-based items appear immediately due on first evaluation, while using "now" may cause them to never become due — choose a baseline that correctly represents "first occurrence after activation"
49
+ - Cached values keyed without all relevant discriminators (base URL, tenant ID, environment, configuration version) — context changes (URL reconfiguration, tenant switch) serve stale cached data from the previous context. Health/status endpoints that return cached results instead of live probes mask real-time failures, reporting "connected" when the service is unreachable. Key caches by their full context and bypass or invalidate caches for availability checks
47
50
  - Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
48
51
  - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly. Also applies to responses and business-logic decisions (threshold triggers, status transitions) derived from pre-transaction reads — concurrent writers all read the same stale value, so thresholds may be crossed without triggering the transition. Compute from post-write state or use conditional expressions that evaluate the stored value. For monotonic counters (sequence numbers, cursors) that must stay in lockstep with append-only storage, advancing before the write risks the counter running ahead on failure; not advancing after a partial write risks reuse — reserve the range before writing and commit only on success. Also check for dependent side effects (rewards, notifications, secondary uploads, resource allocation) executing in parallel with or before the primary write they depend on — if the primary write fails or is rejected (lock contention, dedup, validation), the side effects are irrecoverable (orphaned uploads, unearned rewards, phantom notifications). Gate side effects on confirmed primary write success
49
52
  - Error/early-exit paths that return status metadata (pagination flags, truncation indicators, hasMore, completion markers) or emit events (WebSocket, SSE, pub/sub) with default/initial values instead of reflecting actual accumulated state — downstream consumers make incorrect decisions (e.g., treating a failed sync as successful because the completion event was emitted unconditionally). Set metadata flags and event payloads based on actual outcome, not just the final request's exit path. Also check paired lifecycle events (started/completed/failed): if a function emits a "started" event, every exit path — including early returns and no-op branches — must emit the corresponding "completed" or "failed" event, or clients waiting for completion will hang or show stale state
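The cache-key item in this hunk can be illustrated with a minimal sketch. It assumes a hypothetical `CacheContext` shape and `ContextCache` class (neither comes from the package) and shows only the keying idea: every discriminator that changes which backend or tenant a value belongs to is part of the key, so a context switch yields a miss rather than stale data.

```typescript
// Hypothetical context: the fields that decide where a cached value came from.
interface CacheContext {
  baseUrl: string;
  tenantId: string;
  configVersion: number;
}

// Build the key from every discriminator, not just the resource name.
function cacheKey(ctx: CacheContext, resource: string): string {
  return JSON.stringify([ctx.baseUrl, ctx.tenantId, ctx.configVersion, resource]);
}

class ContextCache<T> {
  private readonly entries = new Map<string, T>();

  get(ctx: CacheContext, resource: string): T | undefined {
    return this.entries.get(cacheKey(ctx, resource));
  }

  set(ctx: CacheContext, resource: string, value: T): void {
    this.entries.set(cacheKey(ctx, resource), value);
  }
}
```

A lookup made after a tenant switch or URL reconfiguration uses a different key and therefore misses. Availability checks would bypass such a cache entirely and probe the service live.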
@@ -62,7 +65,7 @@
62
65
 
63
66
  **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
64
67
  - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
65
- - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry
68
+ - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry. Also check cleanup operations that perform implicit state mutations (auto-merge, auto-commit, cascade writes) as part of teardown — these can introduce unreviewed changes or silently modify shared state. Verify cleanup fails safely when a prerequisite step (e.g., saving dirty state) fails rather than proceeding with data loss
66
69
  - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
67
70
  - Self-rescheduling callbacks (one-shot timers, deferred job handlers) where the next cycle is registered inside the callback body — an unhandled error before the re-registration call permanently stops the schedule. Wrap the callback body in try/finally with re-registration in the finally block, or register the next cycle before executing the current one
68
71
 
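The self-rescheduling item above can be sketched as follows. This is an illustration under assumptions, not code from the package: `startSelfRescheduling`, the injected `schedule` function, and `onError` are hypothetical names. The scheduler is injected so the sketch does not depend on a particular timer API, and a stop mechanism is omitted for brevity.

```typescript
// Hypothetical scheduler signature; in practice this could wrap a one-shot timer.
type Schedule = (fn: () => void, delayMs: number) => void;

function startSelfRescheduling(
  task: () => void,
  delayMs: number,
  schedule: Schedule,
  onError: (err: unknown) => void,
): void {
  const tick = (): void => {
    try {
      task();
    } catch (err) {
      // Report the failure instead of letting it end the schedule.
      onError(err);
    } finally {
      // Re-registration happens on every exit path, including after an error.
      schedule(tick, delayMs);
    }
  };
  schedule(tick, delayMs);
}
```

Because the re-registration sits in `finally`, a throwing `task` cannot permanently stop the cycle.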
@@ -77,16 +80,19 @@
77
80
  - Summary/aggregation endpoints that compute counts or previews via a different query path, filter set, or data source than the detail views they link to — users see inconsistent numbers between the dashboard and the destination page. Trace the computation logic in both paths and verify they apply the same filters, exclusions, and ordering guarantees (or document the intentional difference)
78
81
  - When a validation/sanitization/normalization function is introduced for a field, trace ALL write paths (create, update, sync, import, raw/bulk persist) — partial application means invalid values re-enter through the unguarded path. This includes structural normalization (ID prefixes, required defaults, shape invariants) that the read/parse path depends on — a "raw" write path that skips normalization produces data that changes identity or shape on reload
79
82
  - Stored config/settings merged with hardcoded defaults using shallow spread — nested objects in the stored copy entirely replace the default, dropping newly added default keys on upgrade. Use deep merge for nested config objects (while preserving explicit `null` to clear a field), or flatten the config structure so shallow merge suffices
80
- - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
83
+ - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Also check for parameters accepted and validated in the schema but never consumed by the implementation — dead API surface that misleads callers into believing they're configuring behavior that's silently ignored; remove unused parameters or wire them through to the implementation. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
84
+ - Multi-part UI features (e.g., table header + rows) whose rendering is gated on different prop/condition subsets — if the header checks prop A while rows check prop B, partial provision causes structural misalignment (column count mismatch, orphaned interactive elements without handlers). Derive a single enablement boolean from the complete prop set and use it consistently across all participating components
81
85
  - Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
82
- - Code reading properties from API responses, framework-provided objects, or internal abstraction layers using field names the source doesn't populate or forward — silent `undefined`. Verify property names and nesting depth match the actual response shape (e.g., `response.items` vs `response.data.items`, `obj.placeId` vs `obj.id`, flat fields vs nested sub-objects). When building a new consumer against an existing API, check the producer's actual response — not assumed conventions. When branching on fields from a wrapped third-party API, confirm the wrapper actually requests and forwards those fields (e.g., optional response attributes that require explicit opt-in)
86
+ - Code reading properties from API responses, framework-provided objects, or internal abstraction layers using field names the source doesn't populate or forward — silent `undefined`. Verify property names and nesting depth match the actual response shape (e.g., `response.items` vs `response.data.items`, `obj.placeId` vs `obj.id`, flat fields vs nested sub-objects). When building a new consumer against an existing API, check the producer's actual response — not assumed conventions. When branching on fields from a wrapped third-party API, confirm the wrapper actually requests and forwards those fields (e.g., optional response attributes that require explicit opt-in). Also verify call sites pass inputs in the format the called function actually accepts — framework constructors with non-obvious positional argument order, loaders with format-specific variants (content paths vs script paths, asset objects vs class references), and accessor APIs with distinct method-vs-property semantics. Fallback branches in multi-format dispatchers commonly use the wrong function for the input type
83
87
  - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use. When new logic (access control, UI display, queries) checks only a newly introduced field, verify it falls back to any legacy field that existing records still use — otherwise records created before the migration are silently excluded or inaccessible. Also check entity identity keys: if code looks up or matches entities using a computed key (e.g., `e.id || e.externalId`), all code paths that perform the same lookup must use the same key computation — one path using `e.id` while another uses `e.id || e.externalId` causes mismatches for entities missing the primary key
84
88
  - Entity type changes without invariant revalidation — when an entity has a discriminator field (type, kind, category) and the user changes it, all type-specific invariants must be enforced on the new type AND type-specific fields from the old type must be cleared or revalidated. A job changing from `shell` to `agent` without clearing `command`, or changing to `shell` without requiring `command`, leaves the entity in an invalid hybrid state that fails at runtime or resurfaces stale data
89
+ - Invariant relationships between configuration flags (flag A implies flag B) not enforced across all layers — UI toggle handlers, API validation schemas, server default-application functions, and serialization/deserialization must all preserve the invariant. If any layer allows setting A=true with B=false (or vice versa), cascading defaults and toggle logic produce contradictory state. Trace the invariant through: UI state handlers, form submission, route validation, service defaults, and persistence round-trip
85
90
  - Operations scoped to a specific entity subtype that don't verify the entity's type discriminator before processing — an endpoint or function designed for one account/entity type that accepts any entity by ID can corrupt state or produce wrong results when called with the wrong type. Add an explicit type guard and return a structured error
86
- - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field"
91
+ - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field". Also applies to normalization (trailing slashes, case, whitespace): if one path normalizes a value before comparison but the write path stores it un-normalized, comparisons against the stored value produce incorrect results — normalize at write time or normalize both sides consistently
92
+ - Validation functions that delegate to runtime-behavior computations (next schedule occurrence, URL reachability, resource resolution) — conflating "no result within search window" or "temporarily unavailable" with "invalid input" rejects valid configurations. Validate syntax and structure independently of runtime feasibility
87
93
  - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
88
94
  - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
89
- - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
95
+ - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); counters incremented before confirming the operation actually changed state — rejected, skipped, or no-op iterations inflate success counts. Silent operations in verbose sequences where all branches should print status
90
96
 
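For the stored-config item in this hunk, a minimal deep-merge sketch may help. `mergeConfig` and the `Json` types are hypothetical names, and the sketch assumes config values are plain JSON. Nested objects are merged key by key so newly added default keys survive, while an explicit `null` in the stored copy is preserved as a deliberate clear.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge stored settings over defaults, recursing only where both sides are objects.
function mergeConfig(defaults: JsonObject, stored: JsonObject): JsonObject {
  const result: JsonObject = { ...defaults };
  for (const [key, storedValue] of Object.entries(stored)) {
    const defaultValue = defaults[key];
    result[key] =
      isPlainObject(defaultValue) && isPlainObject(storedValue)
        ? mergeConfig(defaultValue, storedValue)
        : storedValue; // explicit null, arrays, and primitives replace the default
  }
  return result;
}
```

Note that nested defaults the stored copy does not override are shared by reference with `defaults`; callers that mutate the result would want to clone first.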
91
97
  **Concurrency & data integrity** _[applies when: code has shared state, database writes, or multi-step mutations]_
92
98
  - Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave — use conditional writes/optimistic concurrency (e.g., condition expressions, version checks) to close the gap between read and write; if the conditional write fails, surface a retryable error instead of letting it bubble as a 500
@@ -98,7 +104,7 @@
98
104
 
99
105
  **Input handling** _[applies when: code accepts user/external input]_
100
106
  - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
101
- - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
107
+ - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs. Also validate element-level invariants (types, format, non-empty) and deduplicate — duplicate elements inflate operation counts, repeat side effects, and skew success/failure metrics. Also check internal operations that fan out unbounded parallel I/O (e.g., `Promise.all(files.map(readFile))`) — large collections risk EMFILE (too many open file descriptors) or memory exhaustion. Use a concurrency limiter or batch processing for collections that can grow without bound
102
108
  - Security/sanitization functions (redaction, escaping, validation) that only handle one input format — if data can arrive in multiple formats (JSON `"KEY": "value"`, shell `KEY=value`, URL-encoded, headers), the function must cover all formats present in the system or sensitive data leaks through the unhandled format
103
109
 
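The unbounded-collection item above can be sketched as a small validator. `validateIdList` and the `MAX_IDS` limit are hypothetical and chosen only for illustration. The sketch enforces an upper bound, checks element-level invariants, and deduplicates, without trimming values.

```typescript
const MAX_IDS = 100; // hypothetical limit; pick one that fits the endpoint

type ValidationResult =
  | { ok: true; ids: string[] }
  | { ok: false; error: string };

function validateIdList(input: unknown): ValidationResult {
  if (!Array.isArray(input)) {
    return { ok: false, error: "expected an array" };
  }
  // The size cap applies to the raw input, before deduplication.
  if (input.length > MAX_IDS) {
    return { ok: false, error: `at most ${MAX_IDS} ids allowed` };
  }
  const unique = new Set<string>();
  for (const item of input) {
    if (typeof item !== "string" || item.length === 0) {
      return { ok: false, error: "ids must be non-empty strings" };
    }
    unique.add(item); // duplicates collapse; first-seen order is kept
  }
  return { ok: true, ids: Array.from(unique) };
}
```

Deduplicating before the operation runs keeps operation counts and success metrics tied to distinct items.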
104
110
  ## Tier 3 — Domain-Specific (Check Only When File Type Matches)
@@ -124,6 +130,7 @@
124
130
  **Lazy initialization & module loading** _[applies when: code uses dynamic imports, lazy singletons, or bootstrap sequences]_
125
131
  - Cached state getters returning null before initialization — provide async initializer or ensure-style function
126
132
  - Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
133
+ - File writes that assume the parent directory exists — on fresh installs or after directory cleanup, the write fails with ENOENT. Ensure the directory exists before writing (or create it on demand)
127
134
  - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
128
135
  - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
129
136
 
@@ -167,7 +174,7 @@
167
174
  - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
168
175
  - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
169
176
  - Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
170
- - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
177
+ - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead. More broadly, safety/guard checks that catch errors and default to "safe to proceed" (fail-open) rather than treating errors as "unsafe, abort" (fail-closed) — a guard that silently succeeds on error provides no protection when it's needed most
171
178
  - Registering references to resources without verifying the resource exists — dangling references after failed operations
172
179
 
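The checkpoint item above can be sketched as a fail-closed parser. The `Checkpoint` shape and `parseCheckpoint` are hypothetical, and treating a missing file (`null`) as a fresh start is an assumption of this sketch, not something the checklist states. Anything that exists but cannot be parsed or validated raises an error instead of silently becoming an empty checkpoint.

```typescript
// Hypothetical checkpoint shape: the list of step names already completed.
interface Checkpoint {
  completed: string[];
}

function parseCheckpoint(raw: string | null): Checkpoint {
  // Assumption: a missing file (null) is the only case treated as a fresh start.
  if (raw === null) {
    return { completed: [] };
  }
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    // Fail closed: a corrupt file must not trigger full re-execution.
    throw new Error("checkpoint is not valid JSON; refusing to start over");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("checkpoint must be a JSON object");
  }
  const completed = (data as { completed?: unknown }).completed;
  if (!Array.isArray(completed) || !completed.every((item) => typeof item === "string")) {
    throw new Error("checkpoint.completed must be an array of strings");
  }
  return { completed: completed as string[] };
}
```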
173
180
  **Automated pipeline discipline**
@@ -199,7 +206,7 @@
199
206
  **Test coverage**
200
207
  - New logic/schemas/services without corresponding tests when similar existing code has tests
201
208
  - New error paths untestable because services throw generic errors instead of typed ones
202
- - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
209
+ - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses. Includes tests that assert by inspecting function source code (string-matching implementation details) rather than calling the function and checking behavior — they break on harmless refactors while missing actual behavioral changes. Also tests that mutate global state at import time (module registries, sys.modules) without fixture-scoped cleanup — causes ordering-dependent failures across the test session
203
210
  - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
204
211
  - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
205
212
  - Tests that exercise code paths depending on features the integration layer doesn't expose — they pass against mocks but the behavior can't trigger in production. Verify mocked responses match what the real dependency actually returns
@@ -56,8 +56,8 @@ Run the following loop until Copilot returns zero new comments:
56
56
  - Error detection: if the review body contains "Copilot encountered an
57
57
  error" or "unable to review this pull request", re-request (step 1)
58
58
  and resume polling. Max 3 error retries before reporting failure.
59
- - If no review appears after max wait, report the timeout — the parent
60
- agent will ask the user what to do
59
+ - If no review appears after max wait, report the timeout.
60
+ **Default mode**: skip and continue. **Interactive mode (`--interactive`)**: ask the user what to do
61
61
 
62
62
  3. CHECK for unresolved comments:
63
63
  - Filter review threads for isResolved: false
@@ -70,6 +70,7 @@ Run the following loop until Copilot returns zero new comments:
70
70
  4. FIX all unresolved review comments:
71
71
  For each unresolved thread:
72
72
  - Read the referenced file and understand the feedback
73
+ - Evaluate if the finding is a real issue — if it is, fix it regardless of whether the current PR modified that code. Never dismiss findings as "out of scope" or "pre-existing."
73
74
  - Make the code fix
74
75
  - Run the build command
75
76
  - If build passes, commit: address review: <summary>
@@ -78,8 +79,8 @@ Run the following loop until Copilot returns zero new comments:
78
79
  - After all threads resolved, push all commits to remote
79
80
  - Increment iteration counter
80
81
  - If iteration counter reaches 10, stop the loop and report back with
81
- status "guardrail"; the parent agent will ask the user whether to
82
- continue or stop
82
+ status "guardrail". **Default mode**: auto-stop and mark as best-effort.
83
+ **Interactive mode (`--interactive`)**: ask the user whether to continue or stop
83
84
  - Otherwise, go back to step 1
84
85
 
85
86
  When done, report back:
@@ -89,4 +90,8 @@ When done, report back:
89
90
  - Any unresolved threads remaining
90
91
  ```
91
92
 
92
- Launch the sub-agent and wait for its result. If the sub-agent reports a timeout or error, **ask the user** whether to continue waiting, re-request the review, or skip — never proceed without user approval when the review loop fails.
93
+ Launch the sub-agent and wait for its result.
94
+
95
+ **Default mode**: If the sub-agent reports a timeout or error, skip the timed-out review and continue autonomously.
96
+
97
+ **Interactive mode (`--interactive`)**: If the sub-agent reports a timeout or error, ask the user whether to continue waiting, re-request the review, or skip.
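The default and interactive behaviors described in this hunk can be summarized as a small decision function. This is only a sketch of the stated rules: `LoopStatus`, `Action`, and `decideAfterReviewLoop` are hypothetical names, and the package itself expresses this logic as prompt text rather than code.

```typescript
// Hypothetical status values a review sub-agent might report back.
type LoopStatus = "clean" | "timeout" | "error" | "guardrail";
type Action = "continue" | "skip-review-and-continue" | "stop-best-effort" | "ask-user";

function decideAfterReviewLoop(status: LoopStatus, interactive: boolean): Action {
  if (status === "clean") {
    return "continue";
  }
  // Interactive mode: any timeout, error, or guardrail hit goes to the user.
  if (interactive) {
    return "ask-user";
  }
  // Default mode: the guardrail auto-stops as best-effort; timeouts and errors are skipped.
  return status === "guardrail" ? "stop-best-effort" : "skip-review-and-continue";
}
```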
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "slash-do",
3
- "version": "1.9.0",
3
+ "version": "2.1.0",
4
4
  "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
5
5
  "author": "Adam Eivy <adam@eivy.com>",
6
6
  "license": "MIT",