@clipboard-health/ai-rules 2.14.26 → 2.15.1

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@clipboard-health/ai-rules",
3
- "version": "2.14.26",
3
+ "version": "2.15.1",
4
4
  "description": "Pre-built AI agent rules for consistent coding standards.",
5
5
  "keywords": [
6
6
  "ai",
@@ -29,15 +29,13 @@ Rules:
29
29
  - Normalized `<= 240` → best-effort same-turn loop: `sleep <seconds>` between iterations.
30
30
  - Normalized `> 240` → run one pass, then report that longer cadences need an external loop wrapper (the Claude Code `/loop` skill or a shell `while` loop outside the agent). Do not sleep inside the agent turn — blocking `sleep` past ~5 minutes will exceed prompt-cache TTLs and may hit tool-call timeouts.
31
31
 
32
- ## Sentinel
32
+ ## Sentinels
33
33
 
34
- The skill tags every reply it posts with:
34
+ The skill uses two HTML-comment sentinels.
35
35
 
36
- ```html
37
- <!-- babysit-pr:addressed v1 -->
38
- ```
36
+ **Addressed sentinel**: `<!-- babysit-pr:addressed v1 -->`. Appended on its own line at the end of every reply the skill posts (both thread replies and the nitpick summary). This is how the skill knows, on re-runs, which threads and nitpicks it already handled.
39
37
 
40
- on its own line at the end of the body. This is how the skill knows, on re-runs, which threads and nitpicks it already handled.
38
+ **Follow-up sentinel**: `<!-- babysit-pr:followup v1 -->`. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive: the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel.
41
39
 
42
40
  **Sentinel recency rules.** The script emits a per-thread `activityState` with three values:
43
41
 
@@ -124,6 +122,34 @@ The output JSON has:
124
122
  - `nitpickComments`: parsed CodeRabbit nitpicks, each with a stable `fingerprint`.
125
123
  - `totalActiveThreads`, `totalUncertainThreads`, `totalNitpicks`, `totalUnresolvedComments` for quick checks.
126
124
 
125
+ ### Scope
126
+
127
+ This PR's review-feedback scope is strict by default. Steps 6 (threads) and 7 (nitpicks) classify each comment as in-scope or out-of-scope using this rule before choosing a verdict. Step 5 (CI) uses the broader CI-scope rule in that step, not this one — CI can legitimately fail on unchanged lines because the PR changed a contract or dependency path.
128
+
129
+ Build the changed-line set from `gh pr diff` once per iteration. Count changed diff lines on both sides: added lines in the new version, removed lines in the old version, and modified code represented by adjacent remove/add pairs. Do not count diff context lines. A reviewer comment or nitpick is **in scope** when its anchor falls on a changed diff line on either side of the hunk. Deleted-line comments like "why remove this?" or "please add this back" are in scope by definition. For a range like `12-14`, any overlap with a changed diff line is in scope.
130
+
131
+ When matching review comments to hunks, use the anchor line provided by `unresolvedPrComments.sh`; it may be the current `line` or the script's fallback to `originalLine`. Compare that anchor against both new-side added ranges and old-side removed ranges.
132
+
133
+ Comments on unchanged/context lines, touched files outside changed lines, or untouched files are **out of scope by default**.
134
+
135
+ Narrow escape hatch: treat an unchanged/context-line comment as in scope only when there is an explicit external signal that this PR caused or requires the issue. Acceptable signals:
136
+
137
+ - The reviewer explicitly ties the concern to this PR's change.
138
+ - The comment points to an unchanged line directly used by a changed diff line, and you can name the changed `file:line` that creates the coupling.
139
+ - CI, test, or typecheck output proves the PR changed the contract or behavior for the symbol, API, or execution path named by the comment.
140
+
141
+ If you cannot name one of those signals, classify the comment as out of scope. Do not use broad judgment phrases like "related", "nearby", "maintainability", or "review confidence" to widen scope.
142
+
143
+ Default posture: focus on in-scope feedback. For out-of-scope feedback, apply the fix **only** if it meets the out-of-scope bar below. Otherwise defer with a follow-up reply.
144
+
145
+ **Out-of-scope fix bar** (apply the fix even though it's out of scope):
146
+
147
+ - Security vulnerability, data loss, or crash in the PR's execution path.
148
+ - Obvious correctness bug (wrong output, broken invariant) confirmed by reading the referenced code.
149
+ - One-line or trivial change that obviously cannot regress anything (typo, missing null check matching surrounding style, etc.).
150
+
151
+ **Everything else → Defer** (for out-of-scope fix requests that miss the bar): post a Defer reply tagged with the follow-up sentinel (see step 9). Do not expand the PR. Disagree and Already-fixed still apply to out-of-scope comments when the reviewer is wrong or the concern is already handled elsewhere; Defer is specifically for "this is a real but out-of-scope ask we are choosing not to act on here."
152
+
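The changed-line rule in the Scope subsection can be sketched in Python as follows. This is a minimal illustration, not the skill's implementation: `changed_lines` and `in_scope` are hypothetical names, the input is assumed to be the plain unified diff text printed by `gh pr diff`, and the escape-hatch signals are deliberately not modeled because they need human or CI evidence.

```python
import re

# Hunk header, e.g. "@@ -44,4 +44,4 @@"; a missing length means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def _strip_prefix(path):
    """Drop the a/ or b/ prefix that git puts on paths in file headers."""
    return path[2:] if path.startswith(("a/", "b/")) else path


def changed_lines(diff_text):
    """Map each file to its removed (old-side) and added (new-side) line numbers."""
    changed = {}
    path = old_path = None
    old_no = new_no = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:  # inside a hunk body
            if line.startswith("+"):
                changed[path]["new"].add(new_no)
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                changed[path]["old"].add(old_no)
                old_no += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line: advances both sides, never counts as changed
                old_no += 1
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("--- "):
            old_path = _strip_prefix(line[4:])
        elif line.startswith("+++ "):
            new_path = _strip_prefix(line[4:])
            # Deleted files have "+++ /dev/null"; key them by the old path.
            path = old_path if new_path == "/dev/null" else new_path
            changed.setdefault(path, {"old": set(), "new": set()})
        else:
            match = HUNK_RE.match(line)
            if match and path is not None:
                old_no, new_no = int(match.group(1)), int(match.group(3))
                old_left = int(match.group(2) or 1)
                new_left = int(match.group(4) or 1)
    return changed


def in_scope(changed, path, start, end=None):
    """True when the anchor line or range overlaps a changed line on either side."""
    sides = changed.get(path)
    if sides is None:
        return False  # untouched file: out of scope by default
    anchor = set(range(start, (start if end is None else end) + 1))
    return bool(anchor & sides["old"] or anchor & sides["new"])
```

A range such as `45-47` would be checked with `in_scope(changed, "src/orders.ts", 45, 47)`. Because the anchor is compared against both sides, a comment on a deleted line is in scope without special handling.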
127
153
  ### 5. Handle CI failures (conservative)
128
154
 
129
155
  Run `bash scripts/fetchFailedLogs.sh` to stream failed output for every failing check on the PR. The first line is either:
@@ -151,7 +177,7 @@ Read the logs and diagnose: **build/type errors first** (they cause cascading te
151
177
  - Ambiguous test intent.
152
178
  - External checks with no inspectable logs.
153
179
 
154
- Scope check: `gh pr diff --name-only`. This is PR-authoritative and works even if the local base ref is missing or stale (e.g., in fresh clones or CI sandboxes). A fix outside these files is out of scope — report it, don't apply it.
180
+ Scope check for CI: scope is the PR's changed files plus failures directly caused by those changes in the PR's execution path. Use `gh pr diff --name-only` as the first signal — this is PR-authoritative and works even if the local base ref is missing or stale (e.g., in fresh clones or CI sandboxes). Allow fixes outside changed files only when the logs and code make causality clear (e.g., the PR renamed a symbol that a sibling test references). CI failures outside that surface are out of scope — report the diagnosis, don't apply speculative fixes. CI fixes are never Deferred as follow-ups: CI needs to pass on this PR.
155
181
 
156
182
  ### 6. Assess active review threads
157
183
 
@@ -162,17 +188,28 @@ For every thread in `activeThreads` (this includes both `"active"` and `"uncerta
162
188
  - If `activityState == "uncertain"`, read EVERY entry in `postSentinelBotComments` (not just the newest):
163
189
  - If EVERY entry is a non-actionable acknowledgement → mark the thread **Skip-reply** (the existing sentinel already covers the thread; posting again would be noise). Do not classify it Agree/Disagree/Already-fixed. Record this in the final summary so the skip is visible.
164
190
  - If ANY entry carries new actionable content → treat the thread as new feedback and proceed below. Note in the final summary that an uncertain thread was reactivated, citing the specific comment.
165
- - For each remaining thread (i.e., NOT marked Skip-reply), pick one verdict; each of these will get a reply posted in step 9:
166
- - **Agree**: the comment identifies a real issue. Apply the fix. Record the thread ID and a one-line what-changed.
167
- - **Disagree**: the current code is acceptable. Record a short reasoning.
168
- - **Already fixed** — a prior commit addresses the concern. Record a pointer (commit SHA, file:line).
191
+ - Each remaining thread (i.e., NOT marked Skip-reply) gets a scope classification first. Use the Scope subsection to label it in-scope or out-of-scope. For comments on deleted lines, record that the anchor is on the removed side of the diff. For any unchanged/context-line comment classified in scope via the narrow escape hatch, record the external signal and the changed `file:line` when applicable.
192
+ - Then pick one verdict; each of these (except Skip-reply) will get a reply posted in step 9:
193
+ - **In-scope** threads use the original three verdicts:
194
+ - **Agree** — the comment identifies a real issue. Apply the fix. Record the thread ID and a one-line what-changed.
195
+ - **Disagree** — the current code is acceptable. Record a short reasoning.
196
+ - **Already fixed** — a prior commit addresses the concern. Record a pointer (commit SHA, file:line).
197
+ - **Out-of-scope** threads apply the out-of-scope fix bar from the Scope subsection:
198
+ - Meets the bar → **Agree** (apply the fix, and note in the reply that it was fixed despite being out of scope because it met the bar).
199
+ - Does not meet the bar → **Defer** (new verdict). Record a one-line rationale and, if relevant, a pointer to where the concern lives.
200
+ - Disagree and Already-fixed can still apply to out-of-scope comments (e.g., reviewer asks for a refactor that's already landed on main, or misreads the code).
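One way to read the verdict rules in step 6 is as a small decision function. The sketch below is an interpretation, not code from the package: `pick_verdict` and its boolean inputs are hypothetical, and checking Already-fixed before Disagree is an assumption, since the text does not specify an order.

```python
from enum import Enum


class Verdict(Enum):
    AGREE = "Agree"
    DISAGREE = "Disagree"
    ALREADY_FIXED = "Already fixed"
    DEFER = "Defer"


def pick_verdict(*, in_scope, real_issue, already_fixed, meets_fix_bar):
    """Choose a verdict for a thread that was not marked Skip-reply."""
    if already_fixed:
        return Verdict.ALREADY_FIXED  # applies in or out of scope
    if not real_issue:
        return Verdict.DISAGREE  # reviewer is wrong or current code is acceptable
    if in_scope or meets_fix_bar:
        return Verdict.AGREE  # out-of-scope fixes need the fix bar
    return Verdict.DEFER  # real, out of scope, below the bar
```

Defer is reachable only for a real issue that is out of scope and below the fix bar, which matches the description of Defer as "a real but out-of-scope ask".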
169
201
 
170
202
  ### 7. Assess nitpicks
171
203
 
172
204
  For every nitpick in `nitpickComments`:
173
205
 
174
206
  - Check whether its `fingerprint` already appears in a prior babysit-pr sentinel comment on the PR. If yes, skip.
175
- - Otherwise classify (Agree / Disagree / Already fixed) the same way as threads. If Agree, apply the fix.
207
+ - **Classify scope** (in / out) using the Scope subsection. For CodeRabbit nitpick ranges like `12-14`, any overlap with changed diff lines on either side of the hunk is in scope; no overlap is out of scope unless one of the explicit escape-hatch signals applies.
208
+ - Pick a verdict:
209
+ - In-scope → Agree / Disagree / Already fixed (as with threads). If Agree, apply the fix.
210
+ - Out-of-scope → apply the out-of-scope fix bar. Meets the bar → Agree and apply the fix, noting in the summary that it was fixed despite being out of scope. Does not meet the bar → **Defer**. A Deferred nitpick does not get its own top-level comment; it goes into the nitpick summary under the **Deferred (out of scope)** heading (see step 9).
211
+
212
+ Deferred nitpick fingerprints still go into the fenced fingerprint block at the end of the summary alongside addressed ones, so future runs dedupe correctly — the nitpick is handled, just handled by deferring.
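The fingerprint dedupe described above might look like the following Python sketch. It assumes fingerprints are opaque one-per-line strings inside the last fenced block of a comment that carries the `addressed` sentinel; the helper names and the `{"fingerprint": ...}` nitpick shape are illustrative assumptions, not the script's actual output contract.

```python
import re

ADDRESSED = "<!-- babysit-pr:addressed v1 -->"
FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)\n" + FENCE, re.DOTALL)


def handled_fingerprints(comment_bodies):
    """Collect fingerprints listed in prior sentinel-tagged summary comments."""
    seen = set()
    for body in comment_bodies:
        if ADDRESSED not in body:
            continue  # not posted by the skill
        blocks = FENCE_RE.findall(body)
        if blocks:
            # The fingerprint block is the last fenced block, before the sentinel.
            seen.update(fp.strip() for fp in blocks[-1].splitlines() if fp.strip())
    return seen


def unhandled(nitpicks, comment_bodies):
    """Keep only nitpicks whose fingerprint no prior summary has recorded."""
    seen = handled_fingerprints(comment_bodies)
    return [n for n in nitpicks if n["fingerprint"] not in seen]
```

Since deferred fingerprints sit in the same block as addressed ones, a deferred nitpick drops out of `unhandled` on the next run without any extra bookkeeping.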
176
213
 
177
214
  If no nitpicks remain after filtering, skip ONLY the top-level nitpick-summary comment in step 9. Still post thread replies for every non-Skip-reply thread from step 6.
178
215
 
@@ -200,7 +237,7 @@ Capture the `url=` line for the reply templates in step 9.
200
237
 
201
238
  ### 9. Post replies
202
239
 
203
- For every thread assessed in step 6 that was NOT marked **Skip-reply** (i.e., one of Agree / Disagree / Already fixed):
240
+ For every thread assessed in step 6 that was NOT marked **Skip-reply** (i.e., one of Agree / Disagree / Already fixed / Defer):
204
241
 
205
242
  ```bash
206
243
  bash scripts/postSentinelReply.sh "$THREAD_ID" "$BODY"
@@ -208,11 +245,14 @@ bash scripts/postSentinelReply.sh "$THREAD_ID" "$BODY"
208
245
 
209
246
  Skip-reply threads (uncertain threads where every post-sentinel bot comment was a non-actionable ack) are left alone — the existing sentinel already covers them.
210
247
 
211
- Body templates (the script appends the sentinel if missing):
248
+ Body templates (the script appends the `addressed` sentinel if missing):
212
249
 
213
250
  - **Agree**: `Addressed in <commit-url>. <one-line what-changed>.`
214
251
  - **Disagree**: `Leaving current behavior. <reasoning>.`
215
252
  - **Already fixed**: `Already handled by <commit-url-or-file:line>. <brief pointer>.`
253
+ - **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. <one-line rationale or pointer if useful>.\n\n<!-- babysit-pr:followup v1 -->`
254
+
255
+ For Defer replies, include the follow-up sentinel on its own line as shown. The script will append the `addressed` sentinel after it on its own line, so the final body ends with the follow-up sentinel followed by a blank line followed by the `addressed` sentinel — `grep babysit-pr:followup` finds the deferral and `grep babysit-pr:addressed` still marks the thread handled for dedupe.
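The resulting body layout can be illustrated with a short Python sketch. The append-if-missing behavior is inferred from the description above rather than taken from `postSentinelReply.sh` itself, and `defer_body` and `with_addressed_sentinel` are hypothetical names.

```python
ADDRESSED = "<!-- babysit-pr:addressed v1 -->"
FOLLOWUP = "<!-- babysit-pr:followup v1 -->"


def defer_body(rationale):
    """Build a Defer reply that ends with the follow-up sentinel on its own line."""
    return (
        "Out of scope for this PR; this looks like follow-up work rather than "
        "something introduced or required by this change. "
        + rationale
        + "\n\n"
        + FOLLOWUP
    )


def with_addressed_sentinel(body):
    """Append the addressed sentinel after a blank line unless already present."""
    if ADDRESSED in body:
        return body
    return body.rstrip("\n") + "\n\n" + ADDRESSED
```

Calling `with_addressed_sentinel(defer_body("Tracked separately."))` yields a body that ends with the follow-up sentinel, a blank line, and the `addressed` sentinel, so both grep patterns match the same reply.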
216
256
 
217
257
  The script uses the `addPullRequestReviewThreadReply` GraphQL mutation. It does NOT resolve the thread.
218
258
 
@@ -224,9 +264,10 @@ bash scripts/postSentinelPrComment.sh "$PR_NUMBER" "$BODY"
224
264
 
225
265
  The nitpick summary body should:
226
266
 
227
- - Group verdicts under **Agree / Disagree / Already fixed** headings.
267
+ - Group verdicts under **Agree / Disagree / Already fixed / Deferred (out of scope)** headings. Omit a heading if its list is empty.
268
+ - Under **Deferred (out of scope)**, list each deferred nitpick as a bullet, followed on its own line by `<!-- babysit-pr:followup v1 -->` so grep catches them individually.
228
269
  - Include the commit URL for fixes.
229
- - Include every current nitpick's `fingerprint` in a fenced block at the end (one per line, before the sentinel) so future runs can dedupe.
270
+ - Include every current nitpick's `fingerprint` — addressed and deferred — in a fenced block at the end (one per line, before the sentinel) so future runs can dedupe. Deferred nitpicks count as handled for dedupe purposes.
230
271
 
231
272
  ### 10. Summarize
232
273
 
@@ -234,16 +275,24 @@ Report:
234
275
 
235
276
  - Commits made (with URLs).
236
277
  - CI checks fixed / still failing / skipped-with-diagnosis.
237
- - Review threads replied to, grouped by verdict.
238
- - Nitpicks summarized (or skipped because already covered).
278
+ - Review threads replied to, grouped by verdict (including any Defer count: "X threads deferred as follow-ups").
279
+ - Nitpicks summarized (or skipped because already covered), including the Deferred count: "Y nitpicks deferred as follow-ups".
239
280
  - Threads left active because of bot-acknowledgement uncertainty (flag by thread URL).
240
281
  - The stop condition triggered for this iteration (clean / stuck / continue / long-interval / sanity-cap).
241
282
 
283
+ When the report mentions any deferrals, include a one-liner the user can run later to enumerate them, e.g.:
284
+
285
+ ```bash
286
+ gh api graphql -f query='query($o:String!,$r:String!,$n:Int!){repository(owner:$o,name:$r){pullRequest(number:$n){reviewThreads(first:100){nodes{comments(first:50){nodes{body url}}}}comments(first:100){nodes{body url}}}}}' -F o=<owner> -F r=<repo> -F n=<pr> | grep -B1 babysit-pr:followup
287
+ ```
288
+
289
+ Do not rely only on `gh pr view --json comments,reviews` — that view can miss inline review-thread replies, which is where most Defer replies live.
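If the GraphQL response arrives as a single line of JSON, `grep -B1` may print the whole payload rather than one comment. A structured filter is a possible alternative; the Python sketch below assumes the response shape implied by the query above (`reviewThreads.nodes[].comments.nodes[]` and `comments.nodes[]`, each with `body` and `url`), and `deferred_urls` is a hypothetical helper.

```python
FOLLOWUP_MARK = "babysit-pr:followup"


def deferred_urls(response):
    """Return URLs of comments tagged with the follow-up sentinel."""
    pr = response["data"]["repository"]["pullRequest"]
    # Inline review-thread replies, where most Defer replies are expected to live.
    comments = [
        comment
        for thread in pr["reviewThreads"]["nodes"]
        for comment in thread["comments"]["nodes"]
    ]
    # Top-level PR comments, including the nitpick summary.
    comments += pr["comments"]["nodes"]
    return [c["url"] for c in comments if FOLLOWUP_MARK in c["body"]]
```

To use it, parse the command's output with `json.load(sys.stdin)` and pass the result to `deferred_urls`. Pagination is not handled, so PRs with more than the requested `first:` counts would need additional requests.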
290
+
242
291
  ## Loop control
243
292
 
244
293
  After an iteration, pick exactly one outcome:
245
294
 
246
- - **Exit clean** — all CI checks passed AND every thread in `activeThreads` was either marked Skip-reply during step 6's inspection or has already received a fresh sentinel reply in this iteration, AND every current nitpick fingerprint is covered by an existing sentinel comment. Do not use raw `totalActiveThreads` from the script output — it is pre-inspection and will stay non-zero for Skip-reply cases. Report success and stop.
295
+ - **Exit clean** — all CI checks passed AND every thread in `activeThreads` was either marked Skip-reply during step 6's inspection or has already received a fresh sentinel reply in this iteration (Agree / Disagree / Already-fixed / **Defer** all count — a Defer reply is a sentinel reply), AND every current nitpick fingerprint is covered by an existing sentinel comment (deferred nitpicks count; they're in the summary's fingerprint block). Do not use raw `totalActiveThreads` from the script output — it is pre-inspection and will stay non-zero for Skip-reply cases. A PR with Deferred threads is still clean from babysit's perspective: the skill has done what it can without widening scope. Report success and stop.
247
296
  - **Exit stuck** — iteration made no commits, posted no new replies, and no CI check changed state from the previous iteration. Report state and stop; tell the user to investigate.
248
297
  - **Continue** — interval set, normalized `<= 240`, not clean, not stuck:
249
298
 
@@ -285,6 +334,23 @@ User: `babysit-pr 2m`
285
334
  - Iteration 2: CI fails (lint on the nitpick fix). Log shows unused import. High-confidence + in scope → remove import, commit `d4e5f6a`, push. Threads are all addressed. `sleep 120`.
286
335
  - Iteration 3: CI green, 0 active threads, 0 new nitpick fingerprints. **Exit clean.** Report final commit SHAs and reply URLs.
287
336
 
337
+ ### Example 3: out-of-scope nitpick gets deferred
338
+
339
+ User: `babysit my PR`
340
+
341
+ - Preflight OK, PR #612 found, CI green.
342
+ - `unresolvedPrComments.sh` returns 1 active thread and 2 nitpicks:
343
+ - Thread on `src/users.ts:82` (unchanged, not touched by diff) — reviewer: "while you're here, this helper could be memoized".
344
+ - Nitpick on `src/orders.ts:45-47` — anchor overlaps a changed line; CodeRabbit says the error message should use backticks. In scope.
345
+ - Nitpick on `src/unrelated.ts:10` — file not touched by the PR. Out of scope, no escape-hatch signal.
346
+ - Scope classification:
347
+ - Thread is on an unchanged line; reviewer doesn't tie it to this PR's changes; doesn't meet the fix bar (not a crash, not a bug, not trivial). → **Defer**.
348
+ - First nitpick is in-scope → **Agree**, apply backtick fix.
349
+ - Second nitpick is out-of-scope, not a correctness bug, not a one-liner → **Defer** (goes under the Deferred (out of scope) heading in the summary).
350
+ - Commit `f00dbabe` for the in-scope nitpick fix. Post Defer reply on the thread with the `babysit-pr:followup v1` sentinel above the `addressed` sentinel. Post the nitpick summary with Agree (1) and Deferred (out of scope) (1) headings; both fingerprints listed in the fenced block.
351
+ - Summary reports: "1 thread deferred as follow-up, 1 nitpick deferred as follow-up" plus the `gh api graphql ... | grep babysit-pr:followup` one-liner.
352
+ - **Exit clean** — Defer replies count as fresh sentinel replies; all fingerprints are covered.
353
+
288
354
  ## Input
289
355
 
290
356
  Interval: $ARGUMENTS